
Optimizing Web Traffic via the Media Scheduling Problem

Lars Backstrom (1)

Jon Kleinberg (1)

Ravi Kumar (2)

(1) Cornell University

(2) Yahoo! Research

June 30, 2009

Lars Backstrom, Jon Kleinberg, Ravi Kumar Optimizing Web Traffic via the Media Scheduling Problem


Introduction

Featured items are common on many web pages

- Amazon.com has a featured product
- Flickr.com has a featured photo
- YouTube.com has featured video(s)
- yahoo.com has a featured news story


Utility

In all these cases, some utility is gained through the featured items

- General user interest
- Ad revenue on linked pages
- Product sales

Abstractly, some utility is gained per impression

At a high level, our goal is to maximize the utility gained from the featured item slot

In our study of yahoo.com, we consider the unit of utility to be clicks


Framework

All visitors to the site will view the same featured item (no personalization)

The website operator has a pool of items that can potentially be featured over the course of a day

- Available items are known ahead of time
- Qualities of items are also known ahead of time (perhaps through bucket testing)

Each item will be presented during only one contiguous interval

Example:

Available articles for day: {How to spot fake money, Top 10 Summer Movies, Britney Spears ...}


The Problem

[Figure: Traffic over One Day. x-axis: Minutes (0 to 1400); y-axis: Visits]

Three components to our problem

- Varying Traffic: traffic to any webpage varies over the course of a day, with peak traffic typically reached around midday
- Staleness: utility decays over time when we leave an item in the featured spot
- Item Variation: some items are inherently better (higher utility) than others


[Figure: Declining CTR over Time. x-axis: Minutes (0 to 80); y-axis: CTR]


[Figure: CTR Decay of Different Articles. x-axis: Minutes (5 to 35); y-axis: CTR]


Varying Traffic

Traffic is highly variable, with a significant difference between peak and off-peak times

Each day has a slightly different shape, but traffic is mostly consistent from one week to the next

This is important because it means we can accurately predict traffic ahead of time


Staleness

Items become less valuable the longer they appear on a site

A significant fraction of all visitors will be returning

[Figure: Fits of a single article. x-axis: Minutes (0 to 250); y-axis: Click-Through Rate. Curves: actual data, best-fit power law decay, best-fit linear decay, best-fit exponential decay]

If a visitor returns to the same featured item, typically one of two things will have already happened

- He has already rejected the item, and will not ‘consume’ it
- He has already consumed the item and will not do so again

Utility per impression decays with time

We denote the utility per impression of an item $i$ after $t$ minutes by $f_i(t)$

Note that in the figure here, exponential decay seems to fit best
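As a rough illustration of how an exponential fit of this kind might be obtained, the sketch below estimates $f(t) \approx c_0 e^{-\lambda t}$ by ordinary least squares on the logarithm of the CTR. The function name `fit_exponential_decay` and the synthetic data are assumptions made for illustration; the slides do not say which fitting procedure was actually used.

```python
import math

def fit_exponential_decay(minutes, ctrs):
    """Least-squares fit of log(ctr) = log(c0) - lam * t.

    Returns (c0, lam). Assumes every CTR value is strictly positive.
    """
    n = len(minutes)
    logs = [math.log(c) for c in ctrs]
    mean_t = sum(minutes) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in minutes)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(minutes, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

# Synthetic, noise-free data (hypothetical values, not from the Yahoo! logs).
minutes = list(range(0, 60, 5))
ctrs = [0.08 * math.exp(-0.02 * t) for t in minutes]
c0, lam = fit_exponential_decay(minutes, ctrs)
```

On real data the same residuals could be compared against power law and linear fits to see which family describes an article best.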


Item Variation

[Figure: Article Quality Distribution. x-axis: Quality; y-axis: Count]

The utility of items naturally varies from item to item

In most cases this variation can be observed ahead of time

- For some things, like products, historical sales data can be used
- In other cases we can use ‘bucket testing’ to discover this variation
  - For short intervals, divide all users into many ‘buckets’
  - Show users within each bucket a different item
  - Use gathered data to gauge item quality

The best items may be an order of magnitude better than average
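The bucket-testing steps can be sketched roughly as follows. This is a minimal illustration, not the procedure used at Yahoo!: the hash-based bucket assignment, the function names `assign_bucket` and `estimate_quality`, and the sample log are all assumptions.

```python
import zlib
from collections import defaultdict

def assign_bucket(user_id: str, num_buckets: int) -> int:
    """Map a user to a stable bucket via a CRC32 hash of the user id."""
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets

def estimate_quality(impressions):
    """impressions: iterable of (item, clicked) pairs logged during the test.

    Returns {item: observed CTR}, used here as the item's quality estimate.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for item, was_clicked in impressions:
        shown[item] += 1
        clicked[item] += int(was_clicked)
    return {item: clicked[item] / shown[item] for item in shown}

# Hypothetical log: each bucket saw one item for a short interval.
log = [("fake money", True), ("fake money", False),
       ("summer movies", False), ("summer movies", False),
       ("fake money", True), ("summer movies", True)]
quality = estimate_quality(log)
```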


The Media Scheduling Problem Formalized

Inputs

- Number of visitors $\gamma_\tau$ during minute $\tau$
- Set of $N$ items with associated value functions $f_i(t)$ giving expected utility per impression

Output

- Non-overlapping intervals $[S_i, T_i]$ for each article $i$

Goal is to maximize the total utility:

$$\sum_i \sum_{\tau = S_i}^{T_i} \gamma_\tau \, f_i(\tau - S_i)$$
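To make the objective concrete, here is a small sketch that evaluates the total utility of a given schedule. The function name `total_utility`, the dictionary-based data layout, and the sample numbers are assumptions for illustration only.

```python
def total_utility(traffic, schedule, value_fns):
    """Sum of traffic[tau] * f_i(tau - S_i) over every scheduled minute.

    traffic:   list where traffic[tau] is the visitor count in minute tau
    schedule:  dict item -> (S_i, T_i), inclusive minute indices
    value_fns: dict item -> function f_i(t), utility per impression
    """
    # Reject schedules whose intervals overlap.
    intervals = sorted(schedule.values())
    for (_, end), (start, _) in zip(intervals, intervals[1:]):
        if start <= end:
            raise ValueError("intervals overlap")
    total = 0.0
    for item, (s, t_end) in schedule.items():
        f = value_fns[item]
        for tau in range(s, t_end + 1):
            total += traffic[tau] * f(tau - s)
    return total

# Hypothetical three-minute day with two items.
traffic = [100, 200, 300]
value_fns = {
    "a": lambda t: 0.10 * 0.5 ** t,
    "b": lambda t: 0.04 * 0.5 ** t,
}
schedule = {"a": (0, 1), "b": (2, 2)}
utility = total_utility(traffic, schedule, value_fns)
```

Here item "a" contributes 100 · 0.10 + 200 · 0.05 clicks and item "b" contributes 300 · 0.04.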


Dataset

Items are ‘featured news’ articles; these are rarely breaking news

Comes from yahoo.com server logs

Recorded over three weeks in 2008

Captures page views and click rates

Our measure of utility here is clicks, so $f_i(\cdot)$ is the click-through rate (CTR)


Utility Decay on Yahoo!

[Figure: CTR Decay of Different Articles. x-axis: Minutes (5 to 35); y-axis: CTR]

Item quality varies greatly between ‘best’ and ‘worst’ articles, perhaps by an order of magnitude

However, given initial quality $f_i(0)$, articles share similar decay functions

We find that all articles can be aligned to a single ‘universal’ decay function such that the average relative error is only 3.2%


Utility Decay on Yahoo!

Given universal decay function $g(\cdot)$, $f_i(t) = g(t + \sigma_i)$

Furthermore, universal decay is quite similar to exponential decay

A single universal exponential parameter $\lambda$ gives an average relative error of 4.6%

All this suggests that if we know $f_i(0)$ we know $f_i(t)$ for all $t$
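The last point can be sketched under the exponential approximation. The code below assumes, purely for illustration, that the universal curve is $g(t) = e^{-\lambda t}$ normalized so that $g(0) = 1$; the value of `LAMBDA` and the helper names are hypothetical, not the fitted parameter from the study.

```python
import math

LAMBDA = 0.03  # hypothetical decay rate per minute, not the fitted value

def g(t, lam=LAMBDA):
    """Assumed universal decay curve, normalized so that g(0) = 1."""
    return math.exp(-lam * t)

def shift_for(initial_ctr, lam=LAMBDA):
    """Solve g(sigma) = initial_ctr for the item's shift sigma."""
    return -math.log(initial_ctr) / lam

def ctr_at(initial_ctr, t, lam=LAMBDA):
    """f_i(t) = g(t + sigma_i), determined entirely by f_i(0)."""
    return g(t + shift_for(initial_ctr, lam), lam)

sigma = shift_for(0.05)   # shift for an article with f_i(0) = 0.05
later = ctr_at(0.05, 20)  # its predicted CTR after 20 minutes
```

Under this assumption a lower initial CTR simply corresponds to starting further along the same curve.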


Algorithms

General problem is NP-hard

- Naively examine $N!$ permutations
- Better optimal algorithm uses dynamic programming; takes $O(T^2 N 2^N)$, where $T$ is the number of discrete time units

To do better, we need to use structure from the problem observed in data

- Recall that, to a close approximation, we observed that $f_i(t) = g(t + \sigma_i)$ and that $g(\cdot)$ is monotonically decreasing
- Consider the case where the traffic pattern $\gamma$ is monotonically increasing
- Optimal ordering is from worst to best

To prove this, we consider an inversion in this ordering

- We will show that we can correct this inversion to get an ordering which is no worse


Proof for increasing traffic case

Claim: Ordering from worst to best is optimal

Consider an inversion where a better article was placed first

- We can swap the two articles
- We get lower CTR in the beginning
- We get higher CTR later on
- Area of gain is equal to area of loss, but traffic is higher in the gain region
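A single numeric instance can illustrate the swap, though it is of course not a proof. The sketch assumes an exponential stand-in for the shared decay curve, equal display lengths for the two articles, and made-up traffic and quality values; the helper `utility` is hypothetical.

```python
def utility(traffic, order, length, quality, decay):
    """Total clicks when each item in `order` is shown for `length` minutes.

    Per-impression CTR of an item after t minutes is quality * decay ** t,
    an exponential stand-in for the shared decay curve.
    """
    total = 0.0
    for slot, item in enumerate(order):
        start = slot * length
        for t in range(length):
            total += traffic[start + t] * quality[item] * decay ** t
    return total

traffic = [1, 2, 3, 4]                      # monotonically increasing
quality = {"better": 0.10, "worse": 0.05}   # initial CTRs (hypothetical)
better_first = utility(traffic, ["better", "worse"], 2, quality, 0.5)
worse_first = utility(traffic, ["worse", "better"], 2, quality, 0.5)
```

In this instance the worse-first order collects more clicks, because the better article's high initial CTR lands on the busier minutes.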


Solving the problem

- Decreasing traffic is similar: order from best to worst
- To find exact interval lengths, use dynamic programming
- Order is known, so compute the value at time $t$ using the first $n$ of the $N$ items as $\mathrm{opt}(t, n) = \max_{t'} \left[ \mathrm{opt}(t', n-1) + \mathrm{value}(t, t', n) \right]$
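The recurrence for a known order might be implemented along the following lines. This is an unoptimized sketch under stated assumptions: the whole day must be covered with no gaps, an item may receive an empty interval, and the helper `value(start, end, f)` takes its arguments in a different order from the slide's value(t, t', n). The function name and sample data are hypothetical.

```python
def schedule_fixed_order(traffic, value_fns):
    """Best total utility when items must appear in the given order.

    traffic:   traffic[tau] = visitors in minute tau
    value_fns: list of functions f_n(t), already sorted in display order
    opt[n][t] = best utility for minutes [0, t) using the first n items.
    """
    T, N = len(traffic), len(value_fns)

    def value(start, end, f):
        # Utility of showing one item during minutes [start, end).
        return sum(traffic[tau] * f(tau - start) for tau in range(start, end))

    NEG = float("-inf")
    opt = [[NEG] * (T + 1) for _ in range(N + 1)]
    opt[0][0] = 0.0  # zero items cover zero minutes
    for n in range(1, N + 1):
        f = value_fns[n - 1]
        for t in range(T + 1):
            # Item n occupies [prev, t); prev == t means it is skipped.
            opt[n][t] = max(
                opt[n - 1][prev] + value(prev, t, f)
                for prev in range(t + 1)
            )
    return opt[N][T]

# Increasing traffic, so items are listed from worst to best (made-up values).
traffic = [1, 2, 3, 4]
worst_to_best = [lambda t: 0.05 * 0.5 ** t, lambda t: 0.10 * 0.5 ** t]
best = schedule_fixed_order(traffic, worst_to_best)
```

Recomputing `value` from scratch keeps the sketch short; accumulating it incrementally as `t` grows would reduce the running time.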


Solving the problem

When traffic is unimodal, items on each side of the peak are ordered from best to worst as they fall away from the peak

This allows a dynamic programming algorithm

- Try all possibilities for the base-case article straddling the peak
- Order the remaining articles from best to worst
- Use dynamic programming to compute the optimal solution for interval $[a, b)$ using the first $n$ items:

$$\mathrm{opt}(a, b, n) = \max\left( \max_t \left[ \mathrm{opt}(a, t, n-1) + \mathrm{value}(t, b, n) \right],\; \max_t \left[ \mathrm{opt}(t, b, n-1) + \mathrm{value}(a, t, n) \right] \right)$$
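One way to read this recurrence is sketched below. It is an interpretation rather than the authors' implementation: the base-case article must cover the peak minute, each later article is attached to the left or right end of the interval built so far, empty intervals are allowed, and the whole day must be covered. The names `schedule_unimodal`, `peak`, and `quality` are assumptions.

```python
from functools import lru_cache

def schedule_unimodal(traffic, peak, value_fns, quality):
    """Schedule value for traffic that peaks at minute index `peak`.

    value_fns[i](t): utility per impression of item i after t minutes
    quality[i]:      initial quality of item i, used only for sorting
    """
    T = len(traffic)
    items = range(len(value_fns))

    def value(start, end, item):
        # Utility of showing `item` during minutes [start, end).
        f = value_fns[item]
        return sum(traffic[tau] * f(tau - start) for tau in range(start, end))

    best = float("-inf")
    for base in items:  # try each item as the one straddling the peak
        rest = sorted((i for i in items if i != base),
                      key=lambda i: quality[i], reverse=True)
        order = [base] + rest

        @lru_cache(maxsize=None)
        def opt(a, b, n):
            if n == 1:
                # Base case: the first item must cover the peak minute.
                return value(a, b, base) if a <= peak < b else float("-inf")
            item = order[n - 1]
            # Item n goes on the right end [t, b) or the left end [a, t).
            right = max(opt(a, t, n - 1) + value(t, b, item)
                        for t in range(a, b + 1))
            left = max(opt(t, b, n - 1) + value(a, t, item)
                       for t in range(a, b + 1))
            return max(left, right)

        best = max(best, opt(0, T, len(order)))
    return best

# Hypothetical four-minute day peaking at minute 2, with two items.
traffic = [1, 3, 4, 2]
fns = [lambda t: 0.10 * 0.5 ** t, lambda t: 0.05 * 0.5 ** t]
best = schedule_unimodal(traffic, peak=2, value_fns=fns, quality=[0.10, 0.05])
```

In this toy instance the best schedule shows the weaker item during the first two minutes and starts the stronger item at the peak.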


Results

[Figure: Views and click-through rate over one day. x-axis: Minutes (0 to 900); y-axes: Views, Click-Through Rate. Curves: Views, Our Algorithm, Optimal Scheduling]

Figure shows traffic over one day in red; note that it is close to, but not quite, unimodal

Article CTRs are shown for two schedules: the optimal one and that of our algorithm

Similar, but not quite the same, due to the lack of complete unimodality

Over the 21-day observation period, always within 0.1% of optimal

Compared to the actual schedule picked by human editors, a 26% improvement in total clicks


Generative Model

[Figure: Declining Click-Through Rate for a Typical Article and Simulated CTR. x-axis: Minutes Since First Display (0 to 70); y-axis: Click-Through Rate. Curves: Click-Through Rate, Simulation Results]

Declining CTR can be explained by a generative model taking a few user traits into account

- Distribution of visit rates for different users
- Given overall visit rate, the distribution of interarrival time gaps
- The attenuation curve: the chance of clicking an article given that a user has returned to the page and seen the same article $K$ times

Putting these factors together, we can simulate users and find that these three ingredients explain the declining click-through rates we observe
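A toy simulation in this spirit might look like the following. Every modeling choice here is an assumption made for illustration: visit rates are drawn from a small hypothetical set, gaps between visits are exponentially distributed, the click probability shrinks geometrically with the number of earlier views, and a user clicks at most once. None of these distributions or parameters come from the Yahoo! data.

```python
import random

def simulate_ctr(num_users, horizon, rates, base_click, attenuation, seed=0):
    """Simulate per-minute CTR for one article shown for `horizon` minutes.

    rates:       candidate visits-per-minute rates, one drawn per user
    base_click:  click probability on a user's first view
    attenuation: factor applied to that probability per earlier view
    """
    rng = random.Random(seed)
    views = [0] * horizon
    clicks = [0] * horizon
    for _ in range(num_users):
        rate = rng.choice(rates)
        seen = 0
        t = rng.expovariate(rate)           # time of the first visit
        while t < horizon:
            minute = int(t)
            views[minute] += 1
            if rng.random() < base_click * attenuation ** seen:
                clicks[minute] += 1
                break                        # assume a user clicks at most once
            seen += 1
            t += rng.expovariate(rate)       # gap until the next visit
    # CTR per minute; minutes with no views are reported as 0.0.
    return [c / v if v else 0.0 for c, v in zip(clicks, views)]

ctr = simulate_ctr(num_users=20000, horizon=60,
                   rates=[0.01, 0.05, 0.2],
                   base_click=0.1, attenuation=0.5)
```

Because frequent visitors dominate later views and have usually seen the article several times already, the simulated CTR falls over time even though no individual user's taste changes.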


Further Directions

What can we do if new items appear during the day?

How can these results be combined with personalization?

What if there are multiple featured items?

To what extent do these results generalize to other datasets?

Are the conditions here approximately met elsewhere as well?
