Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work...

32
Rating systems This work Probabilistic Rating Systems Sergey Nikolenko Steklov Mathematical Institute at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St. Petersburg April 9, 2015 Sergey Nikolenko Probabilistic rating systems

Transcript of Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work...

Page 1: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Probabilistic Rating Systems

Sergey Nikolenko

Steklov Mathematical Institute at St. Petersburg

Laboratory for Internet Studies,National Research University Higher School of Economics, St. Petersburg

April 9, 2015

Sergey Nikolenko Probabilistic rating systems

Page 2: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Outline

1 Rating systemsProblem setting and motivationTrueSkill

2 This workChanges in the problem settingModels and results

Sergey Nikolenko Probabilistic rating systems

Page 3: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

My personal motivation

«What? Where? When?»: a team game of answeringquestions. Sometimes it looks like this...

Sergey Nikolenko Probabilistic rating systems

Page 4: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

My personal motivation

...but usually it looks like this:

Sergey Nikolenko Probabilistic rating systems

Page 5: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

My personal motivation

Teams of ≤ 6 players answer questions.

Whoever gets the most correct answers wins.

My motivation was to create a rating system that wouldpredict tournament results by team rosters.Characteristic features that make the problem hard:

it’s a hobby: players have no contracts, teams do not havepermanent rosters, playing for a different team is common;hence, we cannot just make a rating list of the teams, we needto go deeper, to individual players;but we do not know how players do, only team results;relatively few questions per tournament (36, 45, 60), hencemultiway ties;undersized teams are common.

Sergey Nikolenko Probabilistic rating systems

Page 6: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Introduction

In probabilistic rating models, Bayesian inference aims to finda linear ordering on a certain set given noisy comparisons ofrelatively small subsets of this set.

Useful whenever there is no way to compare a large number ofentities directly, but only partial (noisy) comparisons areavailable.

We will stick to the metaphor of matches and players.

Sergey Nikolenko Probabilistic rating systems

Page 7: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Introduction

Elo rating system: first probabilistic rating model (chess: twoplayers).

Bradley–Terry models: assume that each player has a “true”rating γi , and the win probability is proportional to this rating:γ1 wins over γ2 with probability γ1

γ1+γ2.

Inference: fit this model to the data from matches played.

Several extensions, but large matches are hard forBradley–Terry models.

The model that looked right to us for «What? Where?When» was TrueSkill.

Sergey Nikolenko Probabilistic rating systems

Page 8: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

TrueSkill

TrueSkill was initially developed in MS Research for Xbox Livegaming servers [Graepel, Minka, Herbrich, 2007].

Given results of team competitions, learn the ratings of playersof these teams.

Direct application – matchmaking: find interesting opponentsfor a player or team.

[Graepel et al., 2010]: AdPredictor. Predicts CTRs ofadvertisements based on a set of features: the features are ateam, and the team wins whenever a user clicks the ad.

Sergey Nikolenko Probabilistic rating systems

Page 9: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

TrueSkill factor graph

Sergey Nikolenko Probabilistic rating systems

Page 10: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

TrueSkill

There is no evidence per se, it is incorporated in the structureof the graph, we just have to marginalize by message passing.The marginalization problem is complicated by the stepfunctions at the bottom; solved with Expectation Propagation[Minka, 2001]:

approximate messages from I(di > ε) and I(|di | ≤ ε) to di

with normal distributions;repeat message passing on the bottom layer of the graph untilconvergence.

Sergey Nikolenko Probabilistic rating systems

Page 11: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Example: a match of four players

Sergey Nikolenko Probabilistic rating systems

Page 12: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

TrueSkill problems and solutions

As I said, TrueSkill looked perfect for «What? Where?When».But it didn’t really work due to the following properties of the«What? Where? When» dataset:

large multiway ties are common; it is common to have 30–40different places (because there were 35-50 questions in total)in a tournament with a thousand teams;teams vary in size (max 6 players, but often incomplete).

Sergey Nikolenko Probabilistic rating systems

Page 13: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

TrueSkill problems and solutions

An undesired feature of TrueSkill is the assumption that ateam’s performance is the sum of player performances.

In many competitions (and comparison problems), anundersized team stands a very good chance against a full one,and it would be an unfair boost for the smaller team.

To alleviate the team performance formula problem, we simplyselect a different function (linear functions fit naturally).

Sergey Nikolenko Probabilistic rating systems

Page 14: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Multiway ties

Large multiway ties are deadly for TrueSkill. Consider fourteams in a tournament with performances p1, . . . , p4.

Team 1 has won, while teams 2–4, listed in this order, drewbehind the first.

Then the factor graph tells us that

t2 < t1 − ε, |t2 − t3| ≤ ε, |t3 − t4| ≤ ε.

Team 3’s performance t3 may actually nearly equal t1, and t4may exceed t1!

Sergey Nikolenko Probabilistic rating systems

Page 15: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Multiway ties

Moreover, these boundary cases are realized in practice whenunexpected results occur.

t2 < t1 − ε, |t2 − t3| ≤ ε, |t3 − t4| ≤ ε.

Suppose the winning team t1 was an underdog, and its priordistribution fell behind the priors of t2, t3, and t4, t4 being theprior leader of all four.

Then the maximum likelihood value of t4 is likely to exceed t1.

Sergey Nikolenko Probabilistic rating systems

Page 16: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Changes in the factor graph

For the multiway tie problem, we add another layer in thefactor graph, namely the layer of place performances li .

Each team performs in the ε-neighborhood of its placeperformance, and place performances relate to each other withstrict inequalities like l2 < l1 − 2ε.

Then it’s inference as usual. We have not experienced anyslowdown in convergence.

Sergey Nikolenko Probabilistic rating systems

Page 17: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

New factor graph

Sergey Nikolenko Probabilistic rating systems

Page 18: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Problem setting and motivationTrueSkill

Experimental results

100 200 300 400 500 600 700

0.5

0.6

0.7

0.8

Tournaments

AUC

TSa TSbTS2a TS2bTS2c

Average AUC over a sliding window of 50 tournaments.

Sergey Nikolenko Probabilistic rating systems

Page 19: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Outline

1 Rating systemsProblem setting and motivationTrueSkill

2 This workChanges in the problem settingModels and results

Sergey Nikolenko Probabilistic rating systems

Page 20: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Changes

A couple of years ago, «What? Where? When?» tournamentdatabase started collecting question-wise data.

That is, we now know which specific questions a team hasanswered; previously we only had standings in a tournament.

So when I got back to the problem of «What? Where?When?» ratings, I found the problem greatly simplified.

Sergey Nikolenko Probabilistic rating systems

Page 21: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Changes

Sample relevant application:consider a test suite with many questions that tests something(e.g., IQ);participants answer a random subset of questions;we need to rate participants but questions are different, so thecomplexity level cannot be perfectly balanced.

«What? Where? When?» is just like that, but participants areworking on the test in teams.

Sergey Nikolenko Probabilistic rating systems

Page 22: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Notation

Notation:set of tournaments D = {d1, . . . , d|D|};each tournament contains a certain number nd of individualtasks Q(d), d ∈ D, and each question has binary results: it canbe either performed correctly or not;teams T (d) = {t(d)1 , . . . , t(d)

M(d) } participate in tournamentd ∈ D;each team t ∈ T (d) consists of players Pt = {p1, . . . , pmt }, atotal of mt players in the team;for each question q ∈ Q(d) we know for every participatingteam t ∈ T (d) whether this question has been answeredcorrectly; we denote these bits by xtq.

Sergey Nikolenko Probabilistic rating systems

Page 23: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Baseline: logistic regression

Baseline model – logistic regression; we model:each player i with his or her skill si ,each question q with its complexity score cq,add the global average µ,and train the logistic model

p(xtq | si , cq) ∼ σ(µ+ si + cq)

for each player i ∈ t of a participating team t ∈ T (d) and eachquestion q ∈ Q(d), where σ(x) = 1/(1+ ex) is the logisticsigmoid.

Sergey Nikolenko Probabilistic rating systems

Page 24: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Model with latent variables

The logistic model basically assumes that each playersuccessfully answered every question that the team hadanswered.

But in fact we do not know which player or players haveanswered.

We only can assume that if the team has failed then no onefrom this team has done it.

This situation is similar in spirit to presence-only data modelsfound in, e.g., ecology [Ward et al., 2009; Royle et al., 2012].

Sergey Nikolenko Probabilistic rating systems

Page 25: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Model with latent variables

Hence, a model with latent variables.

For each player-question pair, we add a latent variable ziq

which means “player i has answered question q”.For these variables, we have the following constraints:

if xtq = 0 then ziq = 0 for every player i ∈ t;if xtq = 1 then ziq = 1 for at least one player i ∈ t.

Sergey Nikolenko Probabilistic rating systems

Page 26: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Model with latent variables

Model parameters are still skill and complexity of the tasks:

p(ziq | si , cq) ∼ σ(µ+ si + cq).

Training with EM:E-step: fix all si and cq, compute expected values of latentvariables ziq as

E [ziq ] =

0 if xtq = 0,p(ziq = 1 | ∃j ∈ t zjq = 1) = σ(µ+si+cq)

1−∏

j∈t(1−σ(µ+sj+cq)), if xtq = 1;

M-step: fix E [ziq] and train the logistic model

E [ziq] ∼ σ(µ+ si + cq).

Sergey Nikolenko Probabilistic rating systems

Page 27: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Metrics and results

And, sure enough, it works fine.Evaluation metrics – ranking quality:

AUC – probability that the relative standings of a random pairof teams have been predicted correctly;WTA (winner takes all) – 1 if the top rated team has won thetournament, 0 otherwise;Top3 – correctly predicted teams among the top 3;Top5 – correctly predicted teams among the top 5;MAP (mean average precision) – correctly predicted teamsamong the top 10.

Sergey Nikolenko Probabilistic rating systems

Page 28: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Results

MAP

0.6

0.7

0.8

AUC

04.20

12

07.20

12

10.20

12

01.20

13

05.20

13

08.20

13

11.20

13

03.20

14

06.20

14

09.20

140.7

0.8

0.9

Sergey Nikolenko Probabilistic rating systems

Page 29: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Implementation

Sergey Nikolenko Probabilistic rating systems

Page 30: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Implementation

Sergey Nikolenko Probabilistic rating systems

Page 31: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Final points

So here’s the takeaway:try to collect new data! the new model is much simpler thanTrueSkill because we have new data available;do not be afraid to work on your passions; if you are excitedabout the problem, you will make better progress, and «real»applications will find you.

RuSSIR 2015: August 24–28, St. Petersburg, Higher School ofEconomics.

Sergey Nikolenko Probabilistic rating systems

Page 32: Probabilistic Rating Systemssergey/slides/N15_RatingsAIST.pdf · Rating systems This work Probabilistic Rating Systems SergeyNikolenko Steklov Mathematical Institute at St. Petersburg

Rating systemsThis work

Changes in the problem settingModels and results

Thank you!

Thank you for your attention!

Sergey Nikolenko Probabilistic rating systems