
Learning to Rank

Pradeep Ravikumar (pradeepr@cs.utexas.edu), CS 371R. Adapted from slides by Hang Li, MSR Asia, 2009.

Ranking

Which do you prefer: Coke, Pepsi, or Sprite?

Ranking chess players, golf players, etc.

Whither Machine Learning?

Machine Learning "Magic Box":

Input: a list of objects (Coke, Pepsi, Sprite)
Output: a ranking of the objects (Coke > Sprite > Pepsi)

Learned from "training" data.

Ranking Problem

Query (request): s
Objects (offerings): O = {o_1, o_2, ..., o_N}
Output: a ranking of the offerings, o_{s,1}, o_{s,2}, ..., o_{s,n_s}, ordered by a scoring function f(s, o)

Ranking via Machine Learning: Learning to Rank

"Training" data: for each request s_i (i = 1, ..., m), a ranked list of offerings
  s_1: o_{1,1}, o_{1,2}, ..., o_{1,n_1}
  ...
  s_m: o_{m,1}, o_{m,2}, ..., o_{m,n_m}

Learning System: learns the scoring function f(s, o) from the training data.

Ranking System: given a new request s_{m+1} and objects O = {o_1, o_2, ..., o_N}, ranks them by f(s_{m+1}, o).

Is learning to rank necessary for IR?

• Has been successfully used in web-document search

• Has been the focus of increasing research at the intersection of machine learning + IR

Example: Re-Ranking in Machine Translation

source-language sentence f
  → Generative Model → 1000 ranked candidate sentences e_1, e_2, ..., e_1000 in the target language
  → Re-Ranking Model → re-ranked top sentence ẽ in the target language

Example: Collaborative Filtering

         Item1  Item2  Item3  ...
User 1     5      4
User 2     1      2      2
...        ?      ?      ?
User M     4      3

Input: each user's ratings on some items
Output: the user's (predicted) ratings on other items

Any user is a query; based on this query, rank the set of items (based on user-item features).

Example: Key Phrase Extraction

• Input: Document

• Output: Key phrases in document

• Pre-processing step: extraction of possible phrases in document

• Ranking approach: rank the extracted phrases

Other IR applications of learning to rank

• Question Answering

• Document Summarization

• Opinion Mining

• Sentiment Analysis

• Machine Translation

• Sentence Parsing

• ….

Our focus: Ranking Documents (Document Retrieval)

Query: q
Documents: D = {d_1, d_2, ..., d_N}
Output: a ranking of the documents, d_{q,1}, d_{q,2}, ..., d_{q,n_q}, ordered by a scoring function f(q, d)

Ranking is based on relevance, importance, and/or preference.

Traditional approach to "learning to rank": Probabilistic Model

Query: q
Documents: d_1, d_2, ..., d_N
Ranking of documents: d_1 ~ P(r | q, d_1), d_2 ~ P(r | q, d_2), ..., d_N ~ P(r | q, d_N)

P(r | q, d) with binary relevance r ∈ {0, 1}: documents are ranked by their estimated probability of relevance to the query.
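As a rough illustration of this traditional view, the sketch below ranks documents by an estimated P(r = 1 | q, d). The feature values, labels, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy query-document feature vectors x = phi(q, d) and binary relevance labels r.
X_train = np.array([[3.1, 0.8], [0.2, 0.1], [2.5, 0.4], [0.1, 0.9]])
r_train = np.array([1, 0, 1, 0])

# Model P(r = 1 | q, d) with a logistic regression over phi(q, d).
clf = LogisticRegression().fit(X_train, r_train)

# For a new query, rank its candidate documents by estimated P(r = 1 | q, d).
X_candidates = np.array([[1.9, 0.3], [0.4, 0.7], [2.8, 0.5]])
scores = clf.predict_proba(X_candidates)[:, 1]
ranking = np.argsort(-scores)           # document indices, best first
print(ranking, scores[ranking])
```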

Modern approach to learning to rank

Training data: for each query q_i (i = 1, ..., m), a ranked list of documents
  q_1: d_{1,1}, d_{1,2}, ..., d_{1,n_1}
  ...
  q_m: d_{m,1}, d_{m,2}, ..., d_{m,n_m}

Learning System: learns the scoring function f(q, d) from the training data.

Ranking System: given a new query q_{m+1} and documents D = {d_1, d_2, ..., d_N}, ranks them by f(q_{m+1}, d).

Learning to Rank: Training Process

1. Data Labeling (ranks): for each training query q_i (i = 1, ..., m), its documents d_{i,1}, ..., d_{i,n_i} are labeled with relevance grades y_{i,1}, ..., y_{i,n_i}

2. Feature Extraction: each (query, document) pair is mapped to a feature vector, giving labeled examples (x_{i,1}, y_{i,1}), ..., (x_{i,n_i}, y_{i,n_i})

3. Learning: a ranking function f(x) is learned from the labeled feature vectors

Note: the features are functions of the query AND the document.
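Before any learning method is applied, the labeled training data can be laid out as per-query groups of (feature vector, grade) pairs. A minimal sketch of that layout with made-up values (the variable names and grades are illustrative assumptions):

```python
# Each training query q_i carries its labeled documents; features x = phi(q, d)
# depend on BOTH the query and the document, grades y are the human labels.
train = {
    "q1": [  # (feature vector x_{1,j}, relevance grade y_{1,j})
        ([0.82, 3.1, 0.4], 2),   # highly relevant
        ([0.10, 0.7, 0.2], 0),   # irrelevant
        ([0.55, 1.9, 0.9], 1),   # partially relevant
    ],
    "q2": [
        ([0.91, 2.4, 0.1], 2),
        ([0.05, 0.3, 0.6], 0),
    ],
}

# Flatten into (x, y) examples while remembering the query group of each row,
# since pairwise and listwise methods need the per-query structure.
X, y, group = [], [], []
for qid, docs in train.items():
    for x_vec, grade in docs:
        X.append(x_vec)
        y.append(grade)
        group.append(qid)
```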

Learning to Rank: Testing Process

1. Data Labeling (ranks): the documents d_{m+1,1}, ..., d_{m+1,n_{m+1}} of a test query q_{m+1} are labeled with ground-truth grades y_{m+1,1}, ..., y_{m+1,n_{m+1}}

2. Feature Extraction: each (query, document) pair is mapped to a feature vector x_{m+1,j}

3. Ranking with f(x): the documents are sorted by the learned scores f(x_{m+1,j})

4. Evaluation: the predicted ranking is compared against the ground-truth ranks, producing an evaluation result

Issues in learning to rank

• Data Labeling

• Feature Extraction

• Evaluation Measure

• Learning Method (Model, Loss Function, Algorithm)

Data Labeling

Data labeling problem: e.g., judging the relevance of documents (Doc A, Doc B, Doc C) with respect to a query

• Ground truth/supervision: “ranks” of set of documents/objects

Data Labeling: Supervision takes different forms

• Multiple levels (e.g., relevant, partially relevant, irrelevant)

‣ Widely used in IR

• Ordered Pairs

‣ e.g. ordered pairs between documents (e.g. A>B, B>C)

‣ e.g. implicit relevance judgments derived from click-through data

• Fully ordered list (i.e. a permutation) of documents is given

‣ Ideal but difficult to implement

Data Labeling: Implicit Relevance Judgment

The search system's ranking shows Doc A above Doc B and Doc C, but users often click on Doc B; this yields the ordered pair B > A (see the sketch below).
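One common way to turn such click-through data into ordered pairs is to prefer a clicked document over the skipped documents ranked above it. The sketch below applies that heuristic to made-up data; the exact rule is an illustrative assumption, not something prescribed by the slides.

```python
# Ranked results shown for one query, best first, plus the docs the user clicked.
shown = ["A", "B", "C", "D"]
clicked = {"B", "D"}

# Heuristic: a clicked doc is preferred over every skipped doc ranked above it.
pairs = []  # list of (preferred, less_preferred)
for i, doc in enumerate(shown):
    if doc in clicked:
        for above in shown[:i]:
            if above not in clicked:
                pairs.append((doc, above))

print(pairs)  # [('B', 'A'), ('D', 'A'), ('D', 'C')]
```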

Feature Extraction

For the query and each of its documents (Doc A, Doc B, Doc C), a feature vector is built from:

• Query-document features (e.g., the document's BM25 score for the query)

• Document features (e.g., PageRank)

• ...

Feature Extraction: Example Features

• BM25 (a classical ranking score function)

• Proximity/distance measures between query and document

• Whether query exactly occurs in document

• PageRank

• …
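A minimal sketch of query-document feature computation: an idf-weighted term-overlap feature (a crude stand-in for BM25), an exact-occurrence flag, and a document-only feature. The tokenization and tiny corpus are illustrative assumptions.

```python
import math

def features(query, doc, corpus):
    """Toy query-document feature vector: [idf-weighted overlap, exact-phrase flag, doc length]."""
    q_terms = query.lower().split()
    d_terms = doc.lower().split()
    N = len(corpus)

    # Feature 1: sum of idf-weighted term matches (a crude BM25 stand-in).
    overlap = 0.0
    for t in set(q_terms):
        df = sum(1 for d in corpus if t in d.lower().split())
        idf = math.log((N + 1) / (df + 1))
        overlap += idf * d_terms.count(t)

    # Feature 2: does the query occur exactly (as a phrase) in the document?
    exact = 1.0 if query.lower() in doc.lower() else 0.0

    # Feature 3: a document-only feature (here just length; PageRank in practice).
    return [overlap, exact, float(len(d_terms))]

corpus = ["learning to rank methods", "chess players ranking", "rank documents for a query"]
print(features("learning to rank", corpus[0], corpus))
```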

Evaluation Measures

• Allows us to quantify how good the predicted ranking is when compared with ground truth ranking

• Typical evaluation metrics care more about ranking top results correctly

• Popular Measures

‣ NDCG (Normalized Discounted Cumulative Gain); see the sketch after this list

‣ MAP (Mean Average Precision)

‣ MRR (Mean Reciprocal Rank)

‣ WTA (Winners Take All)

‣ Kendall’s Tau
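As one concrete example of these measures, here is a minimal NDCG computation using the common 2^rel - 1 gain and log2 position discount (other gain/discount variants exist; the relevance grades below are made up):

```python
import math

def dcg_at_k(rels, k):
    """Discounted cumulative gain of a relevance-graded ranking, truncated at k."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Graded relevance of the documents in predicted order (3 = best, 0 = irrelevant).
print(ndcg_at_k([3, 2, 0, 1], k=4))   # < 1.0: the grade-1 doc is ranked below a 0
print(ndcg_at_k([3, 2, 1, 0], k=4))   # 1.0: matches the ideal ordering
```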

Kendall's Tau

• Compares the agreement between two permutations

• Based on the number of concordant pairs P and discordant pairs Q, out of n(n-1)/2 pairs in total:

  tau = (P - Q) / (n(n-1)/2)

• Completely agree: tau = +1
• Completely disagree: tau = -1

Example: permutations A B C and C A B

  P = 1 (the pair A, B), Q = 2 (the pairs A, C and B, C), n = 3
  tau = (1 - 2) / 3 = -1/3
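The same computation in code, reproducing the slide's example:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two permutations of the same items."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both permutations order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau(["A", "B", "C"], ["C", "A", "B"]))  # -1/3, as in the example above
```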

Learning to Rank Methods

• Three major approaches

‣ Pointwise approach  

‣ Pairwise approach

‣ Listwise approach

Pointwise Approach

• Transforming ranking to regression, classification, or ordinal regression

• Query-document group structure is ignored

Pointwise Approach

               Regression            Classification          Ordinal Regression
Input          feature vector x      feature vector x        feature vector x
Output         real number:          category:               ordered category:
               y = f(x)              y = sign(f(x))          y = threshold(f(x))
Model          ranking function f(x)
Loss Function  regression loss       classification loss     ordinal regression loss
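A minimal sketch of the pointwise approach as regression: fit a linear scoring function f(x) = w·x to the relevance grades by least squares and rank new documents by their scores. The feature values and grades are made up.

```python
import numpy as np

# Pointwise regression: fit f(x) = w·x to the graded labels directly,
# ignoring which query each (query, document) feature vector came from.
X = np.array([[0.9, 2.1], [0.2, 0.3], [0.6, 1.0], [0.1, 0.8]])  # phi(q, d)
y = np.array([2.0, 0.0, 1.0, 0.0])                              # relevance grades

# Least-squares fit of the linear scoring function (bias term appended).
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Rank a new query's candidate documents by the learned scores f(x).
X_new = np.array([[0.7, 1.5], [0.3, 0.2], [0.8, 2.0]])
scores = np.hstack([X_new, np.ones((len(X_new), 1))]) @ w
print(np.argsort(-scores))   # candidate indices, highest score first
```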

Pairwise Approach

• Transforming ranking to pairwise classification

• Query-document group structure is ignored

Pairwise Approach

               Learning                                   Ranking
Input          ordered feature vector pair (x_i, x_j)     feature vectors x = {x_i}, i = 1, ..., n
Output         classification on order of the pair:       permutation on the vectors:
               y_{i,j} = sign(f(x_i) - f(x_j))            sort({f(x_i)}, i = 1, ..., n)
Model          ranking function f(x)                      ranking function f(x)
Loss Function  pairwise classification loss               ranking evaluation measure

Pairwise Ranking: Pairwise Classification

Converting a document list into document pairs: for a query with Doc A (rank 1), Doc B (rank 2), and Doc C (rank 3), form the ordered pairs A > B, A > C, and B > C.

Transformed Pairwise Classification Problem

With a scoring function f(x; w), each ordered pair becomes a classification example on the difference of feature vectors:

  Positive examples (+1): x_1 - x_3, x_2 - x_3, x_1 - x_2
  Negative examples (-1): x_2 - x_1, x_3 - x_2, x_3 - x_1
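A minimal sketch of this transformation in the spirit of pairwise methods such as RankSVM, here using logistic regression on difference vectors (the data and the choice of classifier are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature vectors for three documents of one query, with true order x1 > x2 > x3.
x1, x2, x3 = np.array([0.9, 2.0]), np.array([0.6, 1.1]), np.array([0.1, 0.3])

# Transformed pairwise classification: difference vectors labeled +1 / -1.
X_pairs = np.array([x1 - x3, x2 - x3, x1 - x2,    # positive examples
                    x3 - x1, x3 - x2, x2 - x1])   # negative examples
y_pairs = np.array([1, 1, 1, -1, -1, -1])

# A linear classifier on the differences yields a linear scorer f(x; w) = w·x.
clf = LogisticRegression(fit_intercept=False).fit(X_pairs, y_pairs)
w = clf.coef_[0]

# Ranking: sort any set of documents by w·x.
docs = np.array([x2, x3, x1])
print(np.argsort(-(docs @ w)))   # expected order x1, x2, x3 -> indices [2, 0, 1]
```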

Listwise Approach

• The entire list of documents for a query is the input instance

• Query-document group structure is used

• Straightforwardly represents the learning-to-rank problem

‣ Many state-of-the-art methods: ListNet, ListMLE, etc.

Listwise Approach

               Learning & Ranking
Input          feature vectors x = {x_i}, i = 1, ..., n
Output         permutation on the feature vectors: sort({f(x_i)}, i = 1, ..., n)
Model          ranking function f(x)
Loss Function  listwise loss function (ranking evaluation measure)
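A minimal sketch of a listwise loss in the spirit of ListNet's top-one probability model: the cross-entropy between the softmax of the relevance grades and the softmax of the model scores for one query's list. The data, linear model, and hand-derived gradient are illustrative assumptions; a full ListNet implementation would sum this loss over many queries.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def listnet_loss(scores, labels):
    """Cross-entropy between top-one distributions of labels and model scores."""
    return -np.sum(softmax(labels) * np.log(softmax(scores)))

# One query's documents: feature vectors, relevance grades, and a linear scorer.
X = np.array([[0.9, 2.0], [0.6, 1.1], [0.1, 0.3]])
y = np.array([2.0, 1.0, 0.0])
w = np.zeros(2)

# A few gradient-descent steps on the listwise loss; for this linear model the
# gradient works out to dL/dw = X^T (softmax(Xw) - softmax(y)).
for _ in range(200):
    grad = X.T @ (softmax(X @ w) - softmax(y))
    w -= 0.1 * grad

print(np.argsort(-(X @ w)))   # documents ordered best-first; matches the grades
```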