Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is...

43
Learning to Rank Pradeep Ravikumar [email protected] CS 371R Adapted from Hang Li, MSR Asia, 09

Transcript of Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is...

Page 1: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank

Pradeep Ravikumar [email protected] !CS 371R !Adapted from Hang Li, MSR Asia, 09!

Page 2: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Ranking

Coke or Pepsi or Sprite)?

Page 3: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Ranking

Ranking Chess Players

Page 4: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Ranking

Ranking Chess Players

Golf Players, etc.

Page 5: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Whither Machine Learning?

Machine Learning Magic Box

Coke, Pepsi, Sprite Coke > Sprite > Pepsi

List of Objects Ranking of Objects

Page 6: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Whither Machine Learning?

Machine Learning Magic Box

Coke, Pepsi, Sprite Coke > Sprite > Pepsi

List of Objects Ranking of Objects

From “training” data

Page 7: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

RankingRanking(Problem

16

sns

s

s

o

o

o

,

2,

1,

�s

request

offerings

ranking(of(offerings

� �NoooO ,,, 21 ��

),( osf

Objects

Ranking of Objects

Query

Page 8: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Ranking via Machine Learning

Learning(to(Rank

1,1

2,1

1,1

1

no

o

os

��

mnm

m

m

m

o

o

os

,

2,

1,

Learning(System

Ranking(System

1�ms

),(

),( ),(

11 ,11,1

2,112,1

1,111,1

�� ���

���

���

mm nmmnm

mmm

mmm

osfo

osfo

osfo

),( osf

17

� �NoooO ,,, 21 ��

“Training” data

Page 9: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Is learning to rank necessary for IR?

• Has been successfully used in web-document search

• Has been the focus of increasing research at the intersection of machine learning + IR

Page 10: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Example: Machine TranslationRe5Ranking(in(Machine(Translation

138

Re5Ranking(Model

1000

2

1

e

ee

�ranked(sentencecandidates(in(target(language

GenerativeModel

f

sentence(source(language

e~re5ranked(topsentence(in(targetlanguage(

Page 11: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Example: Collaborative FilteringCollaborative(Filtering

130

Item1 I tem2 Item3 ...

User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3

Page 12: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Example: Collaborative FilteringCollaborative(Filtering

130

Item1 I tem2 Item3 ...

User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3

Input: User Ratings on some itemsOutput: User Ratings on other items

Page 13: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Example: Collaborative FilteringCollaborative(Filtering

130

Item1 I tem2 Item3 ...

User1 5 4

User2 1 2 2

... ? ? ?

User M 4 3

Any user is a query; based on this query, rank the set of items (based on user-item features)

Page 14: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Example: Key Phrase Extraction

• Input: Document

• Output: Key phrases in document

• Pre-processing step: extraction of possible phrases in document

• Ranking approach: rank the extracted phrases

Page 15: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Other IR applications of learning to rank

• Question Answering

• Document Summarization Opinion Mining

• Sentiment Analysis

• Machine Translation

• Sentence Parsing

• ….

Page 16: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Our focus: ranking documentsRanking(Problem:(Example(=(Document(Retrieval

qnq

q

q

d

d

d

,

2,

1,

qquery

documents

ranking(of(documents

� �NdddD ,,, 21 ��

),( dqf

20

ranking(based(on(relevance,(importance,(

preference

Page 17: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Traditional approach to “learning to rank”

Probabilistic(Model

Nd

dd

2

1

q

),|(~

),|(~),|(~

22

11

nn dqrPd

dqrPddqrPd

query

documents

ranking(of(documents

}0,1{),|(

�RdqrP

23

Page 18: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Modern approach to learning to rank

Learning(to(Rank

1,1

2,1

1,1

1

nd

d

dq

mnm

m

m

m

d

d

dq

,

2,

1,

Learning(System

Ranking(System

1�mq

),(

),( ),(

11 ,11,1

2,112,1

1,111,1

�� ���

���

���

mm nmmnm

mmm

mmm

dqfd

dqfd

dqfd

),( dqf

26

� �NdddD ,,, 21 ��

Page 19: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Training Process

1.(Data(Labeling(rank) 3.(Learning

)(xf

2.(Feature(Extraction

&&

%

&&

$

#

1,1

2,1

1,1

1

nd

d

d

q�

&&

%

&&

$

#

mnm

m

m

m

d

d

d

q

,

2,

1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q�

&&

%

&&

$

#

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

&&

%

&&

$

#

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

Training(Process

27

Page 20: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Training Process

1.(Data(Labeling(rank) 3.(Learning

)(xf

2.(Feature(Extraction

&&

%

&&

$

#

1,1

2,1

1,1

1

nd

d

d

q�

&&

%

&&

$

#

mnm

m

m

m

d

d

d

q

,

2,

1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q�

&&

%

&&

$

#

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

&&

%

&&

$

#

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

Training(Process

27

Page 21: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Training Process

1.(Data(Labeling(rank) 3.(Learning

)(xf

2.(Feature(Extraction

&&

%

&&

$

#

1,1

2,1

1,1

1

nd

d

d

q�

&&

%

&&

$

#

mnm

m

m

m

d

d

d

q

,

2,

1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

1

nn yd

yd

yd

q�

&&

%

&&

$

#

mm nmnm

mm

mm

m

yd

yd

yd

q

,,

2,2,

1,1,

&&

%

&&

$

#

11 ,1,1

2,12,1

1,11,1

nn yx

yx

yx

&&

%

&&

$

#

mm nmnm

mm

mm

yx

yx

yx

,,

2,2,

1,1,

Training(Process

27

features are functions of query AND document

Page 22: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Testing Process

1.(Data(Labeling(rank)

2.(Feature(Extraction

Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��� ���

���

���

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

Page 23: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Testing Process

1.(Data(Labeling(rank)

2.(Feature(Extraction

Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��� ���

���

���

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

Ground truth ranks

Page 24: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Testing Process

1.(Data(Labeling(rank)

2.(Feature(Extraction

Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��� ���

���

���

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

Ground truth ranks

Page 25: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank: Testing Process

1.(Data(Labeling(rank)

2.(Feature(Extraction

Testing(Process

28

3.((Ranking(with )(xf

&&

%

&&

$

#

��� ���

���

���

)(

)( )(

111 ,1,1,1

2,12,12,1

1,11,11,1

mmm nmnmnm

mmm

mmm

yxfx

yxfx

yxfx

4.((Evaluation

EvaluationResult

&&

%

&&

$

#

��

1,1

2,1

1,1

1

mnm

m

m

m

d

d

d

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

1

mm nmnm

mm

mm

m

yd

yd

yd

q�

&&

%

&&

$

#

�� ��

��

��

11 ,1,1

2,12,1

1,11,1

mm nmnm

mm

mm

yx

yx

yx

Ground truth ranks

Page 26: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Issues in learning to rank

• Data Labeling

• Feature Extraction

• Evaluation Measure

• Learning Method (Model, Loss Function, Algorithm)

Page 27: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Data Labeling

Data(Labeling(Problem� E.g.,(relevance(of(documents(w.r.t.(query

32

Doc(A

Doc(B

Doc(C

Query

• Ground truth/supervision: “ranks” of set of documents/objects

Page 28: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Data Labeling: Supervision takes different forms

• Multiple levels (e.g., relevant, partially relevant, irrelevant)

‣ Widely used in IR

• Ordered Pairs

‣ e.g. ordered pairs between documents (e.g. A>B, B>C)

‣ e.g. implicit relevance judgments derived from click-through data

• Fully ordered list (i.e. a permutation) of documents is given

‣ Ideal but difficult to implement

Page 29: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Data Labeling: Implicit Relevance JudgementImplicit(Relevance(Judgment

Doc(A

Doc(B

Doc(CB(>(A

ranking(of(documents(at(search(system

users(often(clicked(on(Doc(B

ordered(pair

34

Page 30: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Feature ExtractionFeature(Extraction

35

Doc(A

Doc(B

Doc(C

Query

BM25

BM25

BM25

PageRank

PageRank

PageRank

.............

.............

.............

Query5document(feature

Document(feature

Feature(Vectors

Page 31: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Feature Extraction: Example Features

• BM25 (a classical ranking score function)

• Proximity/distance measures between query and document

• Whether query exactly occurs in document

• PageRank

• …

Page 32: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Evaluation Measures

• Allows us to quantify how good the predicted ranking is when compared with ground truth ranking

• Typical evaluation metrics care more about ranking top results correctly

• Popular Measures

‣ NDCG (Normalized Discounted Cumulative Gain)

‣ MAP (Mean Average Precision)

‣ MRR (Mean Reciprocal Rank)

‣ WTA (Winners Take All)

‣ Kendall’s Tau

Page 33: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Kendall’s Tau�������-!���#

� Comparing(agreement(between(two(permutations

� Number(of(pairs(� Number(of(concordant(pairs(and(disconcordantpairs(

� Completely(agree� Completely(disagree(

)1(21

-

��

nn

QP�

)1(21

�nn

QP,

41

1���

-1��

Page 34: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Kendall’s Tau�������-!���#�0���"-1

� ExampleA((B((CC((A((B

42

31

32-1

����

Page 35: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Learning to Rank Methods

• Three major approaches

‣ Pointwise approach  

‣ Pairwise approach

‣ Listwise approach

Page 36: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Pointwise Approach

• Transforming ranking to regression, classification, or ordinal regression

• Query-document group structure is ignored

Page 37: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Pointwise ApproachPointwise(Approach

Learning

Regression Classification Ordinal2Regression

Input Feature(vector

Output Real(number Category Ordered category

Model Ranking(function

Loss2Function Regression(loss Classification loss Ordinal(regression(

loss

51

x

)(xfy � ))(sign( xfy � ))((thresold xfy �

)(xf

Page 38: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Pairwise Approach

• Transforming ranking to pairwise classification

• Query-document group structure is ignored

Page 39: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Pairwise ApproachPairwise(Approach

53

Learning Ranking

Input Ordered(feature(vector(pair Feature(vectors

Output Classification on(order(of(vector(pair Permutation on(vectors

Model Ranking function((

Loss2Function Pairwise classification(loss( Ranking evaluation(measure

)(xf

))(sign(, jiji xxfy ��

),( ji xx

))}({sort( 1niixf ���

niix 1}{ ��x

Pairwise(Approach

53

Learning Ranking

Input Ordered(feature(vector(pair Feature(vectors

Output Classification on(order(of(vector(pair Permutation on(vectors

Model Ranking function((

Loss2Function Pairwise classification(loss( Ranking evaluation(measure

)(xf

))(sign(, jiji xxfy ��

),( ji xx

))}({sort( 1niixf ���

niix 1}{ ��x

Page 40: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Pairwise RankingPairwise(Classification

� Converting(document(list(to(document(pairs

62

Doc(A

Doc(B

Doc(C

Query

Doc(A

Doc(B Doc(C

Query

Doc(ADoc(B

Doc(C

Rank(1

Rank(2

Rank(3

Page 41: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Transformed Pairwise Classification ProblemTransformed(Pairwise(Classification(Problem

66

);( wxf31 xx �

32 xx �

21 xx �

+151

12 xx �

23 xx �

13 xx �

Positive(Examples

Negative(Examples

Page 42: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Listwise Approach

• Entire list of documents as input-instance

• Query-document group structure is used

• Straightforwardly represents learning to rank problem

‣ Many state of the art methods: ListNet, ListMLE, etc.

Page 43: Learning to Rank - University of Texas at Austinmooney/ir-course/slides/Ravikumar/IR_LTR.pdf · Is learning to rank necessary for IR? • Has been successfully used in web-document

Listwise ApproachListwise(Approach

55

Learning &2Ranking

Input Feature(vectors

Output Permutation(on(feature(vectors

Model Ranking(function

Loss2Function Listwise loss(function(ranking(evaluation(measure)

niix 1}{ ��x

)(xf

))}({sort( 1niixf ���