Towards Diverse Recommendation

Towards Diverse Recommendation

Neil Hurley
Complex Adaptive System Laboratory, Computer Science and Informatics
University College Dublin
Clique Strategic Research Cluster (clique.ucd.ie)

October 2011
DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Description

Keynote talk at the DiveRS workshop at the ACM Recommender Systems conference, 2011.

Transcript of Towards Diverse Recommendation

Page 1: Towards Diverse Recommendation

Towards Diverse Recommendation

Neil Hurley

Complex Adaptive System Laboratory, Computer Science and Informatics

University College Dublin

Clique Strategic Research Cluster (clique.ucd.ie)

October 2011

DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Page 2: Towards Diverse Recommendation

Outline

1 Setting the Context

2 Novelty and Diversity in Information Retrieval
   IR Measures of Diversity
   IR Measures of Novelty

3 Diversity Research in Recommender Systems
   Concentration Measures of Diversity
   Serendipity

Page 6: Towards Diverse Recommendation

Setting the Context

Recommendation Performance I

Much effort has been spent on improving the performance of recommenders from the point of view of rating prediction.

It is a well-defined statistical problem; we have an agreed, objective measure of prediction quality.

Efficient algorithms have been developed that are good at maximising predictive accuracy.

Not a completely solved problem – e.g. dealing with dynamic data.

But, there are well-accepted evaluation methodologies and quality measures.

Page 9: Towards Diverse Recommendation

Setting the Context

Recommendation Performance II

But good recommendation is not about the ability to predict past ratings.

Recommendation quality is subjective;
People’s tastes fluctuate;
People can be influenced and persuaded;
Recommendation can be as much about psychology as statistics.

A number of ‘qualities’ are being more and more talked about with regard to other dimensions of recommendation:

Novelty
Interestingness
Diversity
Serendipity
User satisfaction

Page 16: Towards Diverse Recommendation

Setting the Context

Recommendation Performance III

Clearly, user surveys may be the only way to determine subject satisfaction with a system.

(Castagnos et al., 2010) present useful survey results on the importance of diversity.

In order to make progress on recommendation algorithms that seek improvements along these dimensions, we need:

Agreed (objective?) measures of these qualities and agreed evaluation methodologies.

Page 17: Towards Diverse Recommendation

Setting the Context

Agenda

The focus in this talk is on measures of novelty and diversity, rather than on algorithms for diversification.

Initially we look at how these concepts are defined in IR research.

Then we examine ideas that have emerged from the RS community.

Page 19: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

The Probability Ranking Principle

“If a reference retrieval system’s response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance . . . the overall effectiveness of the system to its user will be the best that is obtainable.” (W.S. Cooper)

Nevertheless, relevance measured for each single document has been challenged since as long ago as 1964:

Goffman (1964): “. . . one must define relevance in relation to the entire set of documents rather than to only one document.”

Boyce (1982): “. . . A retrieval system which aspires to the retrieval of relevant documents should have a second stage which will order the topical set in a manner so as to provide maximum informativeness.”

Page 22: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

The Maximal Marginal Relevance (MMR) criterion

“Reduce redundancy while maintaining query relevance in re-ranking retrieved documents.” (Carbonell and Goldstein 1998)

Given a set of retrieved documents R, for a query Q, incrementally rank the documents according to

\mathrm{MMR} \triangleq \arg\max_{D_i \in R \setminus S} \Big[ \lambda \, \mathrm{sim}_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} \mathrm{sim}_2(D_i, D_j) \Big]

where S is the set of documents already ranked from R.

Iterative greedy approach to increasing the diversity of a ranking.
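The greedy MMR loop can be sketched directly; a minimal illustration, where the document ids, similarity values and λ setting below are made-up toy assumptions:

```python
def mmr_rerank(docs, query_sim, doc_sim, lam=0.5, k=10):
    """Greedy Maximal Marginal Relevance re-ranking.

    docs      : list of retrieved document ids (the set R)
    query_sim : dict doc -> sim1(doc, Q), relevance to the query
    doc_sim   : function (d_i, d_j) -> sim2, inter-document similarity
    lam       : trade-off; lam=1 is pure relevance, lam=0 pure diversity
    """
    selected, remaining = [], list(docs)
    while remaining and len(selected) < k:
        def mmr_score(d):
            # Redundancy: similarity to the closest already-ranked document.
            redundancy = max((doc_sim(d, s) for s in selected), default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With λ = 0.5 a highly relevant near-duplicate of an already-ranked document is pushed below a less relevant but dissimilar one.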

Page 23: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

The Expected Metric Principle

“In a probabilistic context, one should directly optimize for the expected value of the metric of interest.” Chen and Karger (2006)

Chen and Karger (2006) introduce a greedy optimisation framework in which the next document is selected to greedily optimise the chosen objective.

An objective such as mean k-call at n, where k-call is 1 if the top-n result contains at least k relevant documents, naturally increases result-set diversity.

For 1-call, this results in an approach of selecting the next document under the assumption that all documents selected so far are not relevant.
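The 1-call heuristic can be made concrete with a toy intent model (the model, topic labels and priors below are illustrative assumptions, not from the talk): each document is relevant only if its topic matches the user's true intent, so conditioning on "everything chosen so far was not relevant" zeroes out the intents already covered.

```python
def greedy_one_call(docs, topic_prior, k=3):
    """Greedy selection for the 1-call at n objective under a toy model.

    docs        : list of (doc_id, topic) pairs; a doc is relevant iff
                  its topic equals the user's (unknown) true intent
    topic_prior : dict topic -> prior probability of that intent

    Each step picks the doc most likely to be relevant, given that all
    previously selected docs are assumed non-relevant.
    """
    posterior = dict(topic_prior)
    chosen, pool = [], list(docs)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda d: posterior.get(d[1], 0.0))
        chosen.append(best[0])
        pool.remove(best)
        # Conditioning on "best was not relevant" rules out its intent.
        posterior[best[1]] = 0.0
    return chosen
```

After one document for the dominant interpretation is chosen, the next pick covers a different interpretation rather than a second document on the same topic.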

Page 24: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

PMP – rank according to

\Pr(r \mid d) \;\Longrightarrow\; \frac{\Pr(d \mid r)}{\Pr(d \mid \bar{r})}

k-call at n – rank according to

\Pr(\text{at least } k \text{ of } r_0, \ldots, r_{n-1} \mid d_0, d_1, \ldots, d_{n-1})

Consider a query such as Trojan Horse, whose meaning is ambiguous. The PMP criterion would determine the most likely meaning and present a ranked list reflecting that meaning. A 1-call at n criterion would present a result pertaining to each possible meaning, with the aim of getting at least one right.

Page 25: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

Figure: Results from Chen and Karger (2006) on the TREC 2004 Robust Track.

MSL = Mean Search Length (mean of the rank of the first relevant document, minus one)
MRR = Mean Reciprocal Rank (mean of the reciprocal rank of the first relevant document)

Page 26: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

Agrawal et al. (2009) propose a similar approach, using an objective function that maximises the probability of finding at least one relevant result.

They dub their approach the result diversification problem and state it as

S = \arg\max_{S \subseteq D,\, |S| = k} \Pr(S \mid q)

\Pr(S \mid q) = \sum_{c} \Pr(c \mid q) \Big( 1 - \prod_{d \in S} \big( 1 - V(d \mid q, c) \big) \Big)

where
S is the retrieved result set of k documents;
c ranges over a set C of categories;
V(d|q, c) is the likelihood of document d satisfying the user intent, given the query q.
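This objective can be evaluated directly for small collections. A sketch, where the category probabilities and V values are made-up toy numbers (and note the paper optimises with a greedy approximation rather than by enumeration):

```python
from itertools import combinations
from math import prod

def set_value(S, cat_prob, V):
    """Pr(S|q): probability that at least one document in S satisfies
    the user, marginalised over intent categories c."""
    return sum(p * (1 - prod(1 - V.get((d, c), 0.0) for d in S))
               for c, p in cat_prob.items())

def diversify(docs, cat_prob, V, k):
    """Exhaustive argmax over k-subsets (toy sizes only)."""
    return max(combinations(docs, k),
               key=lambda S: set_value(S, cat_prob, V))
```

Even when two documents both score highly on the dominant category, the objective prefers a set that also covers the secondary category.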

Page 27: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Novelty and Diversity in Information Retrieval

Zhai and Lafferty (2006) – risk minimization of a loss function over possible returned document rankings, measuring how unhappy the user is with that set.

DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Page 28: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Axioms of Diversification (Gollapudi and Sharma 2009)

r(\cdot) : D \times Q \to \mathbb{R}^+, a measure of relevance

d(\cdot, \cdot) : D \times D \to \mathbb{R}^+, a distance (dissimilarity) function

Diversification objective:

R_k^* = \arg\max_{R_k \subseteq D,\, |R_k| = k} f(R_k, q, r(\cdot), d(\cdot, \cdot))

What properties should f() satisfy?

Page 29: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Axioms of Diversification (Gollapudi and Sharma 2009) I

1 Scale Invariance – insensitive to scaling distance and relevance by a constant.

2 Consistency – Making the output documents more relevant and more diverse, and the other documents less relevant and less diverse, should not change the output of the ranking.

3 Richness – It should be possible to obtain any possible set as output by an appropriate choice of r(.) and d(., .).

4 Stability – The output should not change arbitrarily with size: R_k^* ⊆ R_{k+1}^*.

5 Independence of Irrelevant Attributes – f(R) is independent of r(u) and d(u, v) for u, v ∉ R.

Page 34: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

Axioms of Diversification (Gollapudi and Sharma 2009) II

6 Monotonicity – The addition of a document to R should not decrease the score: f(R ∪ {d}) ≥ f(R).

7 Strength of Relevance – f(.) should not ignore the relevance scores.

8 Strength of Similarity – f(.) should not ignore the similarity scores.

No function satisfies all 8 axioms

MaxSum Diversification
Weighted sum of the sums of relevance and dissimilarity of items in the selected set.
Satisfies all axioms except stability.

MaxMin Diversification
Weighted sum of the minimum relevance and minimum dissimilarity of items in the selected set.
Satisfies all axioms except consistency and stability.
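The two objectives can be written down directly; a simplified sketch, in which the λ weighting and the lack of the paper's exact normalisation constants are assumptions for illustration:

```python
def max_sum_objective(S, rel, dist, lam=0.5):
    """MaxSum: weighted sum of total relevance and total pairwise
    dissimilarity of the selected set S."""
    total_rel = sum(rel[d] for d in S)
    total_div = sum(dist(u, v) for i, u in enumerate(S) for v in S[i + 1:])
    return lam * total_rel + (1 - lam) * total_div

def max_min_objective(S, rel, dist, lam=0.5):
    """MaxMin: weighted sum of the minimum relevance and the minimum
    pairwise dissimilarity in the selected set S."""
    min_rel = min(rel[d] for d in S)
    min_div = min(dist(u, v) for i, u in enumerate(S) for v in S[i + 1:])
    return lam * min_rel + (1 - lam) * min_div
```

MaxSum rewards total coverage, while MaxMin guards the weakest member of the set, which is why the two satisfy different subsets of the axioms.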

Page 41: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Diversity

IR Measures of Diversity

S-recall (Zhai and Lafferty 2006)

S-recall at rank n is defined as the number of subtopics retrieved up to a given rank n, divided by the total number of subtopics. Let S_i ⊆ S be the set of subtopics in the i-th document d_i; then

S\text{-recall@}n = \frac{\big| \bigcup_{i=1}^{n} S_i \big|}{|S|}

Let minrank(S, k) = the size of the smallest subset of documents that covers at least k subtopics.

It is usually most useful to consider S-recall@n where n = minrank(S, |S|).
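S-recall translates almost line-for-line into code; the subtopic sets in the test data are illustrative:

```python
def s_recall_at_n(ranked_subtopics, all_subtopics, n):
    """S-recall@n: fraction of the query's subtopics covered by the
    top-n documents.

    ranked_subtopics : list of sets; element i holds S_i, the subtopics
                       of the i-th ranked document
    all_subtopics    : the full subtopic set S for the query
    """
    covered = set().union(*ranked_subtopics[:n])
    return len(covered) / len(all_subtopics)
```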

Page 42: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Diversity

IR Measures of Diversity

S-precision (Zhai and Lafferty 2006)

S-precision at rank n is the ratio of the minimum rank at which a given recall value can optimally be achieved to the first rank at which the same recall value actually has been achieved. Let k = \big| \bigcup_{i=1}^{n} S_i \big|. Then

S\text{-precision@}n = \frac{\mathrm{minrank}(S, k)}{m^*}

where

m^* = \min \Big\{ j : \big| \bigcup_{i=1}^{j} S_i \big| \geq k \Big\}

Page 43: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Diversity

IR Measures of Diversity

α-NDCG (Clarke et al. 2008)

Standard NDCG (Normalised Discounted Cumulative Gain) calculates a gain for each document based on its relevance, with a logarithmic discount for the rank at which it appears. Extended for diversity evaluation, the gain is incremented by 1 for each new subtopic, and by α^k (0 ≤ α ≤ 1) for a subtopic that has been seen k times in previously-ranked documents.
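The gain computation can be sketched as follows. This is the unnormalised α-DCG only (full α-NDCG divides by the score of an ideal ordering), the subtopic sets in the test are illustrative, and the slide's α^k penalty is used, which Clarke et al. write as (1 − α)^k:

```python
import math

def alpha_dcg(ranked_subtopics, alpha=0.5):
    """Unnormalised alpha-DCG: a subtopic already seen k times
    contributes alpha**k instead of 1, and gains are discounted
    logarithmically by rank."""
    seen = {}
    score = 0.0
    for rank, subtopics in enumerate(ranked_subtopics):
        gain = sum(alpha ** seen.get(s, 0) for s in subtopics)
        for s in subtopics:
            seen[s] = seen.get(s, 0) + 1
        score += gain / math.log2(rank + 2)
    return score
```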

Page 44: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Diversity

IR Measures of Diversity

Intent-aware Precision (Agrawal et al. 2009)

Intent-aware precision prec_IA is calculated by first computing precision for each distinct subtopic separately, then averaging these precisions according to a distribution of the proportion of users interested in each subtopic:

\mathrm{prec}_{IA}@n = \sum_{s \in S} \Pr(s \mid q) \, \frac{1}{n} \sum_{i=1}^{n} I(s \in d_i)
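The definition translates directly into code; the intent distribution in the test is an illustrative assumption:

```python
def prec_ia_at_n(ranked_subtopics, intent_prob, n):
    """Intent-aware precision at n: per-subtopic precision, weighted by
    the share of users Pr(s|q) interested in each subtopic s.

    ranked_subtopics : list of sets, subtopics covered by each ranked doc
    intent_prob      : dict s -> Pr(s|q)
    """
    return sum(p * sum(1 for d in ranked_subtopics[:n] if s in d) / n
               for s, p in intent_prob.items())
```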

Page 46: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Novelty

IR Measures of Novelty

Novelty Measures (Agrawal et al. 2009)

KL-divergence D(d_i || d_j) is used to measure the novelty of d_i w.r.t. d_j.

Alternatively, d_i can be modelled as a mixture of d_j and a background model. The higher the weight of d_j in the mixture, the less novel d_i is w.r.t. d_j.

Pairwise measures are combined to give an overall measure of novelty w.r.t. all documents in the result set.
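A smoothed KL-divergence between unigram document language models gives a workable pairwise novelty score; the smoothing constant and toy distributions below are illustrative assumptions:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D(p || q) over the shared vocabulary, with eps-smoothing so that
    words missing from q do not make the score infinite.  Interpreted
    here as the novelty of the document with model p w.r.t. the
    document with model q."""
    vocab = set(p) | set(q)
    return sum(p[w] * math.log(p[w] / q.get(w, eps))
               for w in vocab if p.get(w, 0.0) > 0)
```

The score is zero when the two models agree and grows as p concentrates mass on words that q rarely uses; note it is asymmetric, as novelty should be.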

Page 47: Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Novelty

Summary of IR Research

It has long been recognised that the probability ranking principle does not adequately measure result-list quality

– the usefulness of a document depends on what other documents are on the list.

Considering that each document consists of a set of subtopics, information nuggets or facets:

The novelty of a document is a measure of how much redundancy it contains, where it is redundant w.r.t. a facet if that facet is already covered by another document.

The diversity of a result list is a measure of the number of relevant facets it contains.

There is no complete consensus here – e.g. Gollapudi and Sharma (2009) define “novelty” as the fraction of topics covered.

Consider selecting the document with the least redundancy vs. selecting the document that improves overall diversity.


Page 51: Towards Diverse Recommendation

Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Novelty

Summary of IR Research

Long recognised that the probability ranking principle doesnot adequately measure result list quality

– the usefulness of a document depends on what otherdocuments are on the list.

Considering that each document consists of a set of subtopics,information nuggets or facets

The novelty of a document is a measure of how muchredundancy it contains, where it is redundant w.r.t. a facet, ifthat facet is already covered by another document.The diversity of a result list is a measure of the number ofrelevant facets it contains.

No complete consensus here – e.g. Gollapudi and Sharma (2009)define “novelty” as fraction of topics covered

Consider selecting document with least redundancy vsselecting document that improves overall diversity.

DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Page 52: Towards Diverse Recommendation

Towards Diverse Recommendation

Novelty and Diversity in Information retrieval

IR Measures of Novelty

Summary of IR Research

Long recognised that the probability ranking principle doesnot adequately measure result list quality

– the usefulness of a document depends on what otherdocuments are on the list.

Considering that each document consists of a set of subtopics,information nuggets or facets

The novelty of a document is a measure of how muchredundancy it contains, where it is redundant w.r.t. a facet, ifthat facet is already covered by another document.The diversity of a result list is a measure of the number ofrelevant facets it contains.

No complete consensus here – e.g. Gollapudi and Sharma (2009)define “novelty” as fraction of topics covered

Consider selecting document with least redundancy vsselecting document that improves overall diversity.

DiveRS: International Workshop on Novelty and Diversity in Recommender Systems

Page 53: Towards Diverse Recommendation


Summary of IR Research

In general, IR lines of research w.r.t. diversity and novelty consider the following:

Relevance scores for documents are not independent – need to consider relevance w.r.t. the entire result set, rather than each document in turn.

Diversity is related to query ambiguity –

Difference between selecting documents according to the probability of each meaning; or

Selecting documents to cover all meanings, so that at least one is relevant.

Diversity is a measure of a set; novelty is a measure of each document w.r.t. a particular set in which it is contained.

Page 54: Towards Diverse Recommendation

Diversity Research in Recommender Systems

Outline

1 Setting the Context

2 Novelty and Diversity in Information retrieval
    IR Measures of Diversity
    IR Measures of Novelty

3 Diversity Research in Recommender Systems
    Concentration Measures of Diversity
    Serendipity

Page 55: Towards Diverse Recommendation

Diversity – The Long Tail Problem

Figure: Sales Demand for 1000 products

Page 56: Towards Diverse Recommendation

Diversity – The Long Tail Problem

Figure: Top 2% of Most Popular Products Account for 13% of Sales

Page 57: Towards Diverse Recommendation

Diversity – The Long Tail Problem

Figure: Least Popular Items Account for 30% of Sales

Page 58: Towards Diverse Recommendation

Diversity – The Long Tail Problem

“Less is More”

– Chris Anderson [The Long Tail: Why the Future of Business is Selling Less of More]

Page 59: Towards Diverse Recommendation

Recommenders and The Long Tail Problem

To support an increase in sales, need to increase the diversity of the set of recommendations made to the end-user.

Recommend items in the long tail that are highly likely to be liked by the current user.

Implies finding those items that are liked by the current user and relatively few other users.

Page 60: Towards Diverse Recommendation

Diversity – The End-user Perspective

Definition

The diversity of a set L of size p is the average dissimilarity of the items in the set:

f_D(L) = \frac{2}{p(p-1)} \sum_{i \in L} \sum_{j \in L,\, j < i} \big(1 - s(i,j)\big)

We have found it useful to define novelty (or relative diversity) as follows:

Definition

The novelty of an item i in a set L is

n_L(i) = \frac{1}{p-1} \sum_{j \in L,\, j \neq i} \big(1 - s(i,j)\big)
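These two definitions are easily computed; the following is a minimal sketch (not part of the talk), with a hypothetical genre-based similarity function s standing in for whatever item similarity the system uses:

```python
def diversity(L, s):
    """f_D(L): average pairwise dissimilarity over a set of p items."""
    p = len(L)
    total = sum(1 - s(L[i], L[j]) for i in range(p) for j in range(i))
    return 2.0 * total / (p * (p - 1))

def novelty(item, L, s):
    """n_L(i): average dissimilarity of `item` to the other items in L."""
    others = [j for j in L if j != item]
    return sum(1 - s(item, j) for j in others) / len(others)

# Toy similarity mirroring the later example: same genre -> 1, otherwise 0.
genre = {"i1": "G1", "i2": "G1", "i3": "G2", "i4": "G2"}
s = lambda a, b: 1.0 if genre[a] == genre[b] else 0.0

print(diversity(["i1", "i2", "i3", "i4"], s))  # 4 of 6 pairs cross-genre -> 2/3
```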

Page 61: Towards Diverse Recommendation

Diversity – The End-user Perspective

User profile from the Movielens dataset: |P_u| = 764, N = 20, |T_u| = 0.1 × |P_u|. 40% of the most novel items accrue no hits at all.

Page 62: Towards Diverse Recommendation

Other Definitions of Novelty/Diversity in RS

Castells et al. (2011) outlines some of the ways that novelty impacts on recommender system design.

Distinguishes item popularity and item similarity; user-relative measures and global measures.

Popularity-based novelty:

novelty(i) = -\log p(i) \quad \text{(global measure), or } 1 - p(K \mid i)

novelty(i) = -\log p(i \mid u) \quad \text{(user perspective), or } 1 - p(K \mid i, u)

Similarity perspective:

novelty(i \mid S) = \sum_{j \in S} p(j \mid S) \, d(i, j)
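A rough Python sketch of these two perspectives (not from the talk): popularity-based novelty as self-information over hypothetical interaction counts, and similarity-based novelty with an assumed uniform p(j|S):

```python
import math

def popularity_novelty(item, counts):
    """novelty(i) = -log2 p(i), where p(i) is i's share of all interactions."""
    p_i = counts[item] / sum(counts.values())
    return -math.log2(p_i)

def similarity_novelty(item, S, d):
    """novelty(i|S) = sum_{j in S} p(j|S) d(i, j), with p(j|S) taken uniform."""
    return sum(d(item, j) for j in S) / len(S)

counts = {"i1": 80, "i2": 15, "i3": 5}   # i3 is the long-tail item
print(popularity_novelty("i3", counts))  # rarer items score higher
```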

Page 63: Towards Diverse Recommendation

Novelty for Recommender Systems

Pablo Castells introduces rank-sensitive and relevance-aware measures of recommendation set diversity and novelty.

Recommendation Novelty Metric

m(R \mid u) = \sum_{n} disc(n) \, p(rel \mid i_n, u) \, novelty(i_n \mid u)

Novelty-Based Diversity Metric

novelty(R \mid u) = \sum_{n} \sum_{j \in u} disc(n) \, p(rel \mid i_n, u) \, p(j \mid u) \, d(i_n, j)

diversity(R \mid u) = \sum_{k < n} disc(n) \, disc(k) \, p(rel \mid i_n, u) \, p(rel \mid i_k, u) \, d(i_n, i_k)
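As a sketch of the first metric only (the discount function is an assumption; the talk does not fix one), with p(rel|i,u) and novelty(i|u) supplied externally:

```python
import math

def novelty_metric(R, p_rel, nov):
    """m(R|u) = sum_n disc(n) * p(rel|i_n, u) * novelty(i_n|u).

    `p_rel` and `nov` are externally supplied estimates; disc is an
    assumed logarithmic rank discount (rank n = 0, 1, ...).
    """
    disc = lambda n: 1.0 / math.log2(n + 2)
    return sum(disc(n) * p_rel(i) * nov(i) for n, i in enumerate(R))

# With certain relevance and unit novelty, only the rank discount remains.
print(novelty_metric(["a", "b"], lambda i: 1.0, lambda i: 1.0))
```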

Page 65: Towards Diverse Recommendation

Concentration Measures of Diversity

Evaluating Diversity

In our 2009 RecSys paper, we evaluated our diversification method on test sets T(µ) consisting of items chosen from the top 100 × (1 − µ)% most novel items in the user profiles.

Page 66: Towards Diverse Recommendation

Toy Example

Motivate our diversity methodology using a toy example in which a user-base of four users, u1, u2, u3, u4, is recommended items from a catalogue of 4 items i1, i2, i3, i4.

The system recommends N = 2 items to each user.

Any particular scenario can be represented in a table that indicates whether a user actually likes an item or not (1 or 0) and, in parentheses, the probability that the recommender system will recommend the corresponding item to the user.

Assume that G1 = {i1, i2} is a single genre (e.g. horror movies) and G2 = {i3, i4} is another.

Simple similarity measure: s(i1, i2) = s(i3, i4) = 1 and cross-genre similarities are zero.

Page 67: Towards Diverse Recommendation

Toy Example

Biased but Full Recommended Set Diversity

      i1       i2       i3       i4
u1    1 (1)    1 (0)    1 (1/2)  0 (1/2)
u2    1 (0)    1 (1)    1 (1/2)  0 (1/2)
u3    0 (1/2)  1 (1/2)  1 (1)    1 (0)
u4    1 (1)    0 (0)    1 (1/2)  1 (1/2)

Always recommends an item from G1 and an item from G2.

Probability of i1 being recommended to a randomly selected user – (1/4)(1 + 0 + 1/2 + 1) = 5/8 – is higher than that of i2 (3/8), for instance.

Recommendations do not spread evenly across the product catalogue.

Biased towards consistently recommending i1 to u1 but never recommending i2 to u1.
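The calculation above can be checked mechanically; this sketch encodes only the recommendation probabilities (the parenthesised table entries) from the scenario:

```python
# Recommendation probabilities per user, taken from the table above.
probs = {
    "u1": {"i1": 1.0, "i2": 0.0, "i3": 0.5, "i4": 0.5},
    "u2": {"i1": 0.0, "i2": 1.0, "i3": 0.5, "i4": 0.5},
    "u3": {"i1": 0.5, "i2": 0.5, "i3": 1.0, "i4": 0.0},
    "u4": {"i1": 1.0, "i2": 0.0, "i3": 0.5, "i4": 0.5},
}

def p_recommended(item):
    """Probability the item is recommended to a randomly selected user."""
    return sum(row[item] for row in probs.values()) / len(probs)

print(p_recommended("i1"), p_recommended("i2"))  # 0.625 (5/8) vs 0.375 (3/8)
```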

Page 68: Towards Diverse Recommendation

Toy Example

No System Level Biases

      i1       i2       i3       i4
u1    1 (1)    1 (1/3)  1 (1/3)  0 (1/3)
u2    0 (1/3)  1 (1)    1 (1/3)  1 (1/3)
u3    1 (1/3)  0 (1/3)  1 (1)    1 (1/3)
u4    1 (1/3)  1 (1/3)  0 (1/3)  1 (1)

The probability of recommending i1 to a randomly chosen relevant user (i.e. u1, u3 or u4) is (1/3)(1 + 1/3 + 1/3) = 5/9.

Similarly for i2, i3 and i4.

Focusing on the set of items that are relevant to u1 (i.e. i1, i2 and i3), the algorithm is three times as likely to recommend i1 as either of the other relevant items.

Page 69: Towards Diverse Recommendation

Toy Example

No System or User Level Biases

      i1       i2       i3       i4
u1    1 (1/3)  1 (1/3)  1 (1/3)  0 (1)
u2    0 (1)    1 (1/3)  1 (1/3)  1 (1/3)
u3    1 (1/3)  0 (1)    1 (1/3)  1 (1/3)
u4    1 (1/3)  1 (1/3)  0 (1)    1 (1/3)

Same probability of recommending any relevant item to a user.

Same probability that an item is recommended when it is relevant.

Page 70: Towards Diverse Recommendation

Algorithm Diversity

Definition

We define an algorithm to be fully diverse from the user perspective if it recommends any of the user’s set of relevant items with equal probability.

Definition

We define an algorithm to be fully diverse from the system perspective if the probability of recommending an item, when it is relevant, is equal across all items.

Page 71: Towards Diverse Recommendation

Lorenz Curve and the Gini Index

A plot of the cumulative proportion of the product catalogue against the cumulative proportion of sales.

Page 72: Towards Diverse Recommendation

Lorenz Curve and the Gini Index

69% of sales come from the top 10% of best-selling products.

Page 73: Towards Diverse Recommendation

Lorenz Curve and the Gini Index

G = 0 implies equal sales across all products; G = 1 when a single product gets all sales.
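A common discrete computation of G is the sorted closed form below (a sketch, not specific to the talk; on finite data a single dominant product gives G = (n − 1)/n, which tends to 1 as n grows):

```python
def gini(sales):
    """Gini index of a list of per-product sales counts."""
    n = len(sales)
    x = sorted(sales)
    # Closed form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * xi for i, xi in enumerate(x))
    return 2.0 * weighted / (n * sum(x)) - (n + 1) / n

print(gini([10, 10, 10, 10]))  # 0.0: equal sales across all products
print(gini([0, 0, 0, 40]))     # 0.75: one product takes all sales, n = 4
```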

Page 74: Towards Diverse Recommendation

Measuring Recommendation Success

Measurement unit of success in recommender systems = the Hit.

Interpreted as the recommendation of a product known to be liked by the user.

Page 75: Towards Diverse Recommendation

Hits Inequality – Concentration Curves of Hits

The Lorenz curve and Gini index measure inequality within the hits distribution over all items in the product catalogue.

The concentration curve and concentration index of hits vs. popularity measure bias of the hits distribution towards popular items.

The concentration curve and concentration index of hits vs. novelty measure bias of the hits distribution towards novel items.

Page 76: Towards Diverse Recommendation

Concentration Curves

n products accrue hits {h1, . . . , hn} – the concentration curve depends on the correlation between hits and popularity.
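The construction can be sketched as follows (hypothetical data): rank products by popularity rather than by their own hit counts, then accumulate hit shares; ranking by hits instead would give the Lorenz curve.

```python
def concentration_curve(hits, popularity):
    """Cumulative share of hits, with products ordered by increasing popularity."""
    order = sorted(range(len(hits)), key=lambda i: popularity[i])
    total = sum(hits)
    curve, cum = [], 0.0
    for i in order:
        cum += hits[i] / total
        curve.append(cum)
    return curve

# Hits correlated with popularity -> the curve sags below the diagonal.
print(concentration_curve(hits=[1, 2, 3, 10], popularity=[5, 10, 20, 100]))
# [0.0625, 0.1875, 0.375, 1.0]
```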

Page 80: Towards Diverse Recommendation

Temporal Diversity

Lathia et al. (2010) investigate diversity over time – do recommendations change over time?

Now diversity is measured between two recommended sets, formed at different points in time:

diversity(R_{i+1}, R_i) = \frac{1}{N} \, |R_{i+1} \setminus R_i|

And novelty is measured as the number of new items over all time:

novelty(R_{i+1}) = \frac{1}{N} \, \Big|R_{i+1} \setminus \bigcup_{j=1}^{i} R_j\Big|

kNN algorithms exhibit more temporal diversity than SVD matrix factorisation.

Switching between multiple algorithms is offered as one means to improve temporal diversity.
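The two set-based measures above can be sketched directly over top-N lists R_1, R_2, ... (a minimal illustration, not Lathia et al.'s code):

```python
def temporal_diversity(r_next, r_prev, N):
    """diversity(R_{i+1}, R_i): fraction of the new top-N list not in the previous one."""
    return len(set(r_next) - set(r_prev)) / N

def temporal_novelty(r_next, history, N):
    """novelty(R_{i+1}): fraction of the new list never recommended at any earlier point."""
    seen = set().union(*map(set, history)) if history else set()
    return len(set(r_next) - seen) / N

r1, r2 = ["a", "b", "c"], ["b", "c", "d"]
print(temporal_diversity(r2, r1, 3))  # 1/3: only "d" changed vs the previous list
print(temporal_novelty(r2, [r1], 3))  # 1/3: "d" was never recommended before
```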

Page 82: Towards Diverse Recommendation

Serendipity

Measuring the Unexpected

Serendipity – the extent to which recommendations may positively surprise users.

Murakami et al. (2008) propose to measure unexpectedness as the “distance between results produced by the method to be evaluated and those produced by a primitive prediction method”:

unexpectedness = \frac{1}{n} \sum_{i=1}^{n} \max\big(Pr(s_i) - Prim(s_i),\, 0\big) \times rel(s_i) \times \frac{\sum_{j=1}^{i} rel(s_j)}{i}

Ge et al. (2010) follow a similar approach, such that if R1 is the recommended set returned by the RS and R2 is the set returned by Prim, then

serendipity = \frac{1}{|R1 \setminus R2|} \sum_{s_j \in R1 \setminus R2} rel(s_j)
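A sketch of the Ge et al. variant (the relevance oracle below is a hypothetical stand-in):

```python
def serendipity(rs_list, prim_list, rel):
    """Average relevance of recommendations the primitive baseline misses."""
    unexpected = [s for s in rs_list if s not in set(prim_list)]
    if not unexpected:
        return 0.0
    return sum(rel(s) for s in unexpected) / len(unexpected)

rel = lambda s: 1.0 if s in {"x", "y"} else 0.0  # hypothetical relevance oracle
print(serendipity(["x", "y", "z"], ["x"], rel))  # unexpected = {y, z} -> 0.5
```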

Page 83: Towards Diverse Recommendation

Novelty vs Serendipity

Novelty with regard to a given set is a measure of how different an item is to the other items in the set;

it does not involve any notion of relevance.

Is a serendipitous recommendation equivalent to a relevant novel recommendation?

To me, serendipity encapsulates a higher degree of risk – a novel item with a low chance of relevance, according to our model, which yet turns out to be relevant.

Page 84: Towards Diverse Recommendation

Conclusions

IR research gives some directions in how to define and evaluate diversity and novelty.

We can ask:

Are these adequate for RS research?

Can we map them to the needs of RS evaluation?

How are they deficient?

Recent research is beginning to clarify these issues for RS.

I believe that objective measures are possible.

I look forward to some interesting discussions on these issues!

Page 86: Towards Diverse Recommendation

Thank You

My research is sponsored by Science Foundation Ireland under grant 08/SRC/I1407: Clique: Graph and Network Analysis Cluster.

Page 87: Towards Diverse Recommendation

References I

Agrawal, R., Gollapudi, S., Halverson, A. and Ieong, S.: 2009, Diversifying search results, Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, ACM, New York, NY, USA, pp. 5–14. URL: http://doi.acm.org/10.1145/1498759.1498766

Boyce, B. R.: 1982, Beyond topicality: A two stage view of relevance and the retrieval process, Inf. Process. Manage. 18(3), 105–109.

Page 88: Towards Diverse Recommendation

References II

Carbonell, J. and Goldstein, J.: 1998, The use of MMR, diversity-based reranking for reordering documents and producing summaries, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’98, ACM, New York, NY, USA, pp. 335–336. URL: http://doi.acm.org/10.1145/290941.291025

Castells, P., Vargas, S. and Wang, J.: 2011, Novelty and Diversity Metrics for Recommender Systems: Choice, Discovery and Relevance, International Workshop on Diversity in Document Retrieval (DDR 2011) at the 33rd European Conference on Information Retrieval (ECIR 2011).

Page 89: Towards Diverse Recommendation

References III

Chen, H. and Karger, D. R.: 2006, Less is more: probabilistic models for retrieving fewer relevant documents, in E. N. Efthimiadis, S. T. Dumais, D. Hawking and K. Järvelin (eds), SIGIR, ACM, pp. 429–436.

Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S. and MacKinnon, I.: 2008, Novelty and diversity in information retrieval evaluation, Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’08, ACM, New York, NY, USA, pp. 659–666. URL: http://doi.acm.org/10.1145/1390334.1390446

Page 90: Towards Diverse Recommendation

References IV

Ge, M., Delgado-Battenfeld, C. and Jannach, D.: 2010, Beyond accuracy: evaluating recommender systems by coverage and serendipity, Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys ’10, ACM, New York, NY, USA, pp. 257–260. URL: http://doi.acm.org/10.1145/1864708.1864761

Goffman, W.: 1964, On relevance as a measure, Information Storage and Retrieval 2(3), 201–203.

Gollapudi, S. and Sharma, A.: 2009, An axiomatic approach for result diversification, Proceedings of the 18th International Conference on World Wide Web, WWW ’09, ACM, New York, NY, USA, pp. 381–390. URL: http://doi.acm.org/10.1145/1526709.1526761

Page 91: Towards Diverse Recommendation

References V

Lathia, N., Hailes, S., Capra, L. and Amatriain, X.: 2010, Temporal diversity in recommender systems, in F. Crestani, S. Marchand-Maillet, H.-H. Chen, E. N. Efthimiadis and J. Savoy (eds), SIGIR, ACM, pp. 210–217.

Murakami, T., Mori, K. and Orihara, R.: 2008, Metrics for evaluating the serendipity of recommendation lists, in K. Satoh, A. Inokuchi, K. Nagao and T. Kawamura (eds), New Frontiers in Artificial Intelligence, Vol. 4914 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pp. 40–46.

Zhai, C. and Lafferty, J.: 2006, A risk minimization framework for information retrieval, Inf. Process. Manage. 42, 31–55. URL: http://dx.doi.org/10.1016/j.ipm.2004.11.003
