Building Recommender Systems - Mendeley and Science Direct


Transcript of Building Recommender Systems - Mendeley and Science Direct

Page 1: Building Recommender Systems - Mendeley and Science Direct


Daniel Kershaw (@danjamker)

Building Recommenders

20th September 2017

Page 2: Building Recommender Systems - Mendeley and Science Direct


Mendeley

• Reference Manager

• Social Network

• Publication Catalogue

Page 3: Building Recommender Systems - Mendeley and Science Direct


Science Direct

• Scientific publication database

• Used by the majority of university and research institutions

• Contains 12 million articles of content from 3,500 academic journals and 34,000 e-books

Page 4: Building Recommender Systems - Mendeley and Science Direct


Why Recommendations

Pull

Allow users to discover more content

Make it easier to navigate catalogue

Page 5: Building Recommender Systems - Mendeley and Science Direct


Why Recommendations

Pull

Allow users to discover more content

Make it easier to navigate catalogue

Push

Highlight new content to users

Bring users back to service

Page 6: Building Recommender Systems - Mendeley and Science Direct


The five core components

Data Collection

Recommender Model

Recommendation Post Processing

Online Modules

User Interface

Page 7: Building Recommender Systems - Mendeley and Science Direct


Outline

Developed Algorithms – keeping it simple

Practical Considerations – don’t look stupid

Implementation – how to scale a system

Evaluation – what is good enough

Evolution – what’s changed over time

Future Direction – the future’s bright, the future’s deep

Page 8: Building Recommender Systems - Mendeley and Science Direct


Developed Algorithms

Page 9: Building Recommender Systems - Mendeley and Science Direct


Available Data

Implicit

User libraries (Mendeley)

User article interactions (Science Direct)

Content

Abstracts

Titles

References

Page 10: Building Recommender Systems - Mendeley and Science Direct

Potential Methods

Content Based

• Similarity between what users have read

• Similarity in references

Collaborative

• Matrix Factorization

• kNN

• LDA

Page 11: Building Recommender Systems - Mendeley and Science Direct

User–item interaction matrix

User-based CF (kNN)

https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/

Page 12: Building Recommender Systems - Mendeley and Science Direct

Similarity between the query user and other readers

User-based CF (kNN)

https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/

Page 13: Building Recommender Systems - Mendeley and Science Direct

Similarity between all users

User-based CF (kNN)

https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/

Page 14: Building Recommender Systems - Mendeley and Science Direct

Generating recommendations for the user

User-based CF (kNN)

https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2/
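
The four slides above walk through user-based kNN CF: build the user–item interaction matrix, compute similarities between the query user and the other readers, pick the k nearest neighbours, and score unseen items from their libraries. A minimal sketch of that flow, assuming a small in-memory binary interaction matrix (illustrative only, not the production implementation):

```python
# User-based CF sketch: cosine similarity over a binary user-item matrix,
# then score items by a weighted vote from the k most similar readers.
import numpy as np

def recommend(interactions: np.ndarray, user: int, k: int = 10, n: int = 5):
    """interactions: (num_users, num_items) binary matrix; returns top-n item ids."""
    norms = np.linalg.norm(interactions, axis=1) + 1e-9
    sims = interactions @ interactions[user] / (norms * norms[user])  # cosine similarity to every user
    sims[user] = -1.0                          # exclude the query user
    neighbours = np.argsort(sims)[-k:]         # k most similar readers
    scores = sims[neighbours] @ interactions[neighbours]  # similarity-weighted vote per item
    scores[interactions[user] > 0] = -np.inf   # drop items the user already has
    return np.argsort(scores)[::-1][:n]

# Example: 4 users x 6 articles
R = np.array([[1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 1, 0, 1, 0, 1],
              [1, 1, 1, 0, 0, 0]])
print(recommend(R, user=0, k=2, n=3))
```

In practice the matrix is far too large and sparse for this dense formulation; the sketch is only meant to make the neighbourhood and scoring steps concrete.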

Page 15: Building Recommender Systems - Mendeley and Science Direct

Why not Matrix Factorization?

• Ability to scale

• Matrix is incredibly sparse

Page 16: Building Recommender Systems - Mendeley and Science Direct


Practical Considerations

Page 17: Building Recommender Systems - Mendeley and Science Direct


Explore/Exploit (Dithering)

Recommendations generated in batch

Users want an interactive experience

Slight shuffles give the impression of freshness

Allow for exploration of the list when only a proportion of it is shown

score_dithered = log(rank) + N(0, log ε)

where ε = Δrank / rank and typically ε ∈ [1.5, 2]
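
A small sketch of this dithering step, assuming ranks start at 1 and treating log ε as the variance of the Gaussian noise (both assumptions; the parameter value is illustrative):

```python
import numpy as np

def dither(ranked_items, epsilon: float = 1.8, rng=None):
    """Re-order a batch-ranked list by log(rank) plus Gaussian noise with
    variance log(epsilon); epsilon in [1.5, 2] gives a gentle shuffle."""
    rng = rng or np.random.default_rng()
    ranks = np.arange(1, len(ranked_items) + 1)
    noise = rng.normal(0.0, np.sqrt(np.log(epsilon)), size=len(ranks))
    scores = np.log(ranks) + noise
    return [ranked_items[i] for i in np.argsort(scores)]  # lower score = better

print(dither(["a", "b", "c", "d", "e", "f"]))
```

Re-sampling the noise on each serve lets the same batch-generated list look slightly different every time, without recomputing the recommendations.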

Page 18: Building Recommender Systems - Mendeley and Science Direct


Impression Discounting

• Experience deteriorates if exposed to the same information

• Push recommendations seen before down the list

[Chart: number of impressions received vs. recommendation rank]

Page 19: Building Recommender Systems - Mendeley and Science Direct


Impression Discounting

• Experience deteriorates if exposed to the same information

• Push recommendations seen before down the list

score_new = score_original × (w1 · g(impCount) + w2 · g(lastSeen))

See Lee, P., et al. (2014)
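
A sketch of this update, assuming simple exponential forms for the discounting functions g (the weights, decay rate, and function shapes are illustrative; Lee et al. (2014) fit them from data):

```python
import math

def discounted_score(score, imp_count, hours_since_last_seen,
                     w1=0.5, w2=0.5, decay=0.1):
    """Down-weight a recommendation the more often and the more recently it
    has already been shown: score_new = score * (w1*g(impCount) + w2*g(lastSeen))."""
    g_count = math.exp(-decay * imp_count)                     # more impressions -> lower weight
    g_seen = 1.0 - math.exp(-decay * hours_since_last_seen)    # seen very recently -> lower weight
    return score * (w1 * g_count + w2 * g_seen)

print(discounted_score(0.9, imp_count=3, hours_since_last_seen=2))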

Page 20: Building Recommender Systems - Mendeley and Science Direct


Business Logic (Pre- and Post-Filtering)

Don’t show items users already have (bought, added, consumed)

Don’t feed the recommender positive feedback that was generated by the recommender itself

Don’t recommend out-of-stock items

• A bad recommendation has a cost

- It can be greater than the cost of not receiving a recommendation at all
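
A sketch of the post-filtering step this implies (the field names are hypothetical):

```python
def post_filter(recs, user_library, unavailable):
    """Drop items the user already has and items we cannot serve,
    preserving the recommender's ordering."""
    blocked = set(user_library) | set(unavailable)
    return [item for item in recs if item not in blocked]

print(post_filter(["a", "b", "c", "d"], user_library={"b"}, unavailable={"d"}))
```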

Page 21: Building Recommender Systems - Mendeley and Science Direct


Implementation

Page 22: Building Recommender Systems - Mendeley and Science Direct


Systems Architecture

[Architecture diagram, hosted on AWS]

Offline: Logs, Candidate Selection (Content Based, Item2Item CF)

Online: Impression Discounting, Dithering, API, Front End

Page 23: Building Recommender Systems - Mendeley and Science Direct


The unbundled mess

Page 24: Building Recommender Systems - Mendeley and Science Direct


Logging

Used for both debugging and feeding information back to the recommender

System

• Which run generated the recommendation

• What was served to the user

• How was the score modified

• What was removed from the recommendations

User (Feedback loop)

• What was displayed

• What was clicked

• When were they served

• Where were the recommendations displayed
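
An illustrative shape for a single logged event that covers both columns above (the field names and values are hypothetical, not the production schema):

```python
import json
from datetime import datetime, timezone

event = {
    "run_id": "item2item-2017-09-20",            # which batch run generated the recommendations
    "served": ["doc-123", "doc-456"],            # what was served to the user
    "removed": ["doc-789"],                      # what business logic filtered out
    "score_modifiers": ["impression_discounting", "dithering"],  # how the scores were modified
    "user_id": "u-42",
    "surface": "article_page",                   # where the recommendations were displayed
    "served_at": datetime.now(timezone.utc).isoformat(),  # when they were served
    "displayed": ["doc-123", "doc-456"],         # what was actually rendered
    "clicked": ["doc-456"],                      # feedback loop
}
print(json.dumps(event, indent=2))
```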

Page 25: Building Recommender Systems - Mendeley and Science Direct


Evolution

Page 26: Building Recommender Systems - Mendeley and Science Direct


Mendeley – Desktop Application

• User to Item CF

• Impression Discounting

Page 27: Building Recommender Systems - Mendeley and Science Direct


Mendeley – Online

• Implicit – serves recommendations based on user libraries (most personalized)

• Recent Activity – based on recent additions to a user’s library

• Research Interests – based on user-generated tags

• Discipline – based on their self-identified discipline (least personalized)

See Hristakeva, M., et al. (2017)

Page 28: Building Recommender Systems - Mendeley and Science Direct


Mendeley – Online

• Remove carousels

• Focus on implicit recommendations

• Fall back to a content-based solution

Page 29: Building Recommender Systems - Mendeley and Science Direct


Mendeley – Email

• Recommendations based on the complete library of the user

• Don’t send the same recommendations twice

Page 30: Building Recommender Systems - Mendeley and Science Direct


Science Direct – Email

• Item to Item

• Take the user’s reading history

• Get recommendations for each item

• Interleave the recommendations (see the sketch below)

• Don’t send the same recommendations twice
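
A minimal sketch of that interleaving step, assuming one candidate list per item in the reading history and a record of previously emailed items (names are illustrative):

```python
from itertools import zip_longest

def interleave(per_item_recs, already_sent, limit=10):
    """Round-robin across the recommendation lists of each item in the user's
    reading history, skipping duplicates and anything sent before."""
    out, seen = [], set(already_sent)
    for group in zip_longest(*per_item_recs):
        for rec in group:
            if rec is not None and rec not in seen:
                out.append(rec)
                seen.add(rec)
                if len(out) == limit:
                    return out
    return out

print(interleave([["a", "b", "c"], ["b", "d"], ["e"]], already_sent={"a"}, limit=4))
```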

Page 31: Building Recommender Systems - Mendeley and Science Direct


Science Direct – Article Page

• Item to Item

• Dither recommendations every 30 minutes

Page 32: Building Recommender Systems - Mendeley and Science Direct


Evaluation

Page 33: Building Recommender Systems - Mendeley and Science Direct


Off-line Methodology

Split user interactions by time: train the model on the earlier interactions, query it, and compare its recommendations against the later interactions held out as ground truth (the test set).
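
A minimal sketch of that time-based split plus a precision@k check (the metric choice and record layout are illustrative):

```python
def temporal_split(interactions, cutoff):
    """interactions: list of (user, item, timestamp). Train on interactions
    before the cutoff; keep each user's later interactions as ground truth."""
    train = [(u, i) for u, i, t in interactions if t < cutoff]
    truth = {}
    for u, i, t in interactions:
        if t >= cutoff:
            truth.setdefault(u, set()).add(i)
    return train, truth

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that appear in the ground truth."""
    top = recommended[:k]
    return len(set(top) & relevant) / k if top else 0.0

interactions = [("u1", "d1", 1), ("u1", "d2", 5), ("u2", "d3", 2), ("u2", "d4", 6)]
train, truth = temporal_split(interactions, cutoff=4)
print(precision_at_k(["d2", "d9"], truth["u1"], k=2))  # 0.5
```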

Page 34: Building Recommender Systems - Mendeley and Science Direct


Off-line evaluation - Mendeley

From Hristakeva, M., et al. (2017)

Page 35: Building Recommender Systems - Mendeley and Science Direct


Science Direct – Item-to-item

Page 36: Building Recommender Systems - Mendeley and Science Direct


Static Recommendations for quick learnings

• Infrastructure takes a long time to build

• Need feedback from users to learn

1. Generate recommendations off-line

2. Send to users via email (A/A)

3. Modify the method based on feedback

4. Send to a second set of users split into A/B buckets
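
A sketch of one way to assign users to stable A/B buckets for step 4 (the hashing scheme and names are illustrative):

```python
import hashlib

def bucket(user_id: str, experiment: str = "sd-email-v2", n_buckets: int = 2) -> str:
    """Deterministically assign a user to a bucket so repeated sends keep
    the same users in the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % n_buckets == 0 else "B"

print(bucket("u-42"), bucket("u-43"))
```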

Page 37: Building Recommender Systems - Mendeley and Science Direct


Future Direction

Page 38: Building Recommender Systems - Mendeley and Science Direct


Learning to rank (LtR)

Currently only implicit feedback is used; no content features

Use CF as candidate selection

Re-rank results with a learnt model optimised for CTR

Use item and user features
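
A hedged sketch of that setup: CF supplies the candidates and a simple learnt model re-ranks them by predicted click probability from item and user features (the feature set, the logistic-regression choice, and the toy data are all illustrative, not the planned production model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: features of a (user, candidate) pair taken from the logs,
# e.g. [cf_score, impressions_so_far, same_discipline, item_age_days]
X_train = np.array([[0.9, 0, 1, 10],
                    [0.4, 3, 0, 400],
                    [0.7, 1, 1, 30],
                    [0.2, 5, 0, 900]])
y_train = np.array([1, 0, 1, 0])   # clicked / not clicked

model = LogisticRegression().fit(X_train, y_train)

def rerank(candidates, features):
    """Re-order CF candidates by the model's predicted click probability."""
    p_click = model.predict_proba(features)[:, 1]
    order = np.argsort(p_click)[::-1]
    return [candidates[i] for i in order]

print(rerank(["doc-1", "doc-2"], np.array([[0.5, 2, 1, 60], [0.8, 0, 0, 5]])))
```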

Page 39: Building Recommender Systems - Mendeley and Science Direct


Deep Learning

Use to learn more complex features

Use as features in LtR

Build on the existing framework developed

Use pre-trained models before developing our own
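
As one possible starting point, a pre-trained text encoder can turn titles or abstracts into content features for the LtR model without any bespoke training; the sketch below assumes the sentence-transformers package and the model name shown, neither of which is confirmed by the talk:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed pre-trained model, not the one used in production
abstracts = ["Collaborative filtering for scholarly articles.",
             "Deep learning methods for citation recommendation."]
emb = model.encode(abstracts, normalize_embeddings=True)  # unit-length embeddings
similarity = float(np.dot(emb[0], emb[1]))  # cosine similarity, usable as an LtR feature
print(similarity)
```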

Page 40: Building Recommender Systems - Mendeley and Science Direct


Conclusion (Take Homes)

• Log EVERYTHING

• Start Simple

• Iterate quickly

• Get recommendations out quickly to learn

• Don’t look stupid

• CTR ≇ Off-line Evaluation

Page 41: Building Recommender Systems - Mendeley and Science Direct


www.elsevier.com/rd-solutions

Thank you.

A book chapter is being written based on the content in this presentation.

Page 42: Building Recommender Systems - Mendeley and Science Direct


References

Hristakeva, M., Kershaw, D., Rossetti, M., Knoth, P., Pettit, B., Vargas, S., & Jack, K. (2017). Building recommender systems for scholarly information. In Proceedings of the 1st Workshop (pp. 25–32). New York, NY: ACM. http://doi.org/10.1145/3057148.3057152

Rossetti, M., Stella, F., & Zanker, M. (2016). Contrasting offline and online results when evaluating recommendation algorithms. In Proceedings of the 10th ACM Conference on Recommender Systems (pp. 31–34). New York, NY: ACM. http://doi.org/10.1145/2959100.2959176

Lee, P., Lakshmanan, L. V. S., Tiwari, M., & Shah, S. (2014). Modeling impression discounting in large-scale recommender systems. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1837–1846). New York, NY: ACM. http://doi.org/10.1145/2623330.2623356

Koren, Y. (2010). Collaborative filtering with temporal dynamics. Communications of the ACM, 53(4), 89–97. http://doi.org/10.1145/1721654.1721677