
A Random Walk Method for Alleviating the Sparsity Problem in Collaborative Filtering

Hilmi Yıldırım and Mukkai S. Krishnamoorthy

Rensselaer Polytechnic Institute, Computer Science Department, Troy, New York

{yildih2,moorthy}@cs.rpi.edu

2011 Spring Seminar Presented by Sangkeun Lee

HISTORICAL REVIEW – USER-BASED CF

[Figure: user-based CF example; values shown in the diagram: 5, 4, 7, 7, 8]

Aggregation function: often weighted sum

Weight depends on similarity

Neighbours are users whose tastes are similar to those of the active user

Reference Lecture Slide from ‘http://www.abdn.ac.uk/~csc263/teaching/AIS/AIS/lectures/abdn.only/Collaborative-Filtering.ppt’
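As a hedged illustration of the weighted-sum aggregation named on this slide, one common form of the user-based prediction is written out here. The symbols are notation assumed for this sketch and do not come from the slide: a is the active user, N(a) is the set of neighbours who rated item i, and sim(a, u) is any user-user similarity. Mean-centred variants are also widely used.

```latex
\hat{r}_{a,i} \;=\; \frac{\sum_{u \in N(a)} \mathrm{sim}(a,u)\, r_{u,i}}
                         {\sum_{u \in N(a)} \lvert \mathrm{sim}(a,u) \rvert}
```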

HISTORICAL REVIEW – ITEM-BASED CF

[Figure: item-based CF example; labels shown in the diagram: Item 1 to Item 5, with values 2, 18, 7]

Aggregation function: often weighted sum

Weight depends on similarity

Rating matrix (columns: Item 1 to Item 9; empty cells were lost in extraction, so each row lists only its surviving ratings in order):

User 1: 1 3 1 1 1
User 2: 5 3 2 1 4 5
User 3: 5 4 5 1 5
User 4: 1 2 3 2 4
…
User m: 5 3 2 1 4 ?
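The item-based weighted sum can be sketched as follows: the "?" for a target item is estimated from the ratings the same user gave to other items, weighted by item-item similarity. This is a minimal sketch; the function name, the item labels, and the similarity values are hypothetical and are not taken from the slide.

```python
def predict_item_based(user_ratings, sims_to_target):
    """Weighted-sum prediction for one user and one target item.

    user_ratings:   {item: rating} for items the user has already rated.
    sims_to_target: {item: similarity between that item and the target item}.
    Returns None when no rated item has a non-zero similarity to the target.
    """
    num = 0.0
    den = 0.0
    for item, rating in user_ratings.items():
        w = sims_to_target.get(item, 0.0)  # items without a similarity count as 0
        num += w * rating
        den += abs(w)
    return num / den if den > 0 else None


# Hypothetical data: three rated items and made-up similarities to the target
ratings = {"Item 1": 5, "Item 2": 3, "Item 3": 2}
sims = {"Item 1": 0.5, "Item 2": 0.25, "Item 3": 0.25}
prediction = predict_item_based(ratings, sims)
```

When the rating matrix is sparse, many of the similarities are 0 and the denominator can vanish, which is the situation the random walk method is meant to address.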

PageRank & A New Approach

• Motivation behind the algorithm: PageRank for ranking items.
– Random walk of users on the entire web graph
– Assigns each page a rank reflecting its importance (authority).
– Specifically, the PageRank of a page is the probability of visiting it in a random walk of the web graph

• Authors’ approach
– Item graph where the nodes are items and the edges between nodes represent similarities of items
– Furthermore, users are assumed to make a finite-length walk, in contrast to the PageRank method.
  • Thus ranking becomes dependent on the initial distribution of the user
– Three components
  • The first component builds the item graph, which captures the similarity of items to each other
  • The second component computes the rank values of items for each user by simulating a random walk
  • Finally, the last component interprets and scales the rank scores as ratings for each user-item pair.
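The remark that a finite-length walk makes the ranking depend on the user's initial distribution can be illustrated with a small sketch. The transition matrix and the two starting distributions are hypothetical values chosen for illustration; they are not from the paper.

```python
def step(dist, P):
    """One random-walk step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    m = len(P)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]


def walk(dist, P, steps):
    """Distribution over items after the given number of steps."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical row-stochastic transition matrix over three items
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
start_a = [1.0, 0.0, 0.0]  # user A starts at item 0
start_b = [0.0, 0.0, 1.0]  # user B starts at item 2

short_a, short_b = walk(start_a, P, 2), walk(start_b, P, 2)
long_a, long_b = walk(start_a, P, 200), walk(start_b, P, 200)

# Largest per-item difference between the two users' distributions
gap_short = max(abs(x - y) for x, y in zip(short_a, short_b))
gap_long = max(abs(x - y) for x, y in zip(long_a, long_b))
```

After two steps the two users still see clearly different distributions, while after a very long walk both converge to the same stationary distribution, as in PageRank, and the personalisation is lost.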

The Model

• Similarity matrices are usually too sparse to capture actual dependencies between items.

– An item i that hasn’t been rated by any user who has rated item j gets a similarity score of 0

– However, these items would be found to be close to each other if another item t is similar to both.

• Random Walk Recommender captures these transitive associations at various levels, proportional to the length of the random walk.

• The length of the walk is parameterized according to the sparsity level of the rating matrix: α

• Markov chain model
– The probability of being in a state depends solely on the previous step
– A random variable (its symbol is missing from the transcript) denotes the item the user is at in a given step of the random walk.
– In each step of the walk, the user decides to continue the walk with probability α
– We build an item graph P of size m, where the weight of the edge between nodes i and j is equal to the probability of a user passing from item i to item j.
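The transitive association described on this slide can be shown on a toy example: two items with similarity 0 become reachable from each other in two steps through a third item. This is a minimal sketch that assumes P is obtained by simple row normalisation of a similarity matrix; the paper's exact construction of P may differ, and all values here are hypothetical.

```python
def row_normalize(S):
    """Turn a non-negative similarity matrix into transition probabilities."""
    P = []
    for row in S:
        total = sum(row)
        P.append([x / total for x in row] if total > 0 else [0.0] * len(row))
    return P


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


# Hypothetical similarities: items 0 and 2 share no raters (similarity 0),
# but both are similar to item 1.
S = [
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.6],
    [0.0, 0.6, 0.0],
]
P = row_normalize(S)
P2 = matmul(P, P)

one_step = P[0][2]   # direct transition from item 0 to item 2
two_step = P2[0][2]  # transition from item 0 to item 2 via item 1
```

Here `one_step` is 0 while `two_step` is positive, so a walk of length two already links items that the similarity matrix alone treats as unrelated.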


The Model

• Model
– Build an item graph P of size m
– The initial distribution of the user on items is then defined (the symbols and the defining equation are missing from the transcript)

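Construct the matrix P whose entries are the probabilities of moving from node i to node j

Compute the probability that user u is at item j at step k

Combine these to compute the probability that user u is at item j

The final item ranks are expressed as a simple matrix product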

Note that various similarity measures can be used

Similar item? Or uniform distribution?

Scale rank to ratings
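The slide's equations did not survive, but its callouts describe per-step probabilities being combined into a single matrix product. The following is a hedged reconstruction of how that can work, not the slide's own equations. It assumes the walk continues with probability α at each step, and it writes the user's initial distribution as a row vector r_u^{(0)}; both the notation and the exact weighting are assumptions.

```latex
\Pr(\text{user } u \text{ at item } j \text{ in step } k) = \bigl(r_u^{(0)} P^{k}\bigr)_j,
\qquad
\tilde{r}_u \;=\; \sum_{k=1}^{\infty} \alpha^{k}\, r_u^{(0)} P^{k}
\;=\; r_u^{(0)}\, \alpha P \,(I - \alpha P)^{-1}.
```

The series converges because P is row-stochastic and 0 < α < 1, so the spectral radius of αP is below 1. Stacking the row vectors of all users turns this into one matrix product, which is consistent with the presenter's later remark about the cost of a matrix inverse.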

More about the model

Computing Similarities

Cosine Similarity

Adjusted Cosine Similarity
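The two similarity measures named here can be sketched in a few lines. The sketch makes two assumptions: plain cosine treats unrated cells as 0, and adjusted cosine subtracts each user's mean rating and uses only users who rated both items. The slide's own formulas are not available, and the ratings are hypothetical.

```python
from math import sqrt


def cosine(ratings, i, j):
    """Cosine similarity of item columns i and j; unrated cells count as 0."""
    dot = sum(r.get(i, 0) * r.get(j, 0) for r in ratings.values())
    ni = sqrt(sum(r.get(i, 0) ** 2 for r in ratings.values()))
    nj = sqrt(sum(r.get(j, 0) ** 2 for r in ratings.values()))
    return dot / (ni * nj) if ni and nj else 0.0


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine: subtract each user's mean rating, co-raters only."""
    num = den_i = den_j = 0.0
    for r in ratings.values():
        if i in r and j in r:
            mean = sum(r.values()) / len(r)
            di, dj = r[i] - mean, r[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    return num / sqrt(den_i * den_j) if den_i and den_j else 0.0


# Hypothetical ratings: {user: {item: rating}}
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 1, "B": 2, "C": 3},
    "u3": {"C": 5, "D": 4},
}
sim_ab = cosine(ratings, "A", "B")
sim_ad = cosine(ratings, "A", "D")  # A and D have no common rater
adj_ab = adjusted_cosine(ratings, "A", "B")
```

Items A and D have no common rater, so their similarity is exactly 0. This is the sparsity case that the random walk is meant to bridge.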

Interpreting Rank Scores

Basically, the score is meant for top-K recommendation.

For rating prediction, however, the authors linearly scaled up each row of values so that the maximum of each row corresponds to 5.
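The row-wise linear scaling described here can be sketched as follows. The function name and the sample rank scores are hypothetical; only the rule "the maximum of each row maps to 5" comes from the slide.

```python
def scale_row_to_ratings(rank_row, max_rating=5.0):
    """Linearly rescale one user's rank scores so the largest becomes max_rating."""
    top = max(rank_row)
    if top <= 0:
        # No positive score to scale against; return all zeros
        return [0.0 for _ in rank_row]
    return [max_rating * x / top for x in rank_row]


# Hypothetical rank scores for one user over four items
ranks = [0.10, 0.40, 0.25, 0.05]
scaled = scale_row_to_ratings(ranks)
```

Under this rule every user gets at least one predicted rating of 5, whatever the absolute rank scores are.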

I doubt it!

Computational Cost

Computing the similarity matrix is O(m^2 n)

Vector-matrix multiplication has complexity O(m^2)

I doubt this too: the cost of computing the matrix inverse is not taken into account
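One way to look at this concern: if the rank vector has a closed form that contains a matrix inverse, a per-user score can still be approximated with vector-matrix products alone, each O(m^2), by iterating a fixed-point equation. This is a minimal sketch, assuming the rank vector is r = Σ_{k≥1} α^k r0 P^k. That form is an assumption, and so are all names and values below.

```python
def vec_mat(v, P):
    """Row vector times matrix: O(m^2) for an m x m matrix."""
    m = len(P)
    return [sum(v[i] * P[i][j] for i in range(m)) for j in range(len(P[0]))]


def rank_by_iteration(r0, P, alpha, tol=1e-12, max_iter=10_000):
    """Solve r = alpha * (r0 + r) P by fixed-point iteration.

    The solution equals sum_{k>=1} alpha^k * r0 * P^k, i.e.
    r0 * alpha*P * (I - alpha*P)^-1, without forming the inverse.
    """
    r = [0.0] * len(r0)
    for iterations in range(1, max_iter + 1):
        new_r = [alpha * x for x in vec_mat([a + b for a, b in zip(r0, r)], P)]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r, iterations
        r = new_r
    return r, max_iter


# Hypothetical 3-item transition matrix and initial distribution
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
r0 = [1.0, 0.0, 0.0]
alpha = 0.5
r, iterations = rank_by_iteration(r0, P, alpha)

# Residual of the fixed-point equation r = alpha * (r0 + r) P
target = [alpha * x for x in vec_mat([a + b for a, b in zip(r0, r)], P)]
residual = max(abs(a - b) for a, b in zip(r, target))
```

Each iteration costs O(m^2), but the number of iterations grows as α approaches 1, so the total per-user cost is O(m^2) times an iteration count rather than a single product.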

Experiments

• MovieLens
– This data set contains 1,000,209 ratings by 6040 anonymous MovieLens users on 3952 movies

Discussion

• Summary
– Presented and experimentally evaluated a model-based, item-oriented collaborative filtering algorithm.
– It outperforms a slightly modified version of the item-based top-N algorithm in all test cases, since top-N is a special case of Random Walk Recommender.
– It is better than the top-N algorithm especially when the training data is sparse.
– For extremely sparse data sets the optimal α value approaches 1, whereas it approaches 0 as the data gets denser.
– Random Walk Recommender captures some transitive associations between items.

• Questions?
– A few doubts about the paper: time complexity, linear scaling up?
– Interesting application of random walk for recommendation
– RWWR vs. finite steps of random walk
