
Department of Automation, Xiamen University

Youchun Ji, Wenxing Hong*, Jianwei Qi
November, 2015

Missing Value Prediction Using Co-clustering and RBF for Collaborative Filtering


Case

Website          Interest               Period
i.xmrc.com.cn    Job recommendation     2012-2014
5501.cn          Expert finding         2014-now
17du.info        News recommendation    2014-now


Outline

1. Introduction
2. The Problem Definition
3. Algorithms & Experiments
4. Conclusion


Introduction

Help users find interesting articles that match their preferences as closely as possible.

1. Jannach, D., M. Zanker, A. Felfernig, & G. Friedrich, Recommender systems: an introduction. 2010: Cambridge University Press.

2. Zheng, L., L. Li, W. Hong, & T. Li, PENETRATE: Personalized news recommendation using ensemble hierarchical clustering. Expert Systems with Applications, 2013. 40(6): p. 2127-2136.

3. Das, A.S., M. Datar, A. Garg, & S. Rajaram. Google news personalization: scalable online collaborative filtering. in Proceedings of the 16th international conference on World Wide Web. 2007. ACM.

4. Breese, J.S., D. Heckerman, & C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. in Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. 1998. Morgan Kaufmann Publishers Inc.


Introduction

Collaborative filtering is one of the most successful methods for news recommendation systems.

1. Pazzani, M.J., A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 1999. 13(5-6): p. 393-408.

2. Huang, Z., H. Chen, & D. Zeng, Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 116-142.

3. Hofmann, T., Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 89-115.

4. Blei, D.M., A.Y. Ng, & M.I. Jordan, Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003. 3: p. 993-1022.


Motivation

The sparsity of the user-item rating matrix degrades the performance of collaborative filtering algorithms.

The number of news articles a user has read is far smaller than the number of articles published on the website.

[Figure: Scenario 1, Scenario 2, Scenario 3]

To overcome this problem, we predict the missing values of the user-item rating matrix by combining two approaches: co-clustering and a Radial Basis Function (RBF) network.

1. Zhang, S., W. Wang, J. Ford, & F. Makedon. Learning from Incomplete Ratings Using Non-negative Matrix Factorization. in SDM. 2006. SIAM.

2. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.


Outline

1. Introduction
2. The Problem Definition
3. Algorithms & Experiments
4. Conclusion


The Problem Definition

1. George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE.


Outline

1. Introduction
2. The Problem Definition
3. Algorithms & Experiments
4. Conclusion


Data sample – Reading History

Rows: User ID (u1 to u13). Columns: News ID (v1 to v8).

       v1  v2  v3  v4  v5  v6  v7  v8
u1      3   3   0   0   0   0   0   0
u2      0   0   2   0   0   0   0   0
u3      0   2   0   0   0   0   0   0
u4      0   0   4   0   3   0   0   0
u5      1   0   0   0   0   0   0   0
u6      0   0   5   4   0   0   0   5
u7      0   0   0   0   3   0   0   0
u8      1   0   0   0   4   0   0   0
u9      0   0   0   0   4   0   0   4
u10     0   0   0   0   0   0   0   0
u11     0   0   4   0   0   0   0   0
u12     0   0   0   0   3   0   0   0
u13     0   0   0   0   0   0   0   0
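The sparseness of this sample can be computed directly. The sketch below assumes that a 0 entry means "no rating" and that sparseness is the fraction of such entries; the function name `sparsity` is illustrative and not taken from the slides.

```python
# Reading-history sample from the slide: rows are users u1..u13,
# columns are news items v1..v8. A 0 is treated as a missing rating
# (an assumption; the slide does not state this explicitly).
R = [
    [3, 3, 0, 0, 0, 0, 0, 0],  # u1
    [0, 0, 2, 0, 0, 0, 0, 0],  # u2
    [0, 2, 0, 0, 0, 0, 0, 0],  # u3
    [0, 0, 4, 0, 3, 0, 0, 0],  # u4
    [1, 0, 0, 0, 0, 0, 0, 0],  # u5
    [0, 0, 5, 4, 0, 0, 0, 5],  # u6
    [0, 0, 0, 0, 3, 0, 0, 0],  # u7
    [1, 0, 0, 0, 4, 0, 0, 0],  # u8
    [0, 0, 0, 0, 4, 0, 0, 4],  # u9
    [0, 0, 0, 0, 0, 0, 0, 0],  # u10
    [0, 0, 4, 0, 0, 0, 0, 0],  # u11
    [0, 0, 0, 0, 3, 0, 0, 0],  # u12
    [0, 0, 0, 0, 0, 0, 0, 0],  # u13
]


def sparsity(matrix):
    """Return the fraction of entries that are zero (treated as missing)."""
    total = sum(len(row) for row in matrix)
    observed = sum(1 for row in matrix for value in row if value != 0)
    return 1.0 - observed / total


observed_count = sum(1 for row in R for value in row if value != 0)
sample_sparsity = sparsity(R)
print(observed_count, round(sample_sparsity, 4))
```

Even in this small sample most entries are empty, which is the situation the missing-value prediction is meant to address.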


Algorithms – Flow chart


Algorithms – Co-clustering

Algorithm I: Co-clustering
Input: user-item rating matrix $R_{ij}$, number of clusters $K$
Output: cluster matrices $R^{(k)}$
Method:
1. Initialize $p(k \mid u, v, r)$. Let $p(k \mid u, v, r) = 1/K$.
2. Recalculate $p(k \mid u)$ according to Eq. 3.
3. Recalculate $p(k \mid v)$ according to Eq. 4.
4. Calculate $p(r \mid k)$ according to Eq. 5.
5. Recalculate $p(k \mid u, v, r)$ according to Eq. 2, and choose the cluster $k$ with the maximum probability as the class.
6. Repeat from step 2 until $p(k \mid u, v, r)$ converges.

$$p(k \mid u, v, r) = \frac{[p(k \mid u)]\,[p(k \mid v)]\,[p(r \mid k)]}{\sum_{k'} [p(k' \mid u)]\,[p(k' \mid v)]\,[p(r \mid k')]} \tag{2}$$

$$p(k \mid u) = \frac{\sum_{v \in V(u)} p(k \mid u, v, r)}{\sum_{k'} \sum_{v \in V(u)} p(k' \mid u, v, r)} \tag{3}$$

$$p(k \mid v) = \frac{\sum_{u \in U(v)} p(k \mid u, v, r)}{\sum_{k'} \sum_{u \in U(v)} p(k' \mid u, v, r)} \tag{4}$$

$$p(r \mid k) = \frac{p(k \mid u, v, r)}{\sum_{r'} p(k \mid u, v, r')} \tag{5}$$

Example values shown on the slide:
$p(k = 3 \mid u_1, v_1, r)$: 0.2109, 0.8756
$p(k = 3 \mid u_1)$: 0.7535, 0.6030
$p(k = 3 \mid v_1)$: 0.3321, 0.9400
$p(r = 3 \mid k = 3)$: 0.7832, 0.2241
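The iteration in Algorithm I can be sketched as below. This is a minimal illustration under several assumptions, not the authors' implementation. $V(u)$ is taken to be the items rated by user $u$ and $U(v)$ the users who rated item $v$. The numerator of $p(r \mid k)$ is summed over all observed entries with rating $r$. A small random jitter is added to the initial $1/K$, because an exactly uniform start would remain uniform under these updates. The names `co_cluster`, `post` and `labels` are hypothetical.

```python
import random


def co_cluster(ratings, n_clusters, n_iter=50, seed=0):
    """ratings: list of observed (user, item, rating) triples."""
    rng = random.Random(seed)
    # Step 1: start close to 1/K. The jitter is an assumption added here
    # to break the symmetry between clusters.
    post = []
    for _ in ratings:
        w = [1.0 + 0.1 * rng.random() for _ in range(n_clusters)]
        s = sum(w)
        post.append([x / s for x in w])

    for _ in range(n_iter):
        # Steps 2-4: accumulate posterior mass per user, item and rating.
        by_user, by_item, by_rating = {}, {}, {}
        for (u, v, r), p in zip(ratings, post):
            for table, key in ((by_user, u), (by_item, v), (by_rating, r)):
                acc = table.setdefault(key, [0.0] * n_clusters)
                for k in range(n_clusters):
                    acc[k] += p[k]
        p_k_u = {u: [x / sum(a) for x in a] for u, a in by_user.items()}
        p_k_v = {v: [x / sum(a) for x in a] for v, a in by_item.items()}
        mass = [sum(a[k] for a in by_rating.values()) for k in range(n_clusters)]
        p_r_k = {
            r: [a[k] / mass[k] if mass[k] > 0 else 0.0 for k in range(n_clusters)]
            for r, a in by_rating.items()
        }

        # Step 5: recompute the posterior (Eq. 2), normalised over clusters.
        new_post = []
        for (u, v, r) in ratings:
            w = [p_k_u[u][k] * p_k_v[v][k] * p_r_k[r][k] for k in range(n_clusters)]
            s = sum(w)
            new_post.append([x / s for x in w])
        post = new_post

    # Assign each observed entry to its most probable cluster.
    labels = [max(range(n_clusters), key=p.__getitem__) for p in post]
    return post, labels


# A few observed entries in the style of the reading-history sample.
ratings = [("u1", "v1", 3), ("u1", "v2", 3), ("u4", "v3", 4),
           ("u6", "v3", 5), ("u6", "v8", 5), ("u9", "v5", 4)]
post, labels = co_cluster(ratings, n_clusters=2, n_iter=20)
```

A fixed iteration count stands in for the convergence test of step 6; a production version would instead stop when the posterior changes by less than a tolerance.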

1. Hu, W., W. Yong-Ji, W. Zhe, W. Xiu-Li, et al., Two-Phase Collaborative Filtering Algorithm Based on Co-Clustering. Journal of Software. 21: p. 1042-1054 (in Chinese).

2. George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE.

3. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.


Algorithms – RBF

Algorithm II: Radial Basis Function network
Input: a matrix $R^{(k)}$ after co-clustering ($r_{ij} \mid 1 \le i \le M,\ 1 \le j \le N$); $I_{\max}$ and $I_{\min}$ are the maximum and the minimum of the RBF activation function.
Output: matrix $R'^{(k)}$ after predicting the missing values.
Method:
1. For $j = 1$ to $N$:
2. Calculate $c_{k_1}$ according to Eq. 6.
3. Apply the activation function $\varphi(g_{ip})$ to the Euclidean distance $g_{ip} = \lVert r_{ip} - c_k \rVert$, for $1 \le p \le k_1$.
4. Calculate the weight $\omega$ from $I_{\max}$, $I_{\min}$ and $\varphi(r)$ [formula not legible in the extracted slide text].
5. Calculate $r'_{ij} = F(r_{ij})$ according to Eq. 7.

[Eq. 6: definition of the center $c_{k_1}$, with separate cases for $r_{ij} \ne 0$ and $r_{ij} = 0$, $i = 1, 2, \ldots, k_1$; formula not legible in the extracted slide text.]

$$F(X_i) = \sum_{k=1}^{K} \omega_k\, \varphi(\lVert X_i - C_k \rVert) \tag{7}$$

$$\varphi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right), \quad \sigma > 0$$
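The network output in Eq. 7 with the Gaussian activation can be sketched as follows. The centers, weights and `sigma` below are hypothetical values chosen only to make the example concrete; the slide's own procedures for computing the centers (Eq. 6) and the weights are not reproduced here.

```python
import math


def gaussian_rbf(r, sigma=1.0):
    """Gaussian activation: exp(-r^2 / (2 * sigma^2)), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


def euclidean(x, c):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))


def rbf_output(x, centers, weights, sigma=1.0):
    """Weighted sum of Gaussian activations of the distances to each center."""
    return sum(
        w * gaussian_rbf(euclidean(x, c), sigma)
        for w, c in zip(weights, centers)
    )


# Hypothetical centers and weights, for illustration only.
centers = [[3.0, 3.0], [0.0, 4.0]]
weights = [0.5, 2.0]
at_center = rbf_output([3.0, 3.0], centers, weights)
print(at_center)
```

Evaluating at the first center gives that center's full weight plus a small contribution from the second, more distant center, which shows how the Gaussian localizes each unit's influence.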

1. https://en.wikipedia.org/wiki/Radial_basis_function_network.

2. Fuliang, X., & Z. Huiying, A Research of Collaborative Filtering Recommender Method Based on SOM and RBFN Filling Missing Values. XIANDAI TUSHU QINGBAO JISHU, 2014. 7/8: p. 56-63 (in Chinese).


Platform - http://yiqidu.xmu.edu.cn/


Data set – XMU News

The experimental data come from the Xiamen University news reading website, which focuses on campus news. The data set includes 9502 users, 6372 news articles, and 932640 ratings. The sparseness of the user-item rating matrix is 98.46%. The data set was divided into a training set and a testing set.
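As a check on the reported figure, assuming sparseness means the fraction of user-news pairs that have no rating, the counts above give:

```latex
\[
1 - \frac{932640}{9502 \times 6372}
  = 1 - \frac{932640}{60546744}
  \approx 1 - 0.0154
  = 0.9846
\]
```

This agrees with the stated 98.46%.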

Record format: Rating::UserID::NewsID::News title
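A record in this layout could be read as in the sketch below. The field types (integer rating, string IDs), the `Record` and `parse_record` names, and the sample line are assumptions for illustration; the actual file encoding and ID format are not given on the slide.

```python
from typing import NamedTuple


class Record(NamedTuple):
    rating: int
    user_id: str
    news_id: str
    title: str


def parse_record(line: str) -> Record:
    """Parse one 'Rating::UserID::NewsID::News title' line."""
    # Split on the first three separators only, so a title that itself
    # contains "::" is kept whole.
    parts = line.rstrip("\n").split("::", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    rating, user_id, news_id, title = (p.strip() for p in parts)
    return Record(int(rating), user_id, news_id, title)


# Hypothetical sample line, not taken from the data set.
example = parse_record("4::u6::v4::Campus library extends opening hours")
print(example)
```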

1. Jiang, S., & W. Hong. A vertical news recommendation system: CCNS—An example from Chinese campus news reading system. in Computer Science & Education (ICCSE), 2014 9th International Conference on. 2014. IEEE.


Experiments

The number of co-clusters is set to 36 in the experiment. After predicting the missing values, the sparseness of the user-item rating matrix falls to about 60%.

[Figure: sparseness versus number of co-clusters. Before prediction: > 0.95. After prediction: < 0.65.]


Experiments

As the experimental results show, the prediction method that combines co-clustering and RBF works effectively on the XMUNEWS data set, with a root mean square error (RMSE) of 1.553.

Algorithm              RMSE     Time (s)
Co-clustering          2.455    40
RBF                    2.092    150
Co-clustering & RBF    1.553    330
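The RMSE column can be computed as in the sketch below, assuming the usual definition: the square root of the mean squared difference between true and predicted ratings on held-out entries. The sample ratings are invented for illustration and are not from the XMUNEWS data set.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical held-out ratings and predictions.
actual = [3, 4, 5, 1]
predicted = [2.5, 4.0, 3.0, 2.0]
error = rmse(actual, predicted)
print(round(error, 3))
```

Because the differences are squared before averaging, a single large miss (here the 5 predicted as 3.0) dominates the result.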


Outline

1. Introduction
2. The Problem Definition
3. Algorithms & Experiments
4. Conclusion


Conclusion

• Before prediction, the sparseness of the user-item rating matrix is above 96%; after prediction, it falls below 60%.

• The root mean square error (RMSE) between the true and the predicted rating values is 1.553 on the XMUNEWS data set. As the experimental results show, the combined algorithm achieves a lower RMSE than either algorithm alone (2.455 for co-clustering, 2.092 for RBF).

• We built an online website to collect data and run experiments (http://yiqidu.xmu.edu.cn/).

• For future work, we will concentrate on improving computational efficiency and on choosing the number of clusters.


References

1. Jannach, D., M. Zanker, A. Felfernig, & G. Friedrich, Recommender systems: an introduction. 2010: Cambridge University Press.

2. Zheng, L., L. Li, W. Hong, & T. Li, PENETRATE: Personalized news recommendation using ensemble hierarchical clustering. Expert Systems with Applications, 2013. 40(6): p. 2127-2136.

3. Das, A.S., M. Datar, A. Garg, & S. Rajaram. Google news personalization: scalable online collaborative filtering. in Proceedings of the 16th international conference on World Wide Web. 2007. ACM.

4. Breese, J.S., D. Heckerman, & C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. in Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. 1998. Morgan Kaufmann Publishers Inc.

5. Pazzani, M.J., A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 1999. 13(5-6): p. 393-408.

6. Huang, Z., H. Chen, & D. Zeng, Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 116-142.

7. Hofmann, T., Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 89-115.

8. Blei, D.M., A.Y. Ng, & M.I. Jordan, Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003. 3: p. 993-1022.

9. Zhang, S., W. Wang, J. Ford, & F. Makedon. Learning from Incomplete Ratings Using Non-negative Matrix Factorization. in SDM. 2006. SIAM.

10. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.

11. George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE.

12. https://en.wikipedia.org/wiki/Radial_basis_function_network.

13. Fuliang, X., & Z. Huiying, A Research of Collaborative Filtering Recommender Method Based on SOM and RBFN Filling Missing Values. XIANDAI TUSHU QINGBAO JISHU, 2014. 7/8: p. 56-63 (in Chinese).

14. Jiang, S., & W. Hong. A vertical news recommendation system: CCNS—An example from Chinese campus news reading system. in Computer Science & Education (ICCSE), 2014 9th International Conference on. 2014. IEEE.



Acknowledgment

This research was supported by the National Natural Science Foundation of China under Grant No. 61303081 and by the Fundamental Research Funds for the Xiamen University under Grant No. 20720152008.

Q&A

Thanks!
