
USER-CENTERED DATA ANALYTICS AND MODELING

Hongzhi Yin

School of ITEE

University of Queensland, Australia

[email protected]

May 26, 2018


About The University of Queensland


About The University of Queensland


The University of Queensland


School of ITEE


School of ITEE


My Basic Information

• Education Background

• 2009.9-2014.7, Peking University, Ph.D. in Computer Science

• Supervisor: Prof. Bin Cui (Changjiang Distinguished Professor)

• 2014.9-2015.12, The University of Queensland, Postdoctoral Research Fellow

• Supervisor: Prof. Xiaofang Zhou (IEEE Fellow, Thousand Talents Program)

• Working Experiences

• 2016.1-2018.12: ARC DECRA Fellow (Australian Excellent Young Researcher Fund)

• 2017.1-Present: Lecturer in Data Science (Continuing Position), Deputy Director of Master of Computer Science at The University of Queensland

• 2018.1-Present: Chief AI Counselor at One-stop Warehouse Company (the No. 1 wholesale distributor of solar products)


Selected Research Awards and Impact

• Google Scholar Citations: 1166, H-index: 16

• My paper “LCARS: A Location-Content-Aware Recommender System” is the most-cited paper among all KDD’13 oral papers.

• My TOIS’16 paper “Joint modelling of user check-in behaviours for real-time point-of-interest recommendation” won the 21st ACM Annual Best of Computing Award.

• Also invited to present this work at SIGIR’18

• 2017.11, EAIT Faculty Early Career Researcher Award, The University of Queensland (only one winner in the School of ITEE)

• 2016.1, Australia Discovery Early Career Researcher Award (Australian Excellent Young Researcher award; only 6 winners in the information area across the whole of Australia)

• Best Paper Award, 2016 Australian Database Conference

• 2014.7, Distinguished Doctor Degree Thesis Award, Peking University (only 1 winner in the CS department)

• 2014.5, Top-10 Distinguished Academic Fellow Award, Peking University


Selected Research Awards and Impact


Publications in 2018

• 9 CCF A Conference Papers

• 4 KDD, 3 ICDE, 1 SIGIR, 1 IJCAI

• 6 CCF B Conference Papers

• 1 WSDM, 5 DASFAA

• 1 CCF A Journal Paper

• 1 TKDE

• 4 JCR Zone 1-2 Papers

• 1 ACM TIST, 1 Information Sciences, 1 Knowledge-Based Systems, 1 Future Generation Computer Systems

• 4 CCF B Journal Papers


Graduation Ceremony of My first PhD student


My General Research

• Data Mining

• KDD, ICDM, WSDM, TKDD and TKDE

• Database

• SIGMOD, VLDB, ICDE, VLDB J, TKDE

• Information Retrieval and Web Mining

• WWW, SIGIR, WSDM, CIKM, ACM TOIS

• Artificial Intelligence

• AAAI, IJCAI, ACM TIST

• Harnessing User Generated and Consumed Data for User-Centered Research

• Top-Tier Conferences and Journals: 60+ (21 CCF A/B SCI Journals)

• CCF A : 35+, CCF B: 20+, JCR Zone 1/2: 6

• 1 Scholar Book, 2 Book Chapters

• 45+ publications as the leading author


My Research Interests

• Recommender Systems

• Spatial Item Recommendation (KDD’13, KDD’15, KDD’17, TKDE’16, TKDE’17, ICDE’16, TOIS’14, TOIS’16, TIST’17, TIST’18, CIKM’15, CIKM’16, ACM Multimedia’15, DASFAA’17, DASFAA’18, WWWJ’18)

• Streaming Recommendation (VLDB’13, TOIS’16, IJCAI’17, SIGIR’18, KDD’18)

• Temporal Event-aware Recommendation (SIGMOD’14, TOIS’15)

• Long Tail Recommendation (VLDB’12)

• Semantic-Aware User Behaviour Prediction (ICDM’17)

• Joint Event-Partner Recommendation in Event-based Social Networks (ICDE’18)

• Integrating Category-Aware User Privacy Preference for Mobile App Recommendation (ICDE’17, KBS’18)

• Online Recommendation Efficiency (KDD’13, TOIS’14, TKDE’16, TOIS’16, ICDE’16, WSDM’18, KDD’18)


Online Recommendation Efficiency

• To support real-time recommendation responses, we combine smart retrieval algorithms with effective indexing structures:

• Threshold-based Algorithm (TA)

• LCARS: A Location-Content-Aware Recommender System (KDD’13)

• TA-Approximation Algorithm

• LCARS: A Spatial Item Recommender System (TOIS’14)

• Attribute pruning-based algorithm (AP)

• Adapting to User Interest Drift for POI Recommendation (TKDE’16)

• Clustering-based branch and bound algorithm (CBB)

• Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation (TOIS’16)

• Asymmetric locality-sensitive hashing (ALSH)

• SPORE: A Sequential Personalized Spatial Item Recommender System (ICDE’16)

• Learning to Hash (L2H)

• Discrete Deep Learning for Fast Content-Aware Recommendation (WSDM’18, KDD’18)
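To make the first item in this list concrete, here is a minimal sketch of a generic threshold algorithm (TA) for top-k retrieval under a weighted-sum score. It is not the LCARS implementation: the function name `threshold_topk`, the toy attribute lists, and the assumption of non-negative weights (needed so that the score is monotone) are all illustrative choices.

```python
import heapq

def threshold_topk(item_attrs, weights, k):
    """Top-k items under score = sum(w * attr), assuming all weights >= 0.

    item_attrs maps an item id to a list of attribute scores.
    """
    m = len(weights)
    # One list per attribute, sorted by that attribute in descending order.
    sorted_lists = [
        sorted(item_attrs, key=lambda it, a=a: -item_attrs[it][a]) for a in range(m)
    ]

    def score(it):
        return sum(w * x for w, x in zip(weights, item_attrs[it]))

    seen = set()
    heap = []  # min-heap of (score, item), holding at most k entries
    for depth in range(len(item_attrs)):
        # Sorted access: read one entry from every list at the current depth.
        for a in range(m):
            it = sorted_lists[a][depth]
            if it in seen:
                continue
            seen.add(it)
            s = score(it)  # random access to the item's full attribute vector
            if len(heap) < k:
                heapq.heappush(heap, (s, it))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, it))
        # No unseen item can score higher than this threshold.
        threshold = sum(
            weights[a] * item_attrs[sorted_lists[a][depth]][a] for a in range(m)
        )
        if len(heap) >= k and heap[0][0] >= threshold:
            break  # early termination: the current top-k is final
    return sorted(heap, reverse=True)
```

The early-termination test is what saves work: scanning stops as soon as the k-th best score found so far is at least the best score any unseen item could still reach.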


Recommended Reading

• http://web.cs.ucla.edu/~yzsun/Tutorials.htm

• Yizhou Sun, Hongzhi Yin, Xiang Ren. Context-Rich Recommendation: Integrating Links, Text, and Spatio-Temporal Dimensions. (KDD’17 Tutorial)

• Yizhou Sun, Hongzhi Yin*, Xiang Ren. Recommendation in Context-Rich Environment: An Information Network Analysis Approach. (WWW’17 Tutorial)


RECOMMENDED READINGS

Chapter: "Spatio-Temporal Recommendation in Geo-Social Networks"


My Research Interests

• User Linkage across Social Networks and Platforms

• User Name, Content, Language Features (WWW’17, Information Sciences’18, Future Generation Computer Systems’18, WWWJ’18)

• Spatial and Temporal Features (CIKM’17, ICDE’18)

• Community Discovery Beyond Network Structure

• User Generated Textual Content, Spatial-temporal Co-occurrence, Network Structure (ICDE’16)

• Topic Discovery and Event Detection

• Unifying discovery of user interest-related topics and event-related topics (ICDE’13, SIGMOD’14)

• Network Embedding and Multi-Relation Learning

• Adaptive and Adversarial Model Optimization Algorithms (ICDM’17, ICDE’18, KDD’18)

• Information Diffusion and Influence Maximization in Social Networks

• Distinguishing re-sharing behaviours from re-creating behaviours in information diffusion (ICDE’15, World Wide Web)


SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behaviour Prediction

Hongzhi Yin (1), Hongxu Chen (1), Hao Wang (2), Yang Wang (3), Quoc Viet Hung Nguyen (4)

(1) The University of Queensland, (2) 360 Search Lab, Qihoo 360 Inc., (3) University of New South Wales, (4) Griffith University


Outline

• Background

– Rich Interaction Behaviors with Items

• Problem Definition

– Semantic-Aware User Behavior Prediction

• Our Solution

– A Scalable Probabilistic Tensor Factorization Model

• Experiments

• Summary


Rich Interaction Behaviours with Items

Like, Click, Browse, Share, Favorite, Purchase, Add to Cart, Subscribe, Download, Add to, Send, Pin it, Visit, ……


User Behaviours in YouTube


User Behaviours in Pinterest


User Behaviours in JD.COM


User Behaviours in Alibaba.com


Characteristics of User Interaction Data

• Implicit Feedback

– Only positive feedback is available and observed

– The unobserved user interaction behaviors are a mixture of:

• Real negative feedback

• Potential positive feedback

• Heterogeneous Interaction Behaviors

– Different types of user behaviours imply different semantics and user intentions

– The way people interact with items is important for understanding user intents and interests.


Characteristics of User Interaction Data

• Skewed Interaction Behavior Data

– The distribution of user interaction data w.r.t. behavior types is heavily skewed

– Click: 87%, Add2Favorite: 5%, Add2Cart: 6%, Purchase: 2%


Problem Definition

• Semantic-Aware Behavior Prediction

– Given a target user $u_i$ and an action type $t_k$, we aim to predict the top-n items on which $u_i$ will perform action $t_k$.


Semantic-Aware User Behavior Prediction

• An alternative definition

– Given a target user $u_i$, we aim to predict the top-n action-item pairs $(t_k, v_j)$ such that $u_i$ will perform action $t_k$ on item $v_j$.

• What is important is not just what users interact with, but how they interact with it
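Both problem definitions reduce to ranking under a scoring function. The sketch below assumes a hypothetical dense array `scores[user, action, item]` of predicted scores; the array layout and the function names are illustrative and not taken from the talk.

```python
import numpy as np

def top_n_items(scores, user, action, n):
    """First definition: rank items for a fixed (user, action) pair."""
    row = scores[user, action]
    return np.argsort(-row, kind="stable")[:n].tolist()

def top_n_pairs(scores, user, n):
    """Alternative definition: rank (action, item) pairs for a fixed user."""
    mat = scores[user]  # shape: (num_actions, num_items)
    flat = np.argsort(-mat, axis=None, kind="stable")[:n]
    return [tuple(int(x) for x in np.unravel_index(i, mat.shape)) for i in flat]
```

The second function searches a space that is larger by a factor of the number of action types, which is one reason the way a score is computed per triple matters for efficiency.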


Representation of Heterogeneous Interaction Data

The data consist of a set of users, a set of items, and a set of behavior types.

A triple $x_{ikj}$ (user $u_i$, behavior type $t_k$, item $v_j$) is used to represent an interaction record.

All possible triples can be grouped in a binary tensor.

Its entry $y_{ikj}$ is 1 if the triple $x_{ikj}$ is observed; otherwise it is 0.
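A minimal sketch of this representation, assuming toy records and hypothetical helper names: only the observed triples are stored, and every other cell of the tensor is implicitly 0.

```python
def index_of(values):
    """Map each distinct value to a contiguous integer index."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}

# Toy interaction records: (user, behaviour type, item).
records = [
    ("alice", "click", "phone"),
    ("alice", "purchase", "phone"),
    ("bob", "click", "laptop"),
]

users = index_of(r[0] for r in records)
types = index_of(r[1] for r in records)
items = index_of(r[2] for r in records)

# Sparse storage: the set of observed index triples (i, k, j).
observed = {(users[u], types[t], items[v]) for u, t, v in records}

def y(i, k, j):
    """Tensor entry: 1 if the triple is observed, otherwise 0."""
    return 1 if (i, k, j) in observed else 0

# Size of the full tensor versus what is actually stored.
dense_cells = len(users) * len(types) * len(items)
```

Even in this toy case the tensor has 8 cells for 3 observations; at realistic scale the gap is what makes a dense tensor impractical.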


Tucker Decomposition

Complexity of model equation is cubic in k (the dimension of latent factors)
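A standard way to write the Tucker model equation is sketched below. The notation is assumed for illustration: the latent dimension, which the slide calls k, is written $d$ here to avoid clashing with the behaviour-type index $k$, $g_{abc}$ denotes an entry of the core tensor, and $u_{ia}$, $t_{kb}$, $v_{jc}$ denote latent factors of user $u_i$, behaviour type $t_k$ and item $v_j$.

```latex
\hat{y}_{ikj} \;=\; \sum_{a=1}^{d} \sum_{b=1}^{d} \sum_{c=1}^{d} g_{abc}\, u_{ia}\, t_{kb}\, v_{jc}
```

The triple sum over the core tensor has $d^3$ terms, which is where the cubic cost per prediction comes from.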


Canonical Decomposition

Complexity of model equation is linear in k (the dimension of latent factors);

CD corresponds to TD with a static, diagonal core tensor.


Limitations of Classic TF Methods

• Cannot be applied to large-scale datasets

– They treat all non-observed examples as negative examples

– The numbers of users and items are in the millions or even billions, leading to a huge dense tensor and a huge computation cost

• Limited prediction accuracy

– Some of the non-observed examples are potentially positive examples

• Cannot overcome the skewness of user interaction behaviors

– They treat each type of observed interaction behavior equally

• Fail to capture user and item biases

– A user may tend to perform “add-to-cart” rather than “add-to-favourite” behaviours (user bias)

– A video may have received more “likes” than other videos (item bias)


A Scalable Probabilistic Tensor Factorization Model

• For each triple $x_{ikj}$, the probabilistic generative process is as follows:

• The posterior distribution of the latent vectors of users, items and behavior types is computed as follows:


A Scalable Probabilistic Tensor Factorization Model

• Objective Function

• Pairwise Interaction Factorization is used to implement the utility function

– The complexity of its model equation is linear in the dimension $D$, much lower than that of Tucker Decomposition (TD)

– The utility function includes user bias and item bias terms
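A minimal sketch of a pairwise-interaction utility function with bias terms follows. The exact SPTF formula is not reproduced in this transcript, so the form below is an assumption: it sums all three pairwise dot products and adds one bias for the (user, behaviour type) pair and one for the (item, behaviour type) pair. The function and parameter names are hypothetical.

```python
import numpy as np

def pairwise_score(u, t, v, user_type_bias=0.0, item_type_bias=0.0):
    """Pairwise-interaction score for one (user, type, item) triple.

    Three dot products over D-dimensional vectors plus two scalar biases,
    so the work per triple is linear in D.
    """
    u, t, v = (np.asarray(x, dtype=float) for x in (u, t, v))
    return float(u @ v + u @ t + v @ t + user_type_bias + item_type_bias)
```

For example, `pairwise_score([1, 0], [0, 1], [1, 1], 0.5, -0.25)` adds the dot products 1, 0 and 1 to the two biases and gives 2.25.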


Model Optimization – Negative Sampling with SGD

• Directly optimizing the objective function is computationally expensive, as the number of unobserved examples grows with the product of the numbers of users, behaviour types and items.

• Besides, not all unobserved examples are real negative examples.

• Inspired by the negative sampling technique proposed in the word2vec model, instead of treating all unobserved examples as negative, we select a few of the most likely negative examples for model optimization.

• We propose a popularity-biased Bidirectional Negative Sampling method to generate negative examples.
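The slides do not spell out the sampling procedure, so the following is only a sketch under stated assumptions: "bidirectional" is taken to mean that a positive triple is corrupted on either the item side or the user side, and "popularity-biased" is taken to mean that replacements are drawn in proportion to interaction counts raised to a power (0.75, borrowed from word2vec). All names are hypothetical.

```python
import random

def make_sampler(counts, power=0.75, rng=random):
    """Return a function that draws a key with probability ~ count ** power."""
    keys = list(counts)
    weights = [counts[key] ** power for key in keys]
    return lambda: rng.choices(keys, weights=weights, k=1)[0]

def sample_negative(positive, observed, draw_user, draw_item,
                    rng=random, max_tries=100):
    """Corrupt one side of an observed (user, type, item) triple.

    Returns an unobserved triple, or None if none is found in max_tries.
    """
    user, behaviour, item = positive
    for _ in range(max_tries):
        if rng.random() < 0.5:
            candidate = (user, behaviour, draw_item())  # corrupt the item side
        else:
            candidate = (draw_user(), behaviour, item)  # corrupt the user side
        if candidate not in observed:
            return candidate
    return None
```

Drawing popular users and items more often reflects the intuition that a popular item a user has not interacted with is more likely to be a real negative than an obscure one the user may never have seen.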


Algorithm of Training SPTF


How to sample a positive example in SGD

• User behaviours are heterogeneous in our problem, and the distribution of positive examples w.r.t. behaviour types is heavily skewed. In our collected T-Mall dataset:

– Click: 86.58%

– Add-to-favourite: 4.93%

– Add-to-cart: 5.91%

– Purchase: 2.57%

• With the widely used uniform sampling method

– most of the sampled positive examples would be associated with click behaviours, and the trained model would be heavily biased towards click behaviours.
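A quick arithmetic check of this effect, using the percentages listed on this slide. The sample size of 10,000 is an arbitrary choice for illustration, and the names are hypothetical.

```python
# Share of positive examples per behaviour type, as reported on the slide.
SHARES = {
    "click": 0.8658,
    "add-to-favourite": 0.0493,
    "add-to-cart": 0.0591,
    "purchase": 0.0257,
}

def expected_counts(shares, n_samples):
    """Expected number of sampled positives per type under uniform sampling."""
    return {behaviour: share * n_samples for behaviour, share in shares.items()}

counts = expected_counts(SHARES, 10_000)
click_per_purchase = counts["click"] / counts["purchase"]
```

Under uniform sampling, about 257 of every 10,000 sampled positives are purchases, and the model sees between 33 and 34 click examples for every purchase example.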


Adaptive ranking-based sampling approach

• A desirable sampler is expected to choose adversarial positive examples with high probability

– These are informative at the current state of learning and more helpful for correcting the model

• An intuitive idea is that positive examples that the current model ranks lower should have a higher probability of being sampled, as they are more informative and more helpful for correcting the current model parameters.
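The idea can be sketched as follows. The slides do not give the exact sampling distribution, so this sketch assumes the simplest choice, a probability proportional to the rank the current model assigns to each positive example; the function names are hypothetical.

```python
import numpy as np

def rank_of(scores, item):
    """Rank of `item` under the current model: 1 is the top, larger is lower."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores > scores[item])) + 1

def rank_based_probs(ranks):
    """Sampling probabilities proportional to rank (assumed form)."""
    ranks = np.asarray(ranks, dtype=float)
    return ranks / ranks.sum()

def sample_positive(positives, ranks, rng):
    """Draw one positive example, favouring those the model ranks lower."""
    probs = rank_based_probs(ranks)
    return positives[rng.choice(len(positives), p=probs)]
```

With two positives at ranks 1 and 3, the second is drawn with probability 0.75. Because ranks depend on the current parameters, the distribution adapts as training proceeds.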


Dataset and Measurement

• The dataset (T-Mall) contains 480,723 products, 10,000 users and the 20 million behaviour records they generated between 18/11/2014 and 18/12/2014.

• We adopt Hit Ratio and MRR (Mean Reciprocal Rank) to measure prediction accuracy.
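For reference, the two metrics can be computed as in the sketch below. It uses the common definitions, with one held-out target item per test case; the exact evaluation protocol of the experiments is not given in the slides, and the function names are illustrative.

```python
def hit_ratio_at_n(ranked_lists, targets, n):
    """Fraction of test cases whose target appears in the top-n list."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:n]
    )
    return hits / len(targets)

def mean_reciprocal_rank(ranked_lists, targets):
    """Mean of 1/rank of the target; a missing target contributes 0."""
    total = 0.0
    for ranked, target in zip(ranked_lists, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)
```

Hit Ratio only records whether the target made the cut, while MRR also rewards placing it nearer the top of the list.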


Comparison Methods

• BPTF: Bayesian Probabilistic Tensor Factorization (Xiong et al., SDM’10)

– Designed for rating prediction on explicit feedback datasets

– Only considers the observed examples

• RESCAL: A state-of-the-art tensor factorization model proposed for factorizing knowledge graphs (Nickel et al., WWW’12)

– The behaviour type is represented by a matrix rather than a vector

– Only considers the observed examples

• BPR-PITF: Pairwise Interaction Tensor Factorization model optimized with the BPR optimization framework (Rendle et al., WSDM’10)

– All unobserved examples are treated equally, and each positive example is drawn uniformly

• BPR-SMF: BPR-based matrix factorization applied to each type of user behaviour separately


Experimental Results


Experimental Results

SPTF, BPR-PITF and BPR-SMF achieve much higher prediction accuracy than RESCAL and BPTF, showing the importance of exploiting negative examples.

Both SPTF and BPR-PITF outperform BPR-SMF significantly. This demonstrates the advantage of collective factorization over separate, per-behaviour factorization.

BPR-SMF achieves its best prediction performance on click behaviours, as the click matrix is much denser than the other three matrices.

The other three models achieve their highest prediction accuracy on the three non-click behaviour types, as click behaviours provide strong signals for predicting those behaviours due to the sequential patterns of user actions on e-commerce sites.


Study of Different Sampling Strategies


Summary

• We developed a scalable probabilistic tensor factorization model (SPTF) to predict semantic-aware user behaviours.

• To train SPTF, we proposed a novel bidirectional popularity-biased negative sampling technique to leverage both observed and unobserved examples.

• We proposed a novel adaptive ranking-based sampling approach to overcome the heavy skewness of the heterogeneous behaviour data distribution w.r.t. behaviour types.


Thank you!
