Modeling Sequential Preferences with Dynamic User and … · 2021. 6. 11. · Peter The notion of...
Transcript of Modeling Sequential Preferences with Dynamic User and … · 2021. 6. 11. · Peter The notion of...
Modeling Sequential Preferences
with Dynamic User and Context Factors
Duc-Trong Le, Yuan Fang, Hady W. Lauw
ECML-PKDD 2016
Riva Del Garda, Italy
Outline
• Motivating examples and Models
• Modeling sequential preferences (HMM)
• Modeling Dynamic User-Bias Emissions (SEQ-E)
• Modeling Dynamic Context-Biased Transitions (SEQ-T)
• Joint Model (SEQ*)
• Experiments
• Real-life Datasets: Twitter & Yes.com
• Synthetic Dataset
2
The notion of Sequence – Song playlists
3
Dream
onCrazy Layla
Faded ForceDisfigure
…
…
Sequential
Preference
Tom
Mary
t=0 t=1 t=2 …
Modeling Sequential Preferences
4
HMM-Formulation: 𝜃 = (𝜋, 𝐴, 𝐵)
• 𝜋 is the initial state distribution: 𝜋𝑥 ≜ 𝑃 𝑋1 = 𝑥 ;
• A is the transition matrix: 𝐴𝑥𝑢 ≜ 𝑃 𝑋𝑡 = 𝑢 𝑋𝑡−1 = 𝑥);
• B is the emission matrix: 𝐵𝑥𝑦 = 𝑃 𝑌𝑡 = 𝑦 𝑋𝑡= 𝑥);
∀ 𝑥, 𝑢 ∈ 𝒳; 𝑦 ∈ 𝒴; 𝑡 ∈ {1, 2, … . }
Hidden Markov Model (HMM)
latent states 𝒳
Observations 𝒴
Modeling Sequential Preferences
5
HMM-Example: A HMM model with 2 latent states, 4 items
• 𝜋 = 𝜋0, 𝜋1 = 0.8, 0.2 𝜋0 = 𝑃 𝑋1 = 0 = 0.8
• 𝐴 =𝐴00 𝐴01𝐴10 𝐴11
=0.7 0.30.6 0.4
𝐴00 = 𝑃(𝑋𝑡 = 0 | 𝑋𝑡−1 = 0) = 0.7
• 𝐵 =𝐵00 𝐵01 𝐵02 𝐵03𝐵10 𝐵11 𝐵12 𝐵13
=0.6 0.1 0.2 0.10.3 0.5 0.1 0.1
𝐵00 = 𝑃(𝑌𝑡 = "Dream on" | 𝑋𝑡 = 0) = 0.6
𝐵10 = 𝑃(𝑌𝑡 = "Dream on" | 𝑋𝑡 = 1) = 0.3
The notion of Group
6
Dream
onCrazy Layla
Faded Force Disfigure
Tom
Mary
t=0 t=1 t=2
Front-
ierFly
AwayWay Back
HomePeter
Soft
Rock
EDM
EDM
Hypotheses: There exists different groups of users
• Users in the same group share the same emission probabilities
• Users across groups may have different emission probabilities.
Modeling Dynamic User-Bias Emissions
7
Formulation: 𝜃 = 𝜋, 𝜎, 𝐴, 𝐵 with a set of groups 𝒢
• 𝜎 is the group distribution: 𝜎𝑔 ≜ 𝑃(𝐺 = 𝑔)
• B is the new emission tensor: 𝐵𝑔𝑥𝑦 ≜ 𝑃 𝑌𝑡 = 𝑦 𝑋𝑡 = 𝑥, 𝑮 = 𝒈)
∀ 𝑥, 𝑢 ∈ 𝒳; 𝑦 ∈ 𝒴; 𝑔 ∈ 𝒢; 𝑡 ∈ {1, 2, … . }
• Example: 𝐵000 = 𝑃 𝑌𝑡 = "Dream on" 𝑋𝑡 = 0,𝑮 = 𝟎) = 0.8
𝐵100 = 𝑃 𝑌𝑡 = "Dream on" 𝑋𝑡 = 0,𝑮 = 𝟏) = 0.3
SEQ-E
Model
…𝜋 …
𝑌𝑡
𝑋𝑡
𝑌𝑡−1𝑌1
𝑋𝑡−1𝑋1
𝜎 𝐺
Peter
The notion of Context Features, Factors
8
Disfigure
Artist: Blank
Tag: Deep House
Artist: Alan Walker
Tag: Deep HouseArtist: Alan Walker
Tag: Melodic House
Faded Force
Mary
Context
FeaturesLatent Context Factors
Hypotheses: There exists context features and factors
• Latent context factors manifest through context features
• Transitions are affected by latent context factors.
Front-
ier
Fly
AwayWay Back
HomeEDM
Artist: Krale
Tag: DubstepArtist: Krys Talk
Tag: DubstepArtist: Krys Talk
Tag: Chill Trap
EDM
Modeling Dynamic Context-Biased Transitions
9
Hypothesis:
Transitions are affected by latent context factors
Formulation: 𝜃 = 𝜋, 𝜌, 𝐴, 𝐵, 𝐶 with a context factors set ℛ;
• 𝜌 is the distribution of the latent context factor: 𝜌𝑟 ≜ 𝑃 𝑅𝑡 = 𝑟 ;
• A is the new transition tensor: 𝐴𝑟𝑥𝑢 ≜ 𝑃 𝑋𝑡 = 𝑢 𝑋𝑡−1 = 𝑥, 𝑅𝑡−1 = 𝑟);
∀ 𝑥, 𝑢 ∈ 𝒳; 𝑟 ∈ 1, … , ℛ ;
• Examples: 𝐴100 = 𝑃 𝑋𝑡 = 0 𝑋𝑡−1 = 0,𝑹𝒕−𝟏 = 𝟏) = 0.9
𝐴000 = 𝑃 𝑋𝑡 = 0 𝑋𝑡−1 = 0,𝑹𝒕−𝟏 = 𝟎) = 0.4
SEQ-T
Model
…𝜋 …
𝑌𝑡
𝑋𝑡
𝑌𝑡−1
𝑋𝑡−1
𝜌
𝑅𝑡−2 𝑅𝑡−1
𝜌
Modeling Dynamic Context-Biased Transitions
10
Hypothesis:
Latent context factors manifest through context features
Formulation: 𝜃 = 𝜋, 𝜌, 𝐴, 𝐵, 𝐶 with a context features set F =
{𝐹1, 𝐹2, … }, each 𝐹𝑖 takes a values set ℱ𝑖
• C is the feature probability matrix: 𝐶𝑟𝑖𝑓 ≜ 𝑃 𝐹𝑡𝑖 = 𝑓 𝑅𝑡 = 𝑟);
𝑟 ∈ 1, … , ℛ ; 𝑖 ∈ 1,… , 𝐹 ; 𝑓 ∈ ℱ𝑖; 𝑡 ∈ {1, 2, … }
• Examples: 𝐶101 = 𝑃( 𝑭𝒕𝟎 = 𝟏 𝑅𝑡 = 1) = 0.99
𝐶100 = 𝑃( 𝑭𝒕𝟎 = 𝟎 𝑅𝑡 = 1) = 0.01
SEQ-T
Model
…𝜋 …
𝑌𝑡
𝑋𝑡
𝑌𝑡−1
𝑋𝑡−1
𝜌
𝑅𝑡−2…
𝑅𝑡−1𝐹𝑡−11
𝐹𝑡−1|𝐹|
…𝐹𝑡−11
𝐹𝑡−1|𝐹|
… …
𝜌
Joint Model
11
SEQ*
Model
Main idea: Jointly capture user and context factors in a single model
Parameters: The six-tuple 𝜃 = 𝜋, 𝜎, 𝜌, 𝐴, 𝐵, 𝐶 as above
Prediction: 𝑦∗ = argmax𝑦𝑃(𝑌𝑇+1 = 𝑦| 𝑌1, . . , 𝑌𝑇 , 𝐹1, … , 𝐹𝑇; 𝜃∗)
Outline
• Motivating examples and Models
• Modeling sequential preferences (HMM)
• Modeling Dynamic User-Bias Emissions (SEQ-E)
• Modeling Dynamic Context-Biased Transitions (SEQ-T)
• Joint Model (SEQ*)
• Experiments
• Real-life Datasets: Twitter & Yes.com
• Synthetic Dataset
12
Experimental Setup – Real-life Datasets
• Research Question: Do the latent user and context
factors result in significant improvements over HMM ?
• Datasets:
• Features:
• Categories of tags (Yes.com)
• Tweet information (Twitter): #Retweet, Created Time, etc.
Dataset #Observation #SequenceAverage
Length
Song playlists
(Yes.com)3168 250k 7
Hashtag Sequences
(Twitter )2121 114k 19
13
Experimental Setup – Real-life Datasets
• Task: Last item prediction
– For each testing sequence 𝑠 = 𝑌1, 𝑌2, … , 𝑌𝑇−1, 𝑌𝑇 𝑇 ≥ 2
– Save 𝑌𝑇 as ground-truth target. Predict the last item given the
previous items 𝑃 𝑌candidate 𝑌1, 𝑌2, … , 𝑌𝑇−1)
• Evaluation Metrics
• 𝑅𝑒𝑐𝑎𝑙𝑙@𝐾 =# sequences with ground truth in top 𝐾
# sequences in the testing set
Example:
Given a sequence 𝑠 = 𝑖10, 𝑖2, 𝑖5, 𝑖8 ;
𝑃 𝑖candidate 𝑖10, 𝑖2, 𝑖5) ⇒ ranki8 = 9
𝑆test = {𝑠1, 𝑠2, 𝑠3} with respective ranks of actual items
are 3, 6, 11. Recall@5 = 1/3; Recall@10 = 2/3
• Mean Reciprocal Rank (MRR)
14
Result – Twitter - Recall@1%
15
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 7
12
16
20
24
28
32
5 10 15
Rec
all@
1%
Number of States Twitter
HMM
SEQ-T
SEQ-E
SEQ*
+4.1
+5.1
+4.8
Result – Yes.com - Recall@1%
16
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 11
12
16
20
24
28
32
5 10 15
Rec
all@
1%
Number of States Yes.com
HMM
SEQ-T
SEQ-E
SEQ*
+10
.3
+6.3
+4.5
Experimental Setup – Synthetic Dataset
• Research Questions:
– Can the implementation recover parameters from the synthetic
dataset?
– Could the effect of latent user and context factors be simulated by
increasing the number of HMM’s states?
• Generative process:
• Task: Last item prediction
• Evaluation metrics: Recall@1, MRR
Dataset #Observation #SequenceAverage
Length
Synthetic 4 10k 10
17
Result - Synthetic Dataset – Recall@1
50
60
70
80
90
2 6 10 14
Recall@1
NumberofStates
HMM SEQ-T
SEQ-E SEQ*
18
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 4
Conclusion
• Introduce and model dynamic user and context factors
to capture sequential preferences.
• The proposed model contributes statistically significant
improvement as compared to the baseline HMM in term
of top-K recommendations.
19
Thank you!
Q&A
Any further questions, please contact us:
Backup Slides
Result – Yes.com – Tuning Parameters
0.032
0.035
0.037
0.040
15
16
17
18
19
2 4 6 8 10
MRR
Recall@1%
NumberofFeatures
Recall@1%
MRR0.030
0.033
0.035
0.038
18
19
20
2 4 6 8
MRR
Recall@1%
NumberofContextFactorLevels
Recall@1%
MRR
0.035
0.045
0.055
0.065
20
23
26
29
32
2 4 6 8MRR
Recall@1%
NumberofGroups
Recall@1%MRR
12
Synthetic Dataset – Generative Process
• Initial Probability: 𝜋 = 0.8, 0.2
• Latent Context Factor Distribution: 𝜌 = 0.3, 0.7
• Group Distribution: 𝜎 = {0.9, 0.1}
• Transition Tensor A:
The first context factor favors self-transition to the same state
The second context factor encourages the state switching.
𝐴 = 𝐴0, 𝐴1 ; 𝐴1 =0.01 0.990.70 0.30
;𝐴0 =0.99 0.010.30 0.70
#Group 𝒢 2#Context Factor
Level ℛ2
#States 𝒳 2 #Feature |F| 4
#Observation 𝒴 4 #Feature values |ℱ| 2
10
Result – Twitter – Recall@K & MRR
13
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 7
Result – Yes.com – Recall@K & MRR
12
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 11
Result - Synthetic Dataset – MRR
0.75
0.80
0.85
0.90
0.95
2 6 10 14
MRR
NumberofStates
HMM SEQ-T
SEQ-E SEQ*
15
#Group 𝒢 = 2, #Context Factor Level ℛ = 2, #Feature |F| = 4
Synthetic Dataset – Generative Process
• Emission Tensor B:
Each pair of (state, group) favors one of the four items
𝐵 = 𝐵0, 𝐵1 ; 𝐵0 =
0.991 0.0030.0030.0030.003
0.0030.9910.003
; 𝐵1=
0.003 0.0030.9910.0030.003
0.0030.0030.991
• Feature matrix C:
Each context factor level is associated with 2 of the 4 features.
𝐶 = 𝐶0, 𝐶1 ; 𝐶0 =
0.10 0.900.200.900.90
0.800.100.10
; 𝐶1=
0.90 0.100.900.100.30
0.100.900.70
11