Bayesian Personalized Feature Interaction Selection for Factorization Machines
Yifan Chen1,2 Pengjie Ren1 Yang Wang3 Maarten de Rijke1
1University of Amsterdam
2National University of Defense Technology
3Hefei University of Technology
Yifan Chen, Pengjie Ren, Yang Wang, Maarten de Rijke SIGIR 2019 1 / 28
Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
Factorization Machines
What is a factorization machine?
- A generic supervised learning method
- Accounts for feature interactions (combinations of features) with factored parameters
Hashtags: "comics", "marvel", "avengers"
Feature combinations: ("comics", "marvel"), ("comics", "avengers"), ("marvel", "avengers")
Factorization Machines
- Linear regression: O(d)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i

- Degree-2 polynomial regression: O(d^2)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \cdot x_i x_j

- Factorization machine: O(dk)

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle \cdot x_i x_j
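As a sanity check on the formulas above, here is a minimal NumPy sketch (variable names `b0`, `w`, `V` are illustrative, not from the slides) computing the degree-2 FM prediction both naively in O(d²k) and in O(dk) via the standard factorization identity sum_{i<j} ⟨v_i, v_j⟩ x_i x_j = ½(‖Vᵀx‖² − Σ_i ‖v_i‖² x_i²):

```python
import numpy as np

def fm_predict(x, b0, w, V):
    """O(dk) FM prediction using the factorization identity.
    x: (d,) features, w: (d,) linear weights, V: (d, k) factor matrix."""
    linear = b0 + w @ x
    xv = V.T @ x                                      # (k,) = V^T x
    pairwise = 0.5 * (xv @ xv - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise

def fm_predict_naive(x, b0, w, V):
    """O(d^2 k) reference: explicit sum over pairs i < j."""
    d = len(x)
    s = b0 + w @ x
    for i in range(d):
        for j in range(i + 1, d):
            s += (V[i] @ V[j]) * x[i] * x[j]
    return s
```

Both versions agree up to floating-point error; the O(dk) form is what makes FM training practical for high-dimensional sparse features.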
Factorization Machines
Example
\hat{r}(\text{spider-man}) = b_0 + w_{\text{comics}} + w_{\text{marvel}} + w_{\text{avengers}} + \langle v_{\text{comics}}, v_{\text{marvel}} \rangle + \langle v_{\text{comics}}, v_{\text{avengers}} \rangle + \langle v_{\text{marvel}}, v_{\text{avengers}} \rangle
Factorization Machines for Recommendation
- Effective use of historical interactions between users and items
- Incorporates additional information associated with users or items
- High-dimensional feature space
  - #features = #users + #items + #additional
  - Not all features or feature interactions are helpful
Feature Interaction Selection (FIS)
Filter out useless feature interactions
- P-FIS: select feature interactions for each user personally
- FIS: select a common set of interactions
Diagram: an FM uses all pairwise interactions (x1·x2, x1·x3, x1·x4, x2·x3, x2·x4, x3·x4); FIS keeps a common subset (x1·x3, x1·x4, x2·x4) shared by u1 and u2.
Diagram: with P-FIS, u1 and u2 each get an individual FM that keeps its own subset of interactions (drawn from x1·x3, x2·x3, x2·x4), rather than one shared subset.
Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
Personalized Factorization Machines (PFM)
FM:

  \hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \cdot x_i x_j

PFM:

  \hat{r}(x) = b_u + \sum_{i=1}^{d} w_{ui} x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{uij} \cdot x_i x_j

Select first-order interactions {x_i} and second-order interactions {x_i x_j} through the user-specific weights {w_{ui}} and {w_{uij}}.
Bayesian Variable Selection (BVS)
- Apply BVS to select feature interactions
  - Avoids expensive cross-validation
- Priors for BVS
  - Sparsity priors
  - Spike-and-slab
Bayesian Variable Selection
Spike-and-slab
- Spike (point mass at zero): p(w = 0) = 0.5
- Slab: a broad continuous density over w

Sparsity priors (Laplace)
- f(w) = \frac{1}{2b} \exp\left( -\frac{|w - \mu|}{b} \right)
- p(w = 0) = 0
Hereditary Spike-and-Slab Priors
- Spike-and-slab:

  s \sim \mathrm{Bernoulli}(\pi), \quad \tilde{w} \sim \mathcal{N}(0, 1), \quad w = \tilde{w} \cdot s

- Hereditary spike-and-slab: captures the relation between first-order and second-order feature interactions

  s_{ui}, s_{uj} \sim \mathrm{Bernoulli}(\pi_1)
  p(s_{uij} = 1 \mid s_{ui} s_{uj} = 1) = 1 \quad (strong heredity)
  p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 1) = \pi_2 \quad (weak heredity)
  p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 0) = 0
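The hereditary prior is straightforward to sample. The sketch below (hypothetical helper name; `pi1`, `pi2` correspond to π₁, π₂) draws the first-order and second-order selection variables for one user:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hereditary(d, pi1, pi2):
    """Sample selection variables under the hereditary spike-and-slab prior:
    s_i ~ Bernoulli(pi1); s_ij is 1 with probability 1 if both parents are
    selected (strong heredity), with probability pi2 if exactly one parent is
    selected (weak heredity), and 0 if neither parent is selected."""
    s1 = rng.binomial(1, pi1, size=d)          # first-order selections s_ui
    s2 = np.zeros((d, d), dtype=int)           # second-order selections s_uij
    for i in range(d):
        for j in range(i + 1, d):
            parents = s1[i] + s1[j]
            p = 1.0 if parents == 2 else (pi2 if parents == 1 else 0.0)
            s2[i, j] = rng.binomial(1, p)
    return s1, s2
```

By construction, an interaction x_i x_j can only be selected when at least one of its constituent features is selected, which is exactly the heredity constraint.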
Generative Procedure of BP-FIS
Algorithm: Generative procedure

 1: for each user u ∈ U do
 2:   for each feature i ∈ F do
 3:     draw first-order interaction selection variable s_ui ∼ Bernoulli(π_1)
 4:     draw first-order interaction weight w̃_i ∼ N(0, 1)
 5:     w_ui = s_ui · w̃_i
 6:   for each feature pair (i, j) ∈ F do
 7:     draw second-order interaction selection variable s_uij ∼ p(s_uij | s_ui, s_uj)
 8:     draw second-order interaction weight w̃_ij ∼ N(0, 1)
 9:     w_uij = s_uij · w̃_ij
10: for each feature vector x ∈ X do
11:   calculate the rating prediction r̂(x) by PFM
12:   draw r(x) ∼ p(r | r̂(x))
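The procedure can be sketched end to end in a few lines. This illustrative version (function names are assumptions, and the rating noise model p(r | r̂) is omitted) samples one user's weights under the hereditary prior and evaluates the PFM prediction:

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_user_weights(d, pi1, pi2):
    """One user's weights per the generative procedure: selection variables
    follow the hereditary prior, raw weights are N(0, 1), and w = s * w_tilde."""
    s1 = rng.binomial(1, pi1, size=d)
    w1 = s1 * rng.standard_normal(d)           # first-order weights w_ui
    w2 = np.zeros((d, d))                      # second-order weights w_uij (i < j)
    for i in range(d):
        for j in range(i + 1, d):
            parents = s1[i] + s1[j]
            p = 1.0 if parents == 2 else (pi2 if parents == 1 else 0.0)
            w2[i, j] = rng.binomial(1, p) * rng.standard_normal()
    return w1, w2

def pfm_predict(x, bu, w1, w2):
    """PFM rating prediction r_hat(x) with user-specific weights; w2 is
    strictly upper triangular, so x @ w2 @ x sums exactly over pairs i < j."""
    return bu + w1 @ x + x @ w2 @ x
```

In the full model this prediction would then be fed through p(r | r̂(x)) to draw the observed rating.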
Optimization
Maximum a posteriori: \arg\max_{\tilde{W}, S} p(\tilde{W}, S \mid R, X)

Exact inference is infeasible:
- Space complexity: O(md^2)
- Time complexity: O(2^{md^2})

Variational inference:
- Approximate p(\tilde{W}, S \mid R, X) by q(\tilde{W}, S)
- Space complexity: O(md)
- Stochastic Gradient Variational Bayes (SGVB)
  - Time complexity: O(dk), the same as for FMs
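Gradient-based SGVB needs a differentiable surrogate for the discrete Bernoulli selection variables. A common device is the binary Concrete (Gumbel-sigmoid) relaxation, sketched below; this illustrates the general reparameterization idea only and is not necessarily the exact estimator used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def concrete_bernoulli(logit, temperature=0.5):
    """Differentiable relaxation of a Bernoulli draw (binary Concrete /
    Gumbel-sigmoid): inject Logistic(0, 1) noise into the logit, then squash
    with a tempered sigmoid. As temperature -> 0 the sample approaches {0, 1}."""
    u = rng.uniform(1e-8, 1 - 1e-8)
    noise = np.log(u) - np.log(1 - u)          # Logistic(0, 1) sample
    return 1.0 / (1.0 + np.exp(-(logit + noise) / temperature))
```

Because the randomness enters only through the noise term, gradients with respect to the logit (the variational parameter) can flow through the sample.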
Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
- Experiment
Experimental Setup
Datasets (HetRec: Information Heterogeneity and Fusion in Recommender Systems)
- MovieLens: rating and tagging
- LastFM: rating, tagging, social networking
- Delicious: rating, tagging, social networking
Baselines
- Factorization Machine (FM) (Rendle, 2010)
- Sparse Factorization Machine (SFM)
- Attentional Factorization Machine (AFM) (Xiao et al., 2017)
- Neural Factorization Machine (NFM) (He and Chua, 2017)
Experimental Setup
Our methods: apply BP-FIS to a linear FM and a non-linear FM
- BP-FM
- BP-NFM

Evaluation: top-N recommendation
- Leave-one-out cross-validation (LOOCV)
- Ranking among 100 items
- Metrics: HR@N and ARHR@N
Overall Performance
Table: Delicious
| Method | HR@1   | HR@10    | ARHR@10  |
|--------|--------|----------|----------|
| FM     | 0.0202 | 0.1147   | 0.0440   |
| SFM    | 0.0229 | 0.1212   | 0.0465   |
| AFM    | 0.0274 | 0.1169   | 0.0494   |
| BP-FM  | 0.0278 | 0.1240** | 0.0509*  |
| NFM    | 0.0229 | 0.1065   | 0.0426   |
| BP-NFM | 0.0268 | 0.1289** | 0.0504** |
* and ** indicate that the best score is significantly better than the second-best score with p < 0.1 and p < 0.05, respectively.
- SFM outperforms FM and AFM on HR@10, which shows the need for FIS
- BP-FM and BP-NFM significantly outperform FM and NFM, respectively, which shows the effect of P-FIS
Impact of Embedding Size
Figure: HR@10 on MovieLens for embedding sizes k = 64, 128, 256, comparing FM, SFM, AFM, NFM, BP-FM, and BP-NFM.
- k = 64: P-FIS has an insignificant effect on FMs
- k = 128, 256:
  - BP-FM and BP-NFM significantly outperform FM and NFM
  - BP-NFM does not outperform BP-FM
Case study
Figure: User 1, top-1 recommendation — the selected interactions among the hashtags "action", "buddy", "comedy", and "sequel".
Figure: User 2, top-5 recommendation — the selected interactions among the same hashtags.
Figure: User 3, not recommended — the selected interactions among the same hashtags.
Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
- Experiment
- Conclusion
Conclusion
1. We study personalized feature interaction selection (P-FIS) for factorization machines.
2. We propose a Bayesian personalized feature interaction selection (BP-FIS) method based on Bayesian variable selection.
   - We propose hereditary spike-and-slab priors to achieve P-FIS.
   - BP-FIS is a plug-and-play framework for FMs.
3. We design an efficient optimization algorithm based on Stochastic Gradient Variational Bayes (SGVB).
Future Work
1. Extend BP-FIS to select higher-order feature interactions
2. Consider group-level personalization via clustering to speed up training
Thank You
Source code: https://github.com/yifanclifford/BP-FIS

Contact: Yifan Chen (https://sites.google.com/view/yifanchenuva/home)
Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR. ACM, 355–364.
Steffen Rendle. 2010. Factorization Machines. In ICDM. IEEE, 995–1000.
Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. In IJCAI. 3119–3125.