KDD15 - Distributed Personalization
Transcript of KDD15 - Distributed Personalization
Aug 11, 2015. Xu Miao, Lijun Tang, Yitong Zhou, Joel Young (LinkedIn); Chun-te Chu (Microsoft); Anmol Bhasin (Groupon)
Distributed Personalization
Motivation
Distributed Learning
Personalization
Experiments
Recommendation
Common Solution
[Pipeline diagram: Apps → Tracking → ETL → DM → Delivering]
Common Solution -- Cold Start
[Pipeline diagram annotated with latencies: app interactions happen in seconds, ETL takes minutes, DM takes hours to days]
Common Solution -- Warm Start
[Pipeline diagram: user interactions keep arriving every few seconds while the ETL/DM loop still takes minutes to hours or days]
Bring ML Closer to Users
[Pipeline diagram: learning moves next to the apps, so models update without waiting on the minutes-to-days ETL/DM loop]
Distributed Online Learning
▪ Definition:
– Agent presents an example
– User responds with a reward r
– Agent updates the model w
▪ Challenges:
– Each user's feedback data are too few → Distributed Learning
– Everyone has different preferences → Personalization
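The definition above can be sketched as a minimal interaction loop. This is an illustrative toy, not the paper's setup: the squared-loss model, the simulated user, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w = np.zeros(d)                  # agent's model
w_true = rng.normal(size=d)      # hidden user preference (simulated)

def user_reward(x):
    # Simulated user: responds with a noisy reward for the shown example.
    return float(x @ w_true + rng.normal(scale=0.1))

eta = 0.1                        # learning rate
for t in range(1000):
    x = rng.normal(size=d)       # agent presents an example
    r = user_reward(x)           # user responds with a reward r
    grad = (w @ x - r) * x       # squared-loss gradient
    w -= eta * grad              # agent updates the model w
```

After enough interactions the agent's model tracks the user's hidden preference; the two challenges below arise because a single user rarely supplies that many interactions, and each user's `w_true` differs.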
Motivation
Distributed Learning
Personalization
Experiments
Distributed Gradient Descent
▪ Bulk Synchronous Parallel (Hadoop & Spark)
– ~Thousands of interactions to converge
▪ Stale Synchronous Parallel [Ho et al. '13]
– For some users, staleness is forever
What did I do?
Learning Rate
▪ Blessing
– It is one of the key reasons for PGDs to converge fast
▪ Challenge
– It diminishes, so data that comes later has smaller and smaller impact
– Restart? Residue constant? Hard to manage
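The challenge can be seen numerically. In this toy illustration (mine, not the paper's experiment) a standard 1/t learning rate leaves the model unable to follow a preference that shifts after the rate has already decayed:

```python
# Toy illustration: a 1/t learning rate makes late data nearly powerless.
w = 0.0
for t in range(1, 5001):
    target = 0.0 if t <= 2500 else 1.0   # user preference shifts mid-stream
    w -= (1.0 / t) * (w - target)        # SGD step with diminishing rate
# After 2500 more steps the model has only covered half the gap:
# the deviation telescopes to exactly 2500/5000 = 0.5.
```

Restarting the rate schedule or flooring it at a constant patches this, but, as the slide notes, both are hard to manage in a continuously running system.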
Alternating Direction Method of Multipliers (ADMMs)
ADMMs -- Bulk Synchronous Parallel
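A minimal sketch of consensus ADMM in the bulk-synchronous style: every worker solves its local subproblem, then all workers synchronize on a shared consensus variable z. The ridge-style least-squares subproblem, the data simulation, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, n_samples, d, rho = 4, 50, 3, 1.0
w_true = rng.normal(size=d)

# Each worker holds its own shard of data (simulated here).
shards = []
for _ in range(n_workers):
    A = rng.normal(size=(n_samples, d))
    b = A @ w_true + 0.01 * rng.normal(size=n_samples)
    shards.append((A, b))

z = np.zeros(d)                          # consensus model
W = np.zeros((n_workers, d))             # local models
U = np.zeros((n_workers, d))             # scaled dual variables

for _ in range(50):                      # bulk-synchronous rounds
    for i, (A, b) in enumerate(shards):  # local least-squares subproblems
        W[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                               A.T @ b + rho * (z - U[i]))
    z = (W + U).mean(axis=0)             # barrier: every worker must arrive
    U += W - z                           # dual update
```

The `mean` step is the synchronization barrier: in the bulk-synchronous setting no worker proceeds until all local solutions for the round are in, which is exactly what the asynchronous variant on the next slides relaxes.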
ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]
[Timeline diagrams: local model versions V1, V1', V1'', V2, V3 arrive out of order between t0 and t4 and are combined into master versions by a weighted merge]
ADMMs -- Asynchronous Parallel [Miao, Chu, Tang, Zhou, Young, Bhasin '15]
▪ Same convergence rate as Bulk Synchronous Parallel
▪ No learning rate
– Out-of-order sequences of mini-optimizations
– Continuous learning
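One way to read the weighted-merge idea, sketched under my own assumptions (the staleness-discounting weight and all names below are mine, not the paper's exact rule): when a worker pushes a local model computed against a stale master version, the master blends it in with a weight that discounts the delay, rather than blocking on a synchronous barrier.

```python
import numpy as np

class Master:
    """Async master: merges out-of-order worker updates; no learning rate."""
    def __init__(self, d):
        self.z = np.zeros(d)          # current master model
        self.version = 0

    def pull(self):
        return self.z.copy(), self.version

    def push(self, w_local, base_version):
        # Discount updates computed against stale master versions
        # (illustrative weighting; the paper's rule may differ).
        staleness = self.version - base_version
        weight = 1.0 / (1.0 + staleness)
        self.z = (1 - weight) * self.z + weight * w_local
        self.version += 1

rng = np.random.default_rng(2)
d = 3
w_true = rng.normal(size=d)
master = Master(d)

for _ in range(200):
    z, v = master.pull()
    # Worker runs a local mini-optimization against its own fresh data,
    # regularized toward the master model it pulled.
    A = rng.normal(size=(20, d))
    b = A @ w_true
    w_local = np.linalg.solve(A.T @ A + np.eye(d), A.T @ b + z)
    # Other workers may have pushed in the meantime (simulated staleness).
    master.version += int(rng.integers(0, 3))
    master.push(w_local, v)
```

Because each push is a solved mini-optimization rather than a gradient step, there is no step size to schedule, which is what lets the system learn continuously.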
Motivation
Distributed Learning
Personalization
Experiments
Personalized Models
▪ The personalization strength:
– Allows personal models to diverge from the consensus model
– Improves relevance
– Improves convergence (speed)
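The personalization strength can be sketched as a proximal penalty: each personal model fits its own user's few examples but is pulled toward the consensus model, with a strength parameter controlling how far it may diverge. The closed-form ridge subproblem and the symbol `lam` are my notation, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
d, lam = 3, 5.0                              # lam = personalization strength

z = rng.normal(size=d)                       # consensus model (given)
A = rng.normal(size=(10, d))                 # one user's (few) examples
b = A @ (z + rng.normal(scale=0.5, size=d))  # this user deviates from consensus

# Personal model: fit the user's data while staying close to the consensus:
#   w_i = argmin_w ||A w - b||^2 + lam * ||w - z||^2
w_i = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b + lam * z)

# A larger strength pins the personal model more tightly to the consensus.
w_strict = np.linalg.solve(A.T @ A + 100.0 * np.eye(d), A.T @ b + 100.0 * z)
```

Starting each personal model from the consensus rather than from scratch is also what the slide credits for faster convergence: a new user begins at a sensible model and only has to learn the deviation.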
Motivation
Distributed Learning
Personalization
Experiments
Facial Expression Recognition
Job Recommendation
Speed
Conclusion
▪ Asynchronous ADMMs
– Continuous learning
▪ Personalized Models
– Fit users better
– Improve convergence speed
Thank You and Questions
ADMMs -- Asynchronous Parallel
▪ Delay variations
– Weighted Merge (vs. Stale Synchronous Parallel)
– Flexible enough to handle non-stationary distributions
▪ Crazy active users
▪ Passive important users
▪ Spammers