Music Personalization At Spotify
Uploaded by vidhya-murali
Transcript of Music Personalization At Spotify
Spotify’s Big Data
‣ Started in 2006, now available in 58 countries
‣ 100+ million active users, 35+ million paid subscribers
‣ 30+ million songs in our catalog, ~20K added every day
‣ 2+ billion playlists
‣ 1 TB of log data every day
‣ Hadoop cluster with ~2500 nodes
Approaches
‣ Manual Curation by Experts
‣ Metadata (e.g., Label Provided Data, News, Blogs)
‣ Audio Signals
‣ Collaborative Filtering
‣ Hybrid
Latent Factor Models
“Compact” representation for each user and item (song): f-dimensional vectors (here, f = 2)
[Figure: users × songs matrix; labels: Vidhya, Rise, m Users, Songs]
User Vector Matrix: X: (m x f)
Song Vector Matrix: Y: (n x f)
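The shapes on this slide can be made concrete with a minimal NumPy sketch. The sizes (4 users, 5 songs) and the random values are assumptions chosen for illustration, and scoring by the dot product of a user row and a song row is the usual latent factor convention rather than something the slides state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, f = 4, 5, 2             # users, songs, latent dimensions (f = 2 as on the slide)
X = rng.normal(size=(m, f))   # user vector matrix, one row per user
Y = rng.normal(size=(n, f))   # song vector matrix, one row per song

# Predicted affinity of every user for every song: an (m x n) matrix.
scores = X @ Y.T

# Indices of the top-2 songs for user 0, highest score first.
top_songs = np.argsort(-scores[0])[:2]
```

In practice X and Y would be learned from listening data instead of drawn at random; the sketch only shows how the two matrices combine into scores.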
[1] http://benanne.github.io/2014/08/05/spotify-cnns.html
Deep Learning on Audio
Vectors
“COMPACT” representation of the musical fingerprint of users and items.
Normalized Song Vectors
User Vector
Why Vectors?
‣ Encode higher-order dependencies
‣ Users and items in the same latent space: user-item recommendations, item-item similarities
‣ Easy to scale up: complexity is linear in the number of latent factors
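Because users and items share one latent space, the same similarity function serves both user-item recommendations and item-item similarities. The sketch below assumes cosine similarity on normalized vectors; the vectors themselves are made-up values with f = 2, not data from the talk.

```python
import numpy as np

def normalize(v):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, 1e-12)

# Hypothetical 2-dimensional vectors (f = 2), made up for illustration.
songs = normalize(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))
user = normalize(np.array([0.8, 0.2]))

# User-item recommendations: rank songs by similarity to the user vector.
user_item = songs @ user
recommended = np.argsort(-user_item)

# Item-item similarities: every song against every other song.
item_item = songs @ songs.T
```

Each similarity is a single dot product over f numbers, which is where the linear cost in the number of latent factors comes from.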
Ranking
‣ Similarity score can be used for ranking
‣ Balance relevance, diversity, popularity, freshness
‣ Heuristic based
‣ MAB (multi-armed bandit) interactions: impressions, clicks, streams
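The two ranking styles named on this slide can be sketched side by side. Everything below is an assumption for illustration: the weights in `blended_score` are arbitrary, diversity is left out for brevity, and UCB1 is just one well-known bandit rule, not necessarily the one used at Spotify.

```python
import math

def blended_score(relevance, popularity, freshness, weights=(0.6, 0.2, 0.2)):
    """Heuristic blend of signals, each assumed to be scaled to [0, 1]."""
    w_rel, w_pop, w_fresh = weights
    return w_rel * relevance + w_pop * popularity + w_fresh * freshness

def ucb1_pick(impressions, clicks):
    """Pick a candidate index with the UCB1 rule from impression/click counts."""
    # Show every candidate once before trusting any click-rate estimate.
    for arm, shown in enumerate(impressions):
        if shown == 0:
            return arm
    total = sum(impressions)

    def upper_bound(arm):
        mean = clicks[arm] / impressions[arm]
        bonus = math.sqrt(2.0 * math.log(total) / impressions[arm])
        return mean + bonus

    return max(range(len(impressions)), key=upper_bound)
```

For example, `ucb1_pick([1000, 2], [500, 0])` selects the second candidate: it has been shown only twice, so its exploration bonus outweighs the first candidate's higher observed click rate.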
Challenges Unique to Spotify
Scale of catalog
Music is “niche”
Music consumption has heavy correlation to users’ context
Repeated consumption of music is NOT so uncommon.
Challenge Accepted!
Cold start problem for both users and new music/upcoming artists: Content Based Signals, Real Time Recommendations
Measuring Quality: Implicit: A/B Test Metrics; Explicit: Feedback from social forums
Scam Attacks: Rule based model to detect scammers
Human choices are not always predictable: Faith in humanity
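One way content-based signals can address the cold start problem listed above is to learn a mapping from content features to the latent space, so that a song with no listening history still gets a vector. The sketch below is a simplified stand-in: it uses a linear least-squares fit on synthetic data, and the feature count (8), song count (50), and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 known songs with 8 content features each and
# f = 2 latent factors standing in for vectors learned from listening data.
content = rng.normal(size=(50, 8))
true_map = rng.normal(size=(8, 2))
latent = content @ true_map

# Fit a linear map from content features to latent vectors (least squares).
W, *_ = np.linalg.lstsq(content, latent, rcond=None)

# A brand-new song has content features but no listening history.
new_song_features = rng.normal(size=8)
new_song_vector = new_song_features @ W  # approximate position in latent space
```

The predicted vector can then be compared with user and song vectors exactly like any other item. A production system would likely replace the linear map with a richer model trained on real audio features.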
What Next?
‣ Personalization!
‣ Content signals such as lyrics, audio, images
‣ Expanded Catalog: Shows, Podcasts
‣ New Markets