The Hidden Pub/Sub of Spotify
Vinay Setty1, Gunnar Kreitz2,3, Roman Vitenberg1, Maarten van Steen 4, Guido Urdaneta2, Staffan Gimåker2
What is Spotify?
• Over 20 million users
• Fast streaming
• Legal
• Social Interaction
• On-demand peer-assisted music streaming
• Large catalogue, over 20 million tracks
• Available in US and 55 other countries worldwide
Online feed from friends and artists
Offline feed from friends and artists
listened to track, playlist activity
Spotify Social Interaction
[Figure: a Spotify user follows music playlists, artists, Facebook friends, and Spotify friends. Events shown on the links: listened to/starred track, album released, playlist created/updated, friend joined Spotify. The followed entities are labelled as topics and the user as the subscriber.]
Pub/Sub for Social Interaction!
Spotify Pub/Sub
• In 2013 more than 20% of active Spotify users were using pub/sub
• More than 1 TB of pub/sub data is sent/received every day
• More than a dozen engineers work full time to maintain and improve it
• Design decisions change over time
Contributions
• A case study of Pub/Sub for social interaction
• Spotify Pub/Sub architecture overview
• Analysis of real-world Pub/Sub workload
• Collected traces from production system
• Subscription workload distributions
• Publication event rate distribution
• Pub/Sub traffic analysis
Design Challenges for Spotify Pub/Sub
• Billions of notifications every day
• Millions of users to be served at any time
• Distributed across 3 data centers (sites)
• Different notification types
• Online feed
• Persisted and Offline feed
• Synchronization across devices
Overview of Pub/Sub Architecture
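[Figure: clients (publishers and subscribers) connect over the Internet to Access Points in the Spotify backend. Subscription data and publication events pass through the Access Points to the Pub/Sub Engine and the Notification Module, which is backed by a database.]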
Notification Types
Publisher Service | Notification Type | Pub/Sub Module
Presence Service | friend-feed | Pub/Sub Engine
Playlist Service | friend-feed, in-client, push and … | Pub/Sub Engine and Notification Module
Artist Service | in-client, push and e-mail | Notification Module
Social Service | in-client, push and e-mail | Notification Module
Detailed Pub/Sub Architecture
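[Figure: detailed architecture. Clients (publishers and subscribers) connect over the Internet to Access Points in the Spotify backend, which exchange subscription data and publication events with the Pub/Sub Engine. The Notification Module consists of a Notification Service, a Rule Engine, and a Cassandra cluster (database). Link labels: events, notifications, timestamps.]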
Notification Module
• Determines notification type using Rule Engine
• Notification types
• In-client (for desktop clients)
• Push notifications (for mobile clients)
• Batch e-mail for offline users
• Notification persistence
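The three delivery rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Subscriber` fields, the `Channel` names, and the order in which the rules are checked are hypothetical and are not taken from Spotify's Rule Engine.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    IN_CLIENT = "in-client"
    PUSH = "push"
    BATCH_EMAIL = "batch e-mail"


@dataclass
class Subscriber:
    user_id: str
    online: bool
    device: str  # hypothetical values: "desktop" or "mobile"


def choose_channel(sub: Subscriber) -> Channel:
    """Pick a delivery channel from the three rules listed on the slide.

    Assumption: offline users are checked first, then the device type.
    """
    if not sub.online:
        return Channel.BATCH_EMAIL   # batch e-mail for offline users
    if sub.device == "mobile":
        return Channel.PUSH          # push notifications for mobile clients
    return Channel.IN_CLIENT         # in-client for desktop clients


print(choose_channel(Subscriber("alice", online=True, device="mobile")))
```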
Offline Event Retrieval
[Figure: the detailed architecture with an Artist Monitoring Service among the publishers and a pull-request path from the client to the Notification Module. Sequence shown: the client follows Paul McCartney; the client goes offline; "Paul McCartney released an album" is published while the client is offline and cannot be delivered; the client comes back online and retrieves the event with a pull request.]
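A minimal sketch of offline event retrieval, assuming the client remembers the timestamp of the last notification it saw and sends it with the pull request. The `NotificationStore` class and its methods are hypothetical; an in-memory dictionary stands in for the persisted notification database.

```python
from collections import defaultdict


class NotificationStore:
    """In-memory stand-in for the persisted notification store."""

    def __init__(self):
        self._events = defaultdict(list)  # user_id -> [(timestamp, event)]

    def persist(self, user_id, timestamp, event):
        """Store an event for a user regardless of whether they are online."""
        self._events[user_id].append((timestamp, event))

    def pull(self, user_id, last_seen):
        """Return events newer than last_seen, oldest first."""
        pending = [(t, e) for t, e in self._events.get(user_id, []) if t > last_seen]
        pending.sort(key=lambda pair: pair[0])
        return [event for _, event in pending]


store = NotificationStore()
# The event is published at time 100, while the client is offline.
store.persist("client", 100, "Paul McCartney released an album")
# The client comes back online; the last notification it saw was at time 90.
print(store.pull("client", last_seen=90))
```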
Pub/Sub Engine
• Aggregators aggregate subscriptions and distribute publications
• Ring of Pub/Sub brokers
• Manage subscriptions (subscribe and unsubscribe)
• Match publications
• Forward matched publications to aggregators
• Cross-site forwarding
• Load balancing
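The aggregator role can be sketched as follows: many client subscriptions to the same topic collapse into a single upstream subscription, and an incoming publication is fanned out to the local clients. The class, its method names, and the `upstream` message log are assumptions made for illustration.

```python
from collections import defaultdict


class Aggregator:
    """Hypothetical aggregator sitting between one access point and the brokers."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # topic -> local client ids
        self.upstream = []                   # messages that would go to the brokers

    def subscribe(self, client, topic):
        if topic not in self.subscribers:
            # First local subscriber: forward one subscription upstream.
            self.upstream.append(("subscribe", topic))
        self.subscribers[topic].add(client)

    def unsubscribe(self, client, topic):
        if topic not in self.subscribers:
            return
        self.subscribers[topic].discard(client)
        if not self.subscribers[topic]:
            # Last local subscriber left: withdraw the upstream subscription.
            del self.subscribers[topic]
            self.upstream.append(("unsubscribe", topic))

    def distribute(self, topic, event):
        """Return the per-client deliveries for one matched publication."""
        return {client: event for client in self.subscribers.get(topic, ())}


agg = Aggregator()
for client in ("c1", "c2", "c3"):
    agg.subscribe(client, "Paul McCartney")
print(agg.upstream)  # a single upstream subscription for three clients
```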
[Figure: Pub/Sub Engine inside the Spotify backend. Clients connect to access points AP1 and AP2; each access point has an aggregator; the aggregators connect to brokers A and B, which are joined by broker-overlay links and receive publication events from publishers. Example subscription shown on the links: "Follow Paul McCartney".]
Aggregators:
1. One-to-one per AP
2. Aggregate subscriptions
3. Distribute publications
Pub/Sub brokers:
1. Manage subscriptions
2. Match and forward publications
3. Organized as a DHT to partition subscriptions
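Partitioning subscriptions across brokers can be illustrated with plain hash partitioning: hash the topic and take it modulo the number of brokers. This is a simplified stand-in, not the DHT scheme used in production; the function name and the choice of SHA-1 are assumptions.

```python
import hashlib


def broker_for(topic, brokers):
    """Map a topic to one broker deterministically (simple hash partitioning)."""
    digest = hashlib.sha1(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


brokers = ["A", "B", "C"]
# Every subscription and publication for a topic reaches the same broker.
print(broker_for("Paul McCartney", brokers))
```

Because the mapping depends only on the topic, the broker that stores a topic's subscriptions is also the one that matches its publications.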
Connecting Users Across Sites in Real-Time
[Figure: three sites, Stockholm (Sweden), London (UK), and Ashburn (USA), each with brokers A, B, and C joined by broker-DHT links. A user connected to the Stockholm site follows Paul McCartney. Example publications, sent while Paul McCartney is connected to the London or Ashburn site: "Paul McCartney created a playlist" and "Paul McCartney listened to 'Hey Jude'". Both reach the user in Stockholm.]
Cross-Site Replication:
1. Subscriptions replicated in each site
2. One-to-one corresponding brokers in each site
3. Matching publications forwarded across sites
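The three cross-site replication rules can be sketched with one broker object per site. Everything here is a hypothetical simplification: `SiteBroker`, `link`, and the in-memory `delivered` list are invented for illustration, and real brokers would communicate over the network.

```python
class SiteBroker:
    """One broker per site; peers are the corresponding brokers elsewhere."""

    def __init__(self, site):
        self.site = site
        self.peers = []      # corresponding brokers at the other sites
        self.subs = {}       # topic -> {(site, subscriber)}, replicated everywhere
        self.delivered = []  # (subscriber, event) pairs delivered at this site

    def subscribe(self, subscriber, topic):
        # Rule 1: the subscription is replicated in each site.
        for broker in [self] + self.peers:
            broker.subs.setdefault(topic, set()).add((self.site, subscriber))

    def publish(self, topic, event):
        # Rule 3: forward only to sites that hold a matching subscriber.
        target_sites = {site for site, _ in self.subs.get(topic, set())}
        for broker in [self] + self.peers:
            if broker.site in target_sites:
                broker.deliver(topic, event)

    def deliver(self, topic, event):
        for site, subscriber in sorted(self.subs.get(topic, set())):
            if site == self.site:
                self.delivered.append((subscriber, event))


def link(*brokers):
    """Rule 2: wire up the one-to-one corresponding brokers of all sites."""
    for broker in brokers:
        broker.peers = [p for p in brokers if p is not broker]


stockholm, london, ashburn = (SiteBroker(s) for s in ("Stockholm", "London", "Ashburn"))
link(stockholm, london, ashburn)
stockholm.subscribe("user", "Paul McCartney")
london.publish("Paul McCartney", "created a playlist")
print(stockholm.delivered)
```

In this sketch the London broker sends nothing to Ashburn, because the replicated subscription table shows no matching subscriber there.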
Workload Analysis
• Traces from production system
• Mostly collected at Stockholm site
• From Thursday, 10 Jan 2013 to Saturday, 19 Jan 2013
• Study of subscription and publication workload
• Pub/Sub traffic trends and analysis
Topic & Subscription Distributions
[Figure: CCDF for topic popularity or subscription size, log-log axes. x-axis: topic popularity or subscription size; y-axis: % of topics or % of subscribers. Series: topic popularity (CCDF), subscription size (CCDF).]
• Topic-Popularity: % of total #subscribers subscribing to a topic
• Subscription Size: % of topics subscribed by a subscriber
• Power-law like distribution (visually from log-log scale plot)
• Similar to degree distribution in a Twitter social graph
• Most topics have very few subscribers
• Most subscribers are interested in very few topics
99% of topics have < 0.001% of the total number of subscribers
99% of users subscribe to < 0.001% of the total number of topics
CCDF = Complementary Cumulative Distribution Function
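For readers unfamiliar with the CCDF, the sketch below computes an empirical one: for each observed value x it reports the percentage of samples strictly greater than x. The function name and the toy input are illustrative and are not taken from the traces.

```python
from collections import Counter


def ccdf(values):
    """Return (x, percentage of samples > x) for each distinct x, ascending."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # samples strictly greater than x
        points.append((x, 100.0 * remaining / n))
    return points


# Toy subscription sizes: most subscribers follow very few topics.
print(ccdf([1, 1, 1, 2, 5]))
```

Plotting these points on log-log axes gives the kind of curve shown on the slide; a roughly straight line there is what suggests a power-law-like distribution.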
Publication Event Rate Distribution
[Figure: CCDF for publication event rate, log-log axes. x-axis: % of total publication event rate; y-axis: % of topics.]
• Publications generated per topic per day
• Not a Power-law like distribution
• Most topics generate very low event rate
99% of topics generate less than 0.001% of the total number of publications
Notification Rate (NR) per Subscriber
[Figure: CCDF for notification rate per subscriber, log-log axes. x-axis: notification rate (NR) per subscriber; y-axis: % of subscribers.]
• Defined as percentage of daily publications attracted by a subscriber
• Not Power-law like distribution
• Varies across subscribers from 1% down to 10^-7 %
• Most subscribers have a very low notification rate
90% of users attract less than 0.001% of the total number of publications
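The definition of notification rate can be made concrete with a short sketch: sum the daily publications of the topics a subscriber follows and express that as a percentage of all publications that day. The function name and the toy numbers are assumptions for illustration only.

```python
def notification_rate(subscriptions, pubs_per_topic):
    """Return {subscriber: % of the day's publications that subscriber attracts}.

    subscriptions: {subscriber: set of topics}
    pubs_per_topic: {topic: number of publications that day}
    """
    total = sum(pubs_per_topic.values())
    if total == 0:
        return {user: 0.0 for user in subscriptions}
    return {
        user: 100.0 * sum(pubs_per_topic.get(t, 0) for t in topics) / total
        for user, topics in subscriptions.items()
    }


pubs = {"a": 60, "b": 30, "c": 10}
subs = {"u1": {"a", "b"}, "u2": {"c"}, "u3": set()}
print(notification_rate(subs, pubs))
```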
Correlation b/w Topic Popularity & Event Rate
NR per subscriber is linearly proportional to Subscription Size
Correlation b/w Subscription Size & NR
NR per subscriber is linearly proportional to Subscription Size
Publication Traffic
• Daily periodic pattern of publication traffic
• Maximum around 6 pm - 7 pm
• Minimum around 2 am
• Complements the design
• Online traffic dominates
• Offline traffic contributes the least
[Figure: publication traffic per publisher, % of daily publication traffic (log scale) against UTC time (day - hour), from Thu 0:00 to the Sat 0:00 nine days later. Series: total traffic; music playback traffic (online); playlist update traffic (online); Notification Module traffic (persisted/offline).]
Cross-Site Traffic
• Publication traffic within sites dominates
• Cross-site traffic is an order of magnitude lower
• Confirms the scalability of the cross-site forwarding design
[Figure: publication traffic within the sites vs across the sites, % of daily publication traffic (0 to 10) against UTC time (day - hour), from Thu 0:00 to the Sat 0:00 nine days later. Series: traffic within a site; traffic across sites.]
Online Subscription Traffic
• Client login/logout result in subscriptions and unsubscriptions
• Exhibits a daily periodic pattern
• Unsubscription traffic follows the same pattern as subscription traffic
• Short-lived subscriptions: mean subscription lifetime of approximately 2 hours
[Figure: subscription and unsubscription traffic, % of daily subscriptions (0.0006 to 0.0022) against UTC time (day - hour), from Thu 0:00 to the Sat 0:00 nine days later. Series: subscription rate; unsubscription rate.]
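A mean subscription lifetime like the one quoted above can be derived from paired subscribe and unsubscribe timestamps. The sketch assumes each session is recorded as a (subscribe, unsubscribe) pair in seconds; the function name and the sample data are hypothetical.

```python
def mean_validity_hours(sessions):
    """Mean time between subscribe and unsubscribe, in hours.

    sessions: non-empty list of (subscribe_ts, unsubscribe_ts) in seconds.
    """
    durations = [(end - start) / 3600.0 for start, end in sessions]
    return sum(durations) / len(durations)


# Two toy sessions: one lasting 1 hour and one lasting 3 hours.
print(mean_validity_hours([(0, 3600), (0, 10800)]))
```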
Total #Subscriptions
• Exhibits a daily pattern
• Subscription count at any point is dominated by the Presence service
• Playlists and Notifications have far fewer subscriptions
[Figure: pattern of percentage of total #subscriptions, % of daily subscriptions (log scale) against UTC time (day - hour), from Thu 0:00 to the Sat 0:00 nine days later. Series: total subscriptions; presence subscriptions; playlist subscriptions; notifications subscriptions.]
Conclusions
• Pub/Sub is used at Spotify for social interaction among Spotify users
• Hybrid architecture to support online and offline notifications
• Workload similar to that of a Twitter social graph
• Daily periodic patterns in pub/sub traffic
• Design complements the workload & traffic