Feed at LinkedIn (Quora Talk)
Feed @ LinkedIn: Overview
Ankit Gupta, Engineer, LinkedIn
Shubham Gupta, Engineer, LinkedIn
Vivek Nelamangala, Engineer, LinkedIn
Today’s agenda
15:00 The LinkedIn Feed
15:15 Activity Store
15:20 FollowFeed
15:25 Operational Story of FollowFeed
15:35 Q&A
The LinkedIn Feed
The Feed is:
The personalized “home page” of LinkedIn
A heterogeneous list of updates produced by a user’s network
In addition, we also show other recommendations (jobs, articles, people) and monetize via native ads.
Mission: Give professionals the power to stay informed to make them more productive and successful every day.
[Figure: annotated feed screenshot. Labels: Share, Like, Comment, Connect, Sponsored Content, News recommendation, Recommendation, Organic updates]
Feed Composition
Creation
● Member sharing
● Member publishing
● Video
● Editorial Tools
Discovery / Engagement
● Feed Experience
● Follow Ecosystem
● Conversations/Comments
● Sponsored Updates
● Control of Feed
● Promotions
● Feed Relevance
● Feed delivered as Email
● Notifications
Feed is an ecosystem
Activity Data Model
● AVO triples
○ Actor: “Jeff Weiner”
○ Verb: “Shared”
○ Object: “Article”
● Visibility
○ [Public, Self, Connections, Follow]
● Domain Entity (optional)
○ Activity “Foreign Key”
○ e.g. comment-id for comment activity
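The data model above can be sketched as a small record type. This is a minimal illustration, not LinkedIn's actual schema: the class and field names, the identifier strings such as "member:jeff-weiner", and the lower-case visibility values are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    # The four visibility levels listed on the slide.
    PUBLIC = "public"
    SELF = "self"
    CONNECTIONS = "connections"
    FOLLOW = "follow"


@dataclass(frozen=True)
class Activity:
    actor: str                           # who did it, e.g. "member:jeff-weiner" (format assumed)
    verb: str                            # what they did, e.g. "shared"
    obj: str                             # what it was done to, e.g. "article:42"
    visibility: Visibility
    domain_entity: Optional[str] = None  # optional "foreign key", e.g. a comment id


# A share has no domain entity; a comment activity points at the comment itself.
share = Activity("member:jeff-weiner", "shared", "article:42", Visibility.PUBLIC)
comment = Activity("member:1", "commented", "article:42",
                   Visibility.CONNECTIONS, domain_entity="comment:987")
```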
1. Homepage request
2. Fetch viewer’s network and features
3. Call data sources
4. Resolve activities
[Diagram components: Client; Feed Mixer (blends across data sources); FollowFeed; Social Graph Service; Sponsored Content (Ads); Trending in Industry; People You May Know; Jobs Recommendations; Activities store; Activities Database; Activities stream; Features store; Stats Server; Sharing; Profiles. Arrows are labeled Read or Write.]
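The four numbered steps of the request flow can be sketched roughly as follows. Everything here is hypothetical: the function name `feed_mixer`, the callable-based data sources, and in particular blending by a single numeric score are assumptions for illustration, since the slides do not say how the Feed Mixer actually blends results.

```python
from typing import Callable, Dict, List, Tuple

# A data source returns (activity_id, score) candidates for a viewer and their network.
Source = Callable[[str, List[str]], List[Tuple[str, float]]]


def feed_mixer(viewer: str,
               fetch_network: Callable[[str], List[str]],
               sources: Dict[str, Source],
               resolve: Callable[[str], str],
               limit: int = 10) -> List[str]:
    network = fetch_network(viewer)                  # step 2: viewer's network
    candidates = []
    for source in sources.values():                  # step 3: call data sources
        candidates.extend(source(viewer, network))
    # Blend across sources; sorting by score is an assumed stand-in for real blending.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [resolve(activity_id)                     # step 4: resolve activities
            for activity_id, _ in candidates[:limit]]


# Stub data sources and a dict standing in for the activities store.
sources = {
    "followfeed": lambda viewer, network: [("a1", 0.9), ("a2", 0.4)],
    "sponsored": lambda viewer, network: [("ad7", 0.6)],
}
store = {"a1": "m1 shared an article", "a2": "c1 posted a job", "ad7": "sponsored update"}
feed = feed_mixer("viewer:9", lambda v: ["m1", "c1"], sources, store.__getitem__, limit=2)
```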
Activity Store
Data stored in an in-house distributed document database called Espresso
Keyed by Activity ID
Unified Social Content Platform
Flexible Schema
Single distribution pipeline
FollowFeed
● Term-partitioned index
○ Different from generic search indices, which are partitioned by document
○ Partitioned by actor (e.g. member:1, school:2)
● Posting list: a reverse-chronologically ordered list of activities by or about an actor
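A toy version of an actor-partitioned index with reverse-chronological posting lists might look like the sketch below. The class name, the CRC32-based partitioning function, and the in-memory lists are assumptions for illustration; the slides do not describe how FollowFeed assigns actors to partitions.

```python
import zlib
from collections import defaultdict
from typing import List, Tuple


class TimelineIndex:
    """Each actor's posting list lives entirely inside one partition."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self.partitions = [defaultdict(list) for _ in range(num_partitions)]

    def partition_of(self, actor: str) -> int:
        # Stable hash of the actor id (assumed scheme, not FollowFeed's).
        return zlib.crc32(actor.encode("utf-8")) % self.num_partitions

    def add(self, actor: str, timestamp: int, activity_id: str) -> None:
        posting = self.partitions[self.partition_of(actor)][actor]
        posting.append((timestamp, activity_id))
        posting.sort(reverse=True)  # keep newest first

    def latest(self, actor: str, count: int) -> List[Tuple[int, str]]:
        return self.partitions[self.partition_of(actor)].get(actor, [])[:count]


index = TimelineIndex(num_partitions=4)
index.add("member:1", 100, "a1")
index.add("member:1", 300, "a3")
index.add("member:1", 200, "a2")
index.add("school:2", 150, "b1")
```

Because an actor maps to exactly one partition, reading one actor's timeline touches a single partition, unlike a document-partitioned index where every partition would have to be consulted.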
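[Diagram: FollowFeed clusters]
Storage cluster: partition 1, partition 2, … partition N, each holding the posting lists of its actors (m1, m2, c1, m3, m4, m5, m6, s7, s8)
Activities stream and Partitioner cluster
Features Pipeline
Query cluster: “Get updates for viewer with network {m1, c1, m6}” is split into {m1, c1} for one partition and {m6} for another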
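The query-side fan-out in the diagram can be sketched as grouping the viewer's network by storage partition and merging the per-partition results. The actor-to-partition layout below is an assumed reading of the diagram, and the function names and sample timestamps are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumed layout: three actors per partition, in the order shown on the slide.
LAYOUT = {"m1": 1, "m2": 1, "c1": 1,
          "m3": 2, "m4": 2, "m5": 2,
          "m6": 3, "s7": 3, "s8": 3}


def plan_fanout(network: Set[str], layout: Dict[str, int]) -> Dict[int, Set[str]]:
    """Group the viewer's network by the partition that owns each actor."""
    plan = defaultdict(set)
    for actor in network:
        plan[layout[actor]].add(actor)
    return dict(plan)


def get_updates(network: Set[str], layout: Dict[str, int],
                storage: Dict[int, Dict[str, List[Tuple[int, str]]]],
                count: int) -> List[Tuple[int, str]]:
    merged = []
    for partition, actors in plan_fanout(network, layout).items():
        for actor in actors:  # one sub-query per partition in a real system
            merged.extend(storage[partition].get(actor, []))
    merged.sort(reverse=True)  # newest first across all partitions
    return merged[:count]


# Posting lists as (timestamp, activity_id) pairs; the values are made up.
storage = {
    1: {"m1": [(300, "a3"), (100, "a1")], "c1": [(250, "c9")]},
    2: {},
    3: {"m6": [(400, "f2")]},
}
plan = plan_fanout({"m1", "c1", "m6"}, LAYOUT)
updates = get_updates({"m1", "c1", "m6"}, LAYOUT, storage, count=3)
```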
FollowFeed Storage - Layers
Timeline Index
Filtering
Ranking
topK
FollowFeed
select activities
where (required)
  actor IN <my network>
  TimeRange between [start, end] OR count = <C>
where (optional)
  (actor type | verb type | object type) in <X>
  Visibility is (connections-only OR followees-only OR public)
sort by time OR relevance
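As a rough illustration of how the query above maps onto the filtering, ranking, and topK layers, here is a minimal in-memory sketch. The `Record` fields, the function signature, and the precomputed `relevance` score are assumptions; in FollowFeed the ranking runs inside the storage node against colocated features. Only the verb-type filter is shown; actor-type and object-type filters would follow the same pattern.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Record:
    actor: str
    verb: str
    visibility: str
    timestamp: int
    relevance: float  # assumed to be precomputed for this sketch


def select_activities(records: Iterable[Record], network: Set[str],
                      start: Optional[int] = None, end: Optional[int] = None,
                      count: Optional[int] = None,
                      verbs: Optional[Set[str]] = None,
                      visibilities: Optional[Set[str]] = None,
                      sort_by: str = "time") -> List[Record]:
    if start is None and end is None and count is None:
        raise ValueError("a time range or a count is required")

    def matches(r: Record) -> bool:  # filtering layer
        return (r.actor in network
                and (start is None or r.timestamp >= start)
                and (end is None or r.timestamp <= end)
                and (verbs is None or r.verb in verbs)
                and (visibilities is None or r.visibility in visibilities))

    key = (lambda r: r.timestamp) if sort_by == "time" else (lambda r: r.relevance)
    hits = sorted((r for r in records if matches(r)), key=key, reverse=True)  # ranking
    return hits if count is None else hits[:count]                            # topK


records = [
    Record("m1", "share", "public", 100, 0.2),
    Record("m1", "like", "connections-only", 200, 0.9),
    Record("m9", "share", "public", 300, 0.5),  # actor outside the viewer's network
    Record("c1", "share", "public", 50, 0.7),
]
newest = select_activities(records, {"m1", "c1"}, count=2)
best_shares = select_activities(records, {"m1", "c1"}, start=0, end=1000,
                                verbs={"share"}, sort_by="relevance")
```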
Why embedded?
Bring computation closer to data
Allows scoring of tens of millions of records per second
Less data transferred over the wire
Colocation of relevance features and data
Document features
RocksDB as embedded database
Open sourced by Facebook in 2013
Operational Story of FollowFeed
Traffic
Metrics
Performance testing during development
[Diagram: Live traffic (read only) hits the Legacy System, which writes access logs. The Log collection system (Log collector, Disk) gathers them, and Log processor and replay sends the recorded requests to FollowFeed.]
Development process
Code → Check in → Staging (Integration testing) → Dark Canary → Canary → Production
Dark Canary
[Diagram: the Request dispatcher sends each request to a FollowFeed live node and returns its response. A copy of the request goes to the FollowFeed dark canary node. Metrics are collected from the nodes.]
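The dark canary idea can be sketched as a dispatcher that serves the caller from the live node and mirrors a copy of each request to the node under test, keeping only metrics from the mirrored call. The class, metric names, and synchronous mirroring are assumptions for illustration; a production dispatcher would presumably send the copy asynchronously so it cannot add latency.

```python
import time
from typing import Any, Callable, Dict

Request = Dict[str, Any]


class RequestDispatcher:
    def __init__(self, live_node: Callable[[Request], Any],
                 dark_node: Callable[[Request], Any],
                 record_metric: Callable[[str, Any], None]):
        self.live_node = live_node
        self.dark_node = dark_node
        self.record_metric = record_metric

    def handle(self, request: Request) -> Any:
        response = self.live_node(request)  # only this response reaches the caller
        started = time.perf_counter()
        try:
            self.dark_node(dict(request))   # copy of the request; response is discarded
            self.record_metric("dark_canary.latency_s", time.perf_counter() - started)
        except Exception as exc:            # a broken canary must never affect users
            self.record_metric("dark_canary.error", type(exc).__name__)
        return response


metrics = []
dispatcher = RequestDispatcher(
    live_node=lambda req: {"status": 200, "viewer": req["viewer"]},
    dark_node=lambda req: 1 / 0,  # simulates a bug in the build under test
    record_metric=lambda name, value: metrics.append((name, value)),
)
response = dispatcher.handle({"viewer": "m1"})
```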
No Alerts Left Behind
● Meaningful thresholds
● Each alert has an action
● Alerts only on external symptoms
● An underlying issue triggers only a single alert
Backup
[Diagram: a Cron Script makes a call to the backup API on the FollowFeed Node, then copies the result to HDFS, which holds Backup 1, Backup 2, ... Backup N.]
Restore
[Diagram: a New FollowFeed Node is populated by a copy from HDFS, which holds Backup 1, Backup 2, ... Backup N.]
Rebalancing
Rebalance script
Before: Box 1: Partitions 0..19 | Box 2: Partitions 20..39 | Box 3: Partitions 40..59
After: Box 1: Partitions 0..14 | Box 2: Partitions 15..29 | Box 3: Partitions 30..44 | Box 4: Partitions 45..59
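The before and after layouts are consistent with splitting 60 partitions into contiguous, near-equal ranges. The sketch below reproduces them and lists which partitions change owner; the function names are hypothetical, and the actual rebalance script may choose assignments differently.

```python
from typing import List, Tuple


def assign_ranges(num_partitions: int, num_boxes: int) -> List[range]:
    """Split partitions 0..num_partitions-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_partitions, num_boxes)
    ranges, start = [], 0
    for box in range(num_boxes):
        size = base + (1 if box < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def planned_moves(before: List[range], after: List[range]) -> List[Tuple[int, int, int]]:
    """Return (partition, old_box, new_box) for every partition that changes owner."""
    old_owner = {p: box for box, r in enumerate(before, start=1) for p in r}
    return [(p, old_owner[p], box)
            for box, r in enumerate(after, start=1)
            for p in r
            if old_owner[p] != box]


before = assign_ranges(60, 3)  # 0..19, 20..39, 40..59
after = assign_ranges(60, 4)   # 0..14, 15..29, 30..44, 45..59
moves = planned_moves(before, after)
```

With this contiguous scheme, adding a fourth box relocates 30 of the 60 partitions (15..19, 30..39, and 45..59).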
Rebalancing
Extension of restore script
[Diagram: New FollowFeed Node 1, Node 2, Node 3, … Node N each copy from HDFS, which holds Backup 1, Backup 2, Backup 3, … Backup N.]
Questions
Appendix
Timeline storage structure
Blob 1 → Blob 2 → Blob 3
Each blob: Blob header | Update N | Update N-1 | Update N-2 | … | Next blob key
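One way to read the structure above is as a linked chain of blobs in a key-value store, each holding updates newest first plus the key of the next (older) blob. The sketch below uses a plain dict in place of the embedded database. The key format, the blob size limit, and the choice to start a fresh head blob when the current one is full are all assumptions, and the blob header's contents are not modeled because the slide does not describe them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Blob:
    updates: List[str] = field(default_factory=list)  # newest first
    next_blob_key: Optional[str] = None               # key of the next, older blob


class Timeline:
    def __init__(self, actor: str, max_updates_per_blob: int = 3):
        self.actor = actor
        self.max_updates = max_updates_per_blob
        self.store: Dict[str, Blob] = {}  # stand-in for the key-value store
        self._seq = 0
        self.head_key = self._new_blob(next_key=None)

    def _new_blob(self, next_key: Optional[str]) -> str:
        key = f"{self.actor}:blob:{self._seq}"  # hypothetical key format
        self._seq += 1
        self.store[key] = Blob(next_blob_key=next_key)
        return key

    def append(self, update: str) -> None:
        head = self.store[self.head_key]
        if len(head.updates) >= self.max_updates:
            # Head is full: start a new head blob that points at the old one.
            self.head_key = self._new_blob(next_key=self.head_key)
            head = self.store[self.head_key]
        head.updates.insert(0, update)

    def read(self, count: int) -> List[str]:
        """Walk the chain from the newest blob until `count` updates are collected."""
        out: List[str] = []
        key = self.head_key
        while key is not None and len(out) < count:
            blob = self.store[key]
            out.extend(blob.updates[: count - len(out)])
            key = blob.next_blob_key
        return out


timeline = Timeline("member:1")
for i in range(1, 8):
    timeline.append(f"u{i}")
recent = timeline.read(5)
```

Reading the most recent updates usually touches only the head blob, and older blobs are fetched only when a request reaches further back in time.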