Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Marco Serafini, joint work with Aris Gionis, Flavio Junqueira, Vincent Leroy and Ingmar Weber


Presentation at the Social Networking Systems (SNS) workshop 2012, co-located with EuroSys.

Transcript of Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Page 1: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Marco Serafini

joint work with Aris Gionis, Flavio Junqueira, Vincent Leroy and Ingmar Weber

Page 2: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Social Event Streams

Background

Page 3: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Social event feeds

Major feature - 70% of page views on Tumblr

Page 4: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Generating event streams

[Diagram: user, front-end, and the social networking system, which comprises data store clients (application logic), the social graph, and data stores]

Two types of user actions

  Share an event

  Generate a new event stream

Page 5: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Optimizations

Materialized views, one per user

  Each view contains the user’s own events

  It can also contain events of the other users that the user follows

We abstract away application-specific “relevance” filters

  All views contain all events stored in them

  All queries to a view return all events in the view

VIEW (each line below is one EVENT)

Plato 12.00 “Having the shadow of an ideal sandwich”

Hume 12.01 “I just feel a good taste in my mouth”

Kant 12.02 “Dudes, just eat it and stop blabbering”

Page 6: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

GOAL: Optimizing throughput

Throughput cost of event streams

  Proportional to the amount of data being transferred

Partitioning social graphs is impossible (or at least, very very hard)

Existing approaches to optimize throughput

  Push-all

  Pull-all

  Hybrid

Page 7: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Pull-all

Write to your own view only

Read from all your friends’ views

Simpler, good with frequent writes

[Diagram, two panels: "WRITE from Alice" and "READ from Charlie", each showing the clients and the data stores of Alice, Bob and Charlie (A, B, C)]
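The pull-all behaviour described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the `follows` map, the in-memory `views` dictionary, and both function names are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical toy graph: follows[u] lists the users whose events u reads.
follows = {"Alice": ["Bob"], "Bob": ["Alice"], "Charlie": ["Alice", "Bob"]}

# One materialized view per user, kept in memory for the sketch.
views = defaultdict(list)


def share_pull_all(user, timestamp, text):
    """A write touches only the author's own view."""
    views[user].append((timestamp, user, text))


def stream_pull_all(user):
    """A read queries the view of every followed user and merges by time."""
    events = []
    for friend in follows[user]:
        events.extend(views[friend])
    return sorted(events)


share_pull_all("Alice", 1, "hello")
share_pull_all("Bob", 2, "hi")
charlie_stream = stream_pull_all("Charlie")
```

Each write costs one view update, while each read costs one query per followed user, which is why the slide calls this scheme good with frequent writes.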

Page 8: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Push-all

Write to all your friends’ views

Read from your own view only

Good with frequent reads

[Diagram, two panels: "WRITEs from Alice and Bob" and "READ from Charlie", each showing the clients and the data stores of Alice, Bob and Charlie (A, B, C)]
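A minimal Python sketch of push-all, for comparison with pull-all. The `followers` map, the in-memory `views` dictionary, and the function names are assumptions made for illustration, not part of the talk.

```python
from collections import defaultdict

# Hypothetical toy graph: followers[u] lists the users who follow u.
followers = {"Alice": ["Charlie"], "Bob": ["Charlie"], "Charlie": []}

# One materialized view per user, kept in memory for the sketch.
views = defaultdict(list)


def share_push_all(user, timestamp, text):
    """A write goes to the author's own view and to every follower's view."""
    event = (timestamp, user, text)
    views[user].append(event)
    for follower in followers[user]:
        views[follower].append(event)


def stream_push_all(user):
    """A read is a single query against the reader's own view."""
    return sorted(views[user])


share_push_all("Alice", 1, "hello")
share_push_all("Bob", 2, "hi")
charlie_stream = stream_push_all("Charlie")
```

Here a read is a single query, and the cost moves to the write path, which updates one view per follower.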

Page 9: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Hybrid [Silberstein et al., SIGMOD 2010]

Per-edge choice between pull or push

Uses Production Rate (PR) and Consumption Rate (CR)

Minimum per-edge throughput cost

For each edge A -> B:

If PR(A) < CR(B): PUSH. A writes onto B’s view. Cost: PR(A)

If PR(A) ≥ CR(B): PULL. B reads from A’s view. Cost: CR(B)
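The per-edge rule can be written down directly. The sketch below assumes rates are plain numbers stored in two dictionaries, `pr` and `cr`; the function names and the example values are illustrative only.

```python
def hybrid_choice(pr_producer, cr_consumer):
    """Return (mode, cost) for one edge producer -> consumer."""
    if pr_producer < cr_consumer:
        return ("push", pr_producer)
    return ("pull", cr_consumer)


def hybrid_cost(edges, pr, cr):
    """Total cost of a hybrid schedule: sum of the cheaper option per edge."""
    return sum(hybrid_choice(pr[a], cr[b])[1] for a, b in edges)


# Hypothetical rates, chosen only to exercise both branches.
pr = {"A": 1.0, "B": 5.0}
cr = {"A": 2.0, "B": 3.0}
edges = [("A", "B"), ("B", "A")]

choice_ab = hybrid_choice(pr["A"], cr["B"])  # PR(A)=1 < CR(B)=3
choice_ba = hybrid_choice(pr["B"], cr["A"])  # PR(B)=5 >= CR(A)=2
total = hybrid_cost(edges, pr, cr)
```

The cost of every edge is min(PR(A), CR(B)), which is what "minimum per-edge throughput cost" amounts to.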

Page 10: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Request schedule

[Diagram: user, front-end, and the social networking system, which comprises data store clients (application logic), the social graph, and data stores]

Social graph contains the Request Schedule

Per-edge Push or Pull

Easy to integrate in existing systems

Page 11: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Social Piggybacking

Contribution

Page 12: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Idea: Social Piggybacking

Two friends are likely to share many common friends

Their views can be used as HUBS to prune edges

[Diagram: SOCIAL PIGGYBACKING on nodes A, B and C, with B as the HUB]

PUSH: A writes new events onto B’s view

PULL: C reads events by B and A from B’s view

FREE EDGE! The edge from A to C needs neither pull nor push
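A small worked comparison shows why the free edge matters. The rates below are made up for illustration, and the cost model (a push on A -> B costs PR(A), a pull on B -> C costs CR(C)) follows the per-edge costs of the hybrid slide.

```python
# Hypothetical production and consumption rates for the triangle A, B, C.
pr = {"A": 2.0, "B": 3.0, "C": 1.0}
cr = {"A": 1.0, "B": 5.0, "C": 2.0}

# Edge (u, v) means v consumes the events that u produces.
edges = [("A", "B"), ("B", "C"), ("A", "C")]

# Hybrid baseline: every edge pays the cheaper of push and pull.
hybrid_total = sum(min(pr[u], cr[v]) for u, v in edges)

# Piggybacking through hub B: A pushes to B's view, C pulls from B's view,
# and the edge A -> C is served for free by that pull.
piggyback_total = pr["A"] + cr["C"]
```

With these numbers the hybrid schedule costs 6.0 and the piggybacked one 4.0, because C's single pull from B's view already returns A's events.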

Page 13: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Social Dissemination Problem

Inputs

  Social graph

  Per-node production and consumption rates

Output: request schedule that minimizes costs

  Each edge needs to be covered

  Can be through a hub, push or pull

Requirements

  Bounded staleness

  Non-triviality

Page 14: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Analysis

In every admissible request schedule, each edge is either

  served directly, using a push or a pull, or

  served through a hub.

Any other schedule is not admissible

The Social Dissemination problem is NP-hard
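The coverage condition can be checked mechanically. The sketch below assumes a schedule is represented as two sets of edges, `push` and `pull`, and that a hub for edge (u, v) is any node w with (u, w) pushed and (w, v) pulled; this representation and the function names are assumptions, and the check covers only the structural condition, not staleness bounds.

```python
def is_covered(edge, push, pull):
    """True if the edge is served directly or through some hub."""
    u, v = edge
    if edge in push or edge in pull:
        return True
    # Candidate hubs: every node whose view u already pushes to.
    hubs = {w for (x, w) in push if x == u}
    return any((w, v) in pull for w in hubs)


def is_admissible(edges, push, pull):
    """A schedule passes this check only if every edge is covered."""
    return all(is_covered(e, push, pull) for e in edges)


edges = {("A", "B"), ("B", "C"), ("A", "C")}
ok = is_admissible(edges, push={("A", "B")}, pull={("B", "C")})
not_ok = is_admissible(edges, push={("A", "B")}, pull=set())
```

Verifying a given schedule is cheap; the hard part, per the NP-hardness result, is finding the schedule of minimum cost.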

Page 15: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Nosy: A Simple Heuristic

Nosy looks for hubgraphs

Cost with Piggybacking: PR(X) + CR(Y), cross edges free

Page 16: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Nosy Phase 1

Add elements to X sets

For each edge (w, y)

  Build the largest hubgraph (X, w, y)

  Piggybacking cost: PR(X) + CR(y)

  Cross edges X -> y are free

  Piggyback if cheaper than hybrid

[Diagram: hubgraph with producer set X, hub w and consumer y]
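The Phase 1 test for a single edge (w, y) might look as follows. This is a simplified sketch, not the authors' implementation: it assumes X is the set of common producers of w and y, evaluates one hubgraph in isolation, and ignores interactions with edges that were already scheduled. All names and rates are hypothetical.

```python
def phase1_candidate(w, y, edges, pr, cr):
    """Evaluate the largest hubgraph (X, w, y) for one edge w -> y."""

    def producers(node):
        return {u for (u, v) in edges if v == node}

    # X: nodes that produce for both the hub w and the consumer y.
    X = (producers(w) & producers(y)) - {w, y}

    # Every x in X pushes to w's view, and y pulls from w's view.
    piggyback = sum(pr[x] for x in X) + cr[y]

    # Hybrid cost of the same edges: X -> w, w -> y and X -> y.
    covered = [(x, w) for x in sorted(X)] + [(w, y)]
    covered += [(x, y) for x in sorted(X)]
    hybrid = sum(min(pr[u], cr[v]) for u, v in covered)

    return X, piggyback, hybrid, piggyback < hybrid


edges = {("x1", "w"), ("x2", "w"), ("x1", "y"), ("x2", "y"), ("w", "y")}
pr = {"x1": 2.0, "x2": 2.0, "w": 3.0, "y": 1.0}
cr = {"x1": 1.0, "x2": 1.0, "w": 5.0, "y": 3.0}
X, piggyback, hybrid, use_hub = phase1_candidate("w", "y", edges, pr, cr)
```

In this example piggybacking costs 7.0 against 11.0 for hybrid, so the hubgraph would be adopted.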

Page 17: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Nosy Phase 2

Add elements to Y sets

For each (w, y)

  Let Xy be producers of y that push to w already

  Piggybacking cost: CR(y)

  Cross edges Xy -> y are free

  Piggyback if cheaper than hybrid

[Diagram: hubgraph with producer set X, hub w and consumer set Y]
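Phase 2 can be sketched in the same style. The code assumes that the pushes chosen in Phase 1 are available as a set `push`, and it compares the single pull CR(y) with the hybrid cost of the edges that pull would serve (w -> y and Xy -> y). Which edges enter the comparison is an assumption of this sketch; the slide states only the piggybacking cost.

```python
def phase2_candidate(w, y, edges, push, pr, cr):
    """Decide whether y should pull from hub w, reusing existing pushes."""
    # Xy: producers of y that already push onto w's view.
    X_y = {u for (u, v) in edges if v == y and (u, w) in push}

    # The only new request is y pulling from w's view.
    piggyback = cr[y]

    # Hybrid cost of the edges that this pull would serve.
    covered = [(w, y)] + [(x, y) for x in sorted(X_y)]
    hybrid = sum(min(pr[u], cr[v]) for u, v in covered)

    return X_y, piggyback, hybrid, piggyback < hybrid


# Hypothetical state after Phase 1: x1 and x2 already push to w.
edges = {("x1", "w"), ("x2", "w"), ("w", "y2"), ("x1", "y2")}
push = {("x1", "w"), ("x2", "w")}
pr = {"x1": 2.0, "x2": 2.0, "w": 3.0, "y2": 1.0}
cr = {"x1": 1.0, "x2": 1.0, "w": 5.0, "y2": 4.0}
X_y, piggyback, hybrid, use_hub = phase2_candidate("w", "y2", edges, push, pr, cr)
```

Here one pull of cost 4.0 replaces hybrid requests of total cost 5.0, so y2 would be added to the Y set of hub w.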

Page 18: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Experiments

Flickr and Twitter graphs

Page 19: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Experiments

Twitter (Aug 2009) and Flickr (Apr 2008) social graphs

Samples using random walks, which preserve graph properties

Average sizes

  Flickr: 4 k nodes, 112 k edges

  Twitter: 25 k nodes, 158 k edges

Production and consumption rates are generated

  write:read ratio is 1:5

  PR (resp. CR) increases logarithmically with out-degree (resp. in-degree)
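One way to generate rates with these two properties is sketched below. The exact formula is not given on the slide, so log(1 + degree) and the rescaling of CR to reach a 1:5 write:read ratio are assumptions, as are the function and parameter names.

```python
import math


def generate_rates(edges, read_write_ratio=5.0):
    """Assign PR and CR from node degrees; edges must be non-empty."""
    out_deg, in_deg, nodes = {}, {}, set()
    for u, v in edges:
        nodes.update((u, v))
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1

    # Rates grow logarithmically with degree (assumed form: log(1 + d)).
    pr = {n: math.log(1 + out_deg.get(n, 0)) for n in nodes}
    cr = {n: math.log(1 + in_deg.get(n, 0)) for n in nodes}

    # Rescale CR so that total reads are read_write_ratio times total writes.
    scale = read_write_ratio * sum(pr.values()) / sum(cr.values())
    cr = {n: c * scale for n, c in cr.items()}
    return pr, cr


pr, cr = generate_rates([("A", "B"), ("A", "C"), ("B", "C")])
ratio = sum(cr.values()) / sum(pr.values())
```

The resulting dictionaries can be passed unchanged to any per-edge cost computation that looks rates up by node.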

Page 20: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Metrics and Results

Metric

  Improvement over hybrid optimization (baseline)

  Gain(A) = Cost(BASE) / Cost(A) - 1

Results

  1. Nosy exploits the community structure

  2. It works well under a variety of parameters
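The gain metric is a one-line computation. The sketch below takes the two costs as plain numbers; the function name and the example values are illustrative.

```python
def gain(cost_base, cost_a):
    """Gain(A) = Cost(BASE) / Cost(A) - 1, relative to the hybrid baseline."""
    return cost_base / cost_a - 1


# Hypothetical costs: baseline 6.0, algorithm A 4.0.
example_gain = gain(6.0, 4.0)
no_gain = gain(4.0, 4.0)
```

A gain of 0 means A costs the same as the baseline, and a gain of 0.5 means the baseline costs 1.5 times as much as A.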

Page 21: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Clustering Coefficient

After sampling, we keep only a fraction s of edges

B+ is a trivial extension of Baseline

  Lock push edges

  Pull edges that can be served using hubs are free

More clustering, more gain for Nosy but not for B+

Page 22: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Varied Workload

Significant gains

Asymptotically, i.e. with all reads, the per-edge push-based solution is optimal so the gain tends to zero

Page 23: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Effect of Colocation

As the system size grows, the gains reach their maximum

For very small systems there is little communication so little room for improvements

Page 24: Social Piggybacking: Leveraging Common Friends to Generate Event Streams

Conclusions

Social Piggybacking is a very promising approach

  Baseline has up to 2.4 times higher throughput cost

  Easy to integrate in existing systems

Next steps

  Run on full social graphs

  Evaluate throughput gain on an actual social networking system