An Analysis of Facebook Photo Caching

Post on 02-Jan-2016

by Huang et al., SOSP 2013

Presented by Phuong Nguyen

Some animations and figures are borrowed from the original paper and presentation

Photos on Facebook: Overview

• Photos appear in the Profile, Feed, and Album views
• 250 billion photos, as of Sep 2013

Photos on Facebook: Overview (cont.)

[Figure: Storage Backend, FB Cache Layers, Akamai CDN; label: Full-stack Study]

FACEBOOK PHOTO CACHING: HOW DOES IT WORK?

Client-based Browser Cache

[Figure: Client with its Browser Cache]

Geo-distributed Edge Cache (FIFO)

[Figure: Client (Millions) with Browser Cache (Local Fetch) → Edge Cache at the PoP (Tens)]
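
The slides label both the Edge and the Origin caches as FIFO. A minimal sketch of FIFO replacement, assuming capacity is counted in objects rather than bytes (a simplification; real photo caches are sized in bytes), uses an `OrderedDict` as the queue. The key property is that a hit does not change an item's position, so even a popular photo is evicted once it becomes the oldest insertion.

```python
from __future__ import annotations

from collections import OrderedDict


class FIFOCache:
    """Object-count FIFO cache: evicts the entry inserted earliest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str) -> bytes | None:
        # Unlike LRU, a hit does NOT move the entry; only insertion order counts.
        return self.entries.get(key)

    def put(self, key: str, value: bytes) -> None:
        if key in self.entries:
            self.entries[key] = value  # overwrite in place, keep queue position
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest insertion
        self.entries[key] = value


cache = FIFOCache(capacity=2)
cache.put("a.jpg", b"A")
cache.put("b.jpg", b"B")
cache.get("a.jpg")          # hit, but "a.jpg" stays at the front of the queue
cache.put("c.jpg", b"C")    # evicts "a.jpg", the oldest insertion
print(list(cache.entries))  # ['b.jpg', 'c.jpg']
```

The names `FIFOCache`, `get`, and `put` are illustrative only, not Facebook's implementation.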

Single Global Origin Cache (FIFO)

[Figure: Client (Millions) with Browser Cache → Edge Cache at the PoP (Tens) → Origin Cache at the Data Center (Four)]

Haystack Backend

[Figure: Client (Millions) with Browser Cache → Edge Cache at the PoP (Tens) → Origin Cache at the Data Center (Four) → Backend (Haystack); label: Hash(url)]
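
The Hash(url) label indicates that requests are spread over Origin Cache servers by hashing the photo URL. Because a given URL always maps to the same server, each photo is cached at most once across the Origin tier, which is what lets servers in the four data centers act as a single global cache. This sketch assumes a hypothetical host list and a plain hash-mod-N mapping; a production system would more likely use consistent hashing, so that adding or removing a server remaps only a small fraction of URLs.

```python
from __future__ import annotations

import hashlib

# Hypothetical Origin Cache hosts: four data centers with four servers each.
ORIGIN_SERVERS = [f"origin-dc{d}-{s}" for d in range(1, 5) for s in range(4)]


def origin_for(url: str, servers: list[str] = ORIGIN_SERVERS) -> str:
    """Map a photo URL to one Origin Cache server via hash(url) mod N."""
    # hashlib gives a hash that is stable across processes and machines,
    # unlike Python's built-in hash(), which is salted per process for str.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


url = "https://example.com/photos/12345_n.jpg"  # hypothetical photo URL
print(origin_for(url))
```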

FULL-STACK CACHE STUDY: DATA COLLECTION

Trace Collection

• Objective: collect a representative sample that permits correlating events related to the same request
• Instrumentation scope: Browser Cache (Client), Edge Cache (PoP), Origin Cache (Data Center), Backend (Haystack)

Sampling Strategies

• Request-based: sample requests randomly
  • Biased toward popular content
• Object-based: focus on a subset of photos selected by a deterministic test on photoId
  • Fair coverage of unpopular photos
  • Enables cross-stack analysis
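
The difference between the two strategies lies in where the random choice is made. In this sketch, request-based sampling flips a coin per request, while object-based sampling applies a deterministic test to photoId, so every layer that sees a photo makes the same keep/drop decision. The slides do not give the exact test; hashing the ID into buckets is an assumption made for illustration, and both function names are hypothetical.

```python
import hashlib
import random


def object_sampled(photo_id: int, rate: float = 0.01) -> bool:
    """Object-based sampling: keep every request for a fixed subset of photos."""
    # Hypothetical deterministic test: hash the photoId into 10,000 buckets
    # and keep the lowest `rate` fraction of them.
    digest = hashlib.sha256(str(photo_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < rate * 10_000


def request_sampled(rng: random.Random, rate: float = 0.01) -> bool:
    """Request-based sampling: keep each request independently with prob. `rate`.

    Popular photos receive more requests, so they dominate such a sample.
    """
    return rng.random() < rate


# Every layer that sees a request for photo 42 reaches the same decision,
# so all of that photo's events (browser, edge, origin, backend) can be joined.
decisions = {object_sampled(42) for _ in range(5)}
print(decisions)  # one value only: the test is deterministic
```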

WORKLOAD ANALYSIS

Analysis Objectives

• Traffic sheltering effects of caches
• Photo popularity distribution
• Geographic traffic distribution & collaborative caching
• Can we make the caches better?
  • Impact of cache sizes & algorithms
  • Can we predict which photos to cache?

ANALYSIS: TRAFFIC SHELTERING

Traffic Sheltering

Layer                         Requests received   Hit ratio   Traffic share
Browser Cache (Client)        77.2M               65.5%       65.5%
Edge Cache (PoP)              26.6M               58.0%       20.0%
Origin Cache (Data Center)    11.2M               31.8%       4.6%
Backend (Haystack)            7.6M                n/a         9.9%
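
The hit-ratio and traffic-share figures follow from the request counts: a layer's hit ratio is the fraction of the requests reaching it that it serves, and its traffic share is the requests it serves divided by all client requests. Recomputing them from the rounded counts on the Traffic Sheltering slide (77.2M, 26.6M, 11.2M, 7.6M) gives values within a few tenths of a percentage point of the slide; the small gaps come from rounding in the counts.

```python
# Requests (in millions) arriving at each layer, from the Traffic Sheltering slide.
layers = ["Browser", "Edge", "Origin", "Backend"]
arrivals = [77.2, 26.6, 11.2, 7.6]
total = arrivals[0]  # every photo request starts at the browser cache

results = {}
for i, name in enumerate(layers):
    # Whatever a layer misses arrives at the next layer; the backend serves the rest.
    forwarded = arrivals[i + 1] if i + 1 < len(arrivals) else 0.0
    served = arrivals[i] - forwarded
    results[name] = (served / arrivals[i], served / total)  # (hit ratio, traffic share)

for name, (hit_ratio, share) in results.items():
    label = "n/a" if name == "Backend" else f"{hit_ratio:.1%}"
    print(f"{name:8s} hit ratio {label:>6s}   traffic share {share:.1%}")
```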

ANALYSIS: PHOTO POPULARITY IMPACT

Popularity Distribution

• Skewness is reduced as requests pass through successive cache layers
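
The reduction in skew can be reproduced qualitatively with a synthetic workload: send a Zipf-like request stream through a FIFO cache and compare how concentrated the incoming requests and the forwarded misses are. The parameters (10,000 photos, Zipf exponent 1, a 100-entry cache) are assumptions, not values from the study. The most popular photos are mostly absorbed by the cache, so the stream passed to the next layer is flatter.

```python
from __future__ import annotations

import random
from collections import Counter, OrderedDict


def zipf_trace(n_objects: int, n_requests: int, alpha: float, seed: int) -> list[int]:
    """Synthetic trace: object i (1-based rank) is requested with prob ~ 1 / i**alpha."""
    rng = random.Random(seed)
    weights = [1 / (i ** alpha) for i in range(1, n_objects + 1)]
    return rng.choices(range(1, n_objects + 1), weights=weights, k=n_requests)


def fifo_misses(trace: list[int], capacity: int) -> list[int]:
    """Return the requests a FIFO cache forwards downstream (its misses)."""
    cache: OrderedDict[int, None] = OrderedDict()
    misses = []
    for obj in trace:
        if obj in cache:
            continue  # hit: absorbed here, FIFO does not reorder
        misses.append(obj)
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the oldest insertion
        cache[obj] = None
    return misses


def top_share(stream: list[int], k: int) -> float:
    """Fraction of a stream that goes to its k most frequently requested objects."""
    counts = Counter(stream)
    return sum(c for _, c in counts.most_common(k)) / len(stream)


trace = zipf_trace(n_objects=10_000, n_requests=200_000, alpha=1.0, seed=1)
misses = fifo_misses(trace, capacity=100)
print(f"top-100 share, incoming:  {top_share(trace, 100):.2f}")
print(f"top-100 share, forwarded: {top_share(misses, 100):.2f}")
```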

Popularity Impact on Caches

ANALYSIS: GEOGRAPHIC TRAFFIC DISTRIBUTION & COLLABORATIVE CACHING

Substantial Remote Traffic at Edge

Percentage of each city's requests served by its local Edge:
• Atlanta: 20%
• Miami: 35%
• Dallas: 50%
• Chicago: 60%
• LA: 18%
• NYC: 35%

Substantial Remote Traffic at Edge (cont.)

Edges serving requests from Atlanta clients:
• Atlanta (local): 20%
• D.C.: 35%
• Miami: 20%
• Chicago: 10%
• Dallas: 5%
• NYC: 5%
• California: 5%

• Atlanta has 80% of its requests served by remote Edges

Collaborative Edge

Impact of Using Collaborative Edge

• Collaborative Edge increases hit ratio by 18%
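
One way to model Collaborative Edge is to treat all Edge caches as one logical cache with their combined capacity, so a photo fetched for one region can serve clients routed to another. This sketch compares four independent FIFO Edges with one shared FIFO cache of the same total size on synthetic traffic; the region count, popularity model, and sizes are assumptions, so it illustrates the direction of the effect, not the 18% figure.

```python
from __future__ import annotations

import random
from collections import OrderedDict


class FIFOCache:
    """Object-count FIFO cache; request() returns True on a hit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict[int, None] = OrderedDict()

    def request(self, obj: int) -> bool:
        if obj in self.items:
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the oldest insertion
        self.items[obj] = None
        return False


rng = random.Random(7)
n_regions, n_objects, per_edge_capacity = 4, 10_000, 50
weights = [1 / i for i in range(1, n_objects + 1)]  # Zipf-like popularity (assumed)
photos = rng.choices(range(n_objects), weights=weights, k=100_000)
# Each request: (region of the client, photo id); every region shares one popularity.
trace = [(rng.randrange(n_regions), photo) for photo in photos]

independent = [FIFOCache(per_edge_capacity) for _ in range(n_regions)]
collaborative = FIFOCache(per_edge_capacity * n_regions)  # same total capacity

ind_hits = sum(independent[region].request(photo) for region, photo in trace)
collab_hits = sum(collaborative.request(photo) for _, photo in trace)
print(f"independent edges:  {ind_hits / len(trace):.1%}")
print(f"collaborative edge: {collab_hits / len(trace):.1%}")
```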

ANALYSIS: IMPACTS OF CACHE SIZE & ALGORITHM

Potential Improvement Study

• Methodology: cache simulation
  • Replay the trace (first 25% as warm-up)
  • Evaluate using the remaining 75%
• Improvement factors:
  • Cache size
  • Caching algorithm
• Evaluation metric: hit ratio
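
The trace-replay methodology (25% warm-up, hit ratio measured on the remaining 75%) can be sketched as a small harness. The `Cache` protocol with a `request(key) -> bool` method and the toy LRU policy are assumptions made for this illustration; any policy exposing the same method can be plugged in.

```python
from __future__ import annotations

from collections import OrderedDict
from collections.abc import Hashable
from typing import Protocol


class Cache(Protocol):
    def request(self, key: Hashable) -> bool: ...


class LRUCache:
    """Toy LRU cache used only to exercise the harness."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict[Hashable, None] = OrderedDict()

    def request(self, key: Hashable) -> bool:
        if key in self.items:
            self.items.move_to_end(key)  # a hit makes the key most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key
        self.items[key] = None
        return False


def replay_hit_ratio(trace: list[Hashable], cache: Cache, warmup: float = 0.25) -> float:
    """Replay the full trace; score only the requests after the warm-up prefix."""
    split = int(len(trace) * warmup)
    for key in trace[:split]:
        cache.request(key)  # warm the cache; outcome ignored
    scored = trace[split:]
    hits = sum(cache.request(key) for key in scored)
    return hits / len(scored)


trace = ["a", "b", "a", "c", "a", "b", "a", "c"]
print(replay_hit_ratio(trace, LRUCache(capacity=2)))  # 0.5
```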

Edge Cache with Different Sizes & Algorithms

[Figure: Edge Cache hit ratio for different cache sizes and algorithms, compared with an Infinite Cache]

• The same hit ratio can be achieved with a smaller cache and a higher-performing algorithm

Edge Cache with Different Sizes & Algorithms (cont.)

• A more sophisticated algorithm can achieve a better hit ratio with the same cache size
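
A comparison in the spirit of the Edge Cache size/algorithm study replays one trace through FIFO and LRU caches of several sizes. The trace is synthetic with Zipf-like popularity (an assumption), so only the direction of the result is meaningful: on this kind of workload LRU typically achieves a higher hit ratio than FIFO at the same size, because a hit refreshes a popular photo instead of letting it age out of the queue. LRU serves here only as a simple stand-in for a more sophisticated algorithm.

```python
from __future__ import annotations

import random
from collections import OrderedDict


def hit_ratio(trace: list[int], capacity: int, policy: str, warmup: float = 0.25) -> float:
    """Replay `trace` through a FIFO or LRU cache; score only the post-warm-up part."""
    if policy not in ("fifo", "lru"):
        raise ValueError(f"unknown policy: {policy}")
    cache: OrderedDict[int, None] = OrderedDict()
    split = int(len(trace) * warmup)
    hits = 0
    for i, obj in enumerate(trace):
        hit = obj in cache
        if hit:
            if policy == "lru":
                cache.move_to_end(obj)  # LRU refreshes recency; FIFO does not
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # front = oldest (FIFO) / least recent (LRU)
            cache[obj] = None
        if i >= split:
            hits += hit
    return hits / (len(trace) - split)


rng = random.Random(3)
n_objects = 10_000
weights = [1 / i for i in range(1, n_objects + 1)]  # synthetic Zipf-like popularity
trace = rng.choices(range(n_objects), weights=weights, k=100_000)

for size in (50, 100, 200, 400):
    fifo = hit_ratio(trace, size, "fifo")
    lru = hit_ratio(trace, size, "lru")
    print(f"size {size:4d}: FIFO {fifo:.1%}   LRU {lru:.1%}")
```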

ANALYSIS: WHICH PHOTOS TO CACHE?

Intuitions

• Properties intuitively associated with photo traffic:
  • The age of photos
  • The number of Facebook followers associated with the owner

Content Age Effect

• An age-based cache replacement algorithm could be effective
• Fresh content is popular and tends to be cached effectively throughout the hierarchy
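
An age-based replacement policy could use photo age directly: when space is needed, evict the cached photo that was uploaded longest ago, on the assumption that older content is less likely to be requested again. This is a hypothetical sketch; the `upload_time` input and the class name are illustrative, and the slides do not specify such a policy. A min-heap keyed by upload time keeps each eviction at O(log n).

```python
from __future__ import annotations

import heapq


class AgeBasedCache:
    """Evicts the cached photo with the oldest upload time (hypothetical policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.upload_time: dict[str, float] = {}  # photo_id -> upload timestamp
        self.heap: list[tuple[float, str]] = []  # min-heap: oldest upload first

    def request(self, photo_id: str, upload_time: float) -> bool:
        if photo_id in self.upload_time:
            return True  # hit; a photo's upload time never changes, so no reordering
        if len(self.upload_time) >= self.capacity:
            _, oldest = heapq.heappop(self.heap)
            del self.upload_time[oldest]
        self.upload_time[photo_id] = upload_time
        heapq.heappush(self.heap, (upload_time, photo_id))
        return False


cache = AgeBasedCache(capacity=2)
cache.request("old", upload_time=100.0)
cache.request("new", upload_time=900.0)
cache.request("newer", upload_time=950.0)  # evicts "old", the oldest upload
print(sorted(cache.upload_time))  # ['new', 'newer']
```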

Social Effect

• The more popular a photo's owner is, the more likely the photo is to be accessed
• Browser caches tend to have lower hit ratios for photos of popular users (“viral” effect)

DISCUSSIONS

Discussions

• Evaluation method:
  • Only desktop clients are considered; mobile clients are excluded
  • Trends driven by user mobility are not captured
• Sampling: object-based sampling might not represent the realistic workload
• Impact of caching done by the Akamai CDN:
  • The method for correlating requests is not perfect
• Latency issue:
  • Evaluation mainly focuses on hit ratio & traffic sheltering, not latency
  • Latency of collaborative caching is not evaluated

Discussions (cont.)

• Other potential improvements:
  • An improved caching algorithm that takes photo metadata into account
  • Optimal placement of resizing functionality along the stack
  • Clairvoyant caching might become possible by predicting future accesses
    • E.g., photos from the same album, photos appearing in the news feed, etc.
  • Address geographic diversity by improving the routing policy (e.g., put more weight on locality)
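
Clairvoyant caching here refers to Belady's offline algorithm (MIN): on a miss with a full cache, evict the item whose next request lies furthest in the future. It needs the future request sequence, so in practice it serves as an upper bound, and predicting future accesses (same album, news feed) would be one way to approximate it. A minimal sketch on a hypothetical toy trace, with LRU alongside for contrast:

```python
from __future__ import annotations

import math
from collections import OrderedDict


def belady_hits(trace: list[str], capacity: int) -> int:
    """Clairvoyant (Belady's MIN): on eviction, drop the item needed furthest ahead."""
    # next_use[i] = position of the next request for trace[i] after i (inf if never).
    next_use = [math.inf] * len(trace)
    seen_at: dict[str, int] = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen_at.get(trace[i], math.inf)
        seen_at[trace[i]] = i

    cache: dict[str, float] = {}  # cached item -> position of its next request
    hits = 0
    for i, item in enumerate(trace):
        if item in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=lambda k: cache[k])  # furthest next use
            del cache[victim]
        cache[item] = next_use[i]
    return hits


def lru_hits(trace: list[str], capacity: int) -> int:
    """Online LRU on the same trace, for comparison."""
    cache: OrderedDict[str, None] = OrderedDict()
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[item] = None
    return hits


# Hypothetical toy trace of photo ids.
toy = ["a", "b", "c", "a", "b", "d", "a", "b", "c", "d"]
print(belady_hits(toy, 3), lru_hits(toy, 3))  # 5 4
```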

THANK YOU!
