An Analysis of Facebook Photo Cachingiwanicki/courses/ds/2015/... · Facebook's Photo-Serving Stack...

Post on 26-Jul-2020

0 views 0 download

Transcript of An Analysis of Facebook Photo Cachingiwanicki/courses/ds/2015/... · Facebook's Photo-Serving Stack...

An Analysis of Facebook Photo Caching

Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, Harry C. Li

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.Copyright is held by the Owner/Author(s).SOSP’13, Nov. 3–6, 2013, Farmington, Pennsylvania, USA.ACM 978-1-4503-2388-8/13/11.http://dx.doi.org/10.1145/2517349.2522722

The problem● 250 billion photos on Facebook in 2013● 350 million more uploaded daily● High availability and low latency required for delivery infrastructure.

Facebook's Photo-Serving Stack

Client

Browser cache

Akamai CDN

Facebook storage and caches

Facebook's Photo-Serving Stack

Focus exclusively on Facebook stack

Client

Browser cache

Akamai CDN

Facebook storage and caches

Edge Caches (FIFO)

● tens of independent edge caches across the US

Client

Browser cache

PoP

Edge cache

Edge Caches (FIFO)

Purpose:

● reduce latency● reduce bandwidth utilization between clients and datacenters

Client

Browser cache

PoP

Edge cache

Global Origin Cache (FIFO)

● a single global cache● requests routed to specific data centers based on hash of photo id

Client

Browser cache

PoP

Edge cache

Data center

Data center

Origin cache

hash(id)

Global Origin Cache (FIFO)

Purpose:

● traffic sheltering for I/O bound backend

Client

Browser cache

PoP

Edge cache

Data center

Data center

Origin cache

hash(id)

Haystack backend

● located in the same data centers as origin servers

Client

Browser cache

PoP

Edge cache

Data center

Data center

Origin cache

Backend (Haystack)

Backend (Haystack)

hash(id)

Resizers

● photos are served in different sizes to different users● resized photos are treated as distinct objects by the cache layer

Client

Browser cache

PoP

Edge cache

Data center

Data center

Origin cache

Backend (Haystack)

Backend (Haystack)

hash(id)

R

R

Resizer

Methodology

Stack instrumentation

● desktop clients only● data for browser cache inferred by counting requests seen at browsers and

edge caches

Client

Browser cache

PoP

Edge cache

Data center

Origin cache

Backend (Haystack)R

Instrumentation scope

Sampling method● request-based sampling: collect X% of requests● object-based sampling: collect X% of objects

Sampling method● request-based sampling: collect X% of requests● object-based sampling: collect X% of objects

○ by deterministic test on photo id

Object-based Sampling● better coverage of unpopular photos● simplified cross-stack analysis

Analysis

The collected data● One-month long trace● 77M requests from 13.2M user browsers ● 1.3M unique photos (2.6M photo objects)

Traffic shelteringClient

Browser cache

PoP

Edge cache

Data center

Origin cache

Backend (Haystack)R

77.2M

65.5%

26.6M 11.2M

58% 31.8%

7.6M

Popularity Distribution

● approximately Zipfian

Traffic share by photo popularity

Geographic Traffic Distribution

Atlanta

Client To Edge Cache Traffic

20% Atlanta

5%California

5%New York

35%Washington D.C.

5%Dallas

10%Miami

Client To Edge Cache Traffic

Client To Edge Cache Traffic

● routing algorithm considers latency, edge capacity and ISP peering cost● substantial remote traffic is normal

Edge Cache to Origin Cache Traffic

Traffic flow based on content, not locality.

Cross-Region Traffic at Backend

● Caused by data migrations and hardware failures● Only 0.2% of Origin Cache to Backend traffic

Potential Improvements

Cache Simulation● replay the trace● use the first 25% to warm the cache● evaluate using the remaining 75% of the trace

Object-hit Ratios for Edge Cache

● S4LRU improves 9% at x over FIFO● Still a lot of room for improvement

S4LRU

● Four queues are maintained at levels 0 to 3

Cache space

L3

L2

L1

L0

S4LRU

● Missed objects inserted into the head of level 0 queue

Cache space

L3

L2

L1

L0

S4LRU

● On a cache hit, the item is moved to the head of the next higher queue

Cache space

L3

L2

L1

L0

S4LRU

● items evicted from the tail of a queue to the head of the next lower queue

Cache space

L3

L2

L1

L0

Byte-hit Ratios for Edge Cache

● LFU fails to reduce bandwidth utilization over the current solution

Origin cache

S4LRU improves Origin slightly more than Edge (14%)

Social-Network Analysis

Content Age Effect

Traffic share for non-profile photos by age

Traffic share for social activity groups.

Number of followers