A Hybrid Caching Strategy for Streaming Media Files

A Hybrid Caching Strategy for Streaming Media Files Jussara M. Almeida Derek L. Eager Mary K. Vernon University of Wisconsin-Madison University of Saskatchewan November 2001


Page 1: A Hybrid Caching Strategy for Streaming Media Files

A Hybrid Caching Strategy for Streaming Media Files

Jussara M. Almeida Derek L. Eager Mary K. Vernon

University of Wisconsin-Madison
University of Saskatchewan

November 2001

Page 2: A Hybrid Caching Strategy for Streaming Media Files

Outline

• Characteristics of Streaming Media (SM) files

• Delivery of SM files

• Hypothesis and Assumptions

• Previous Caching Policies

• New Policy Performance Comparison

• New Caching Policies

• Conclusions and Future Work

Page 3: A Hybrid Caching Strategy for Streaming Media Files

Characteristics of SM Files

• Large file size
– cache on disk

• Sustained I/O bandwidth
– inserting and reading new content

• Clients access partial files
– initial portion
– favored segment
– base + variable number of layers of a layered encoding

Page 4: A Hybrid Caching Strategy for Streaming Media Files

Delivery of SM Files

• Unicast streaming
– server bandwidth is linear in client request rate
– goal: maximize byte hit ratio

• Multicast streaming
– saves bandwidth
– cost sharing introduces new tradeoffs
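The byte hit ratio named as the unicast goal can be computed as the fraction of delivered bytes that came from the cache. The sketch below assumes each delivery is logged as a pair (bytes requested, bytes served from cache); that log format and the function name are illustrative, not taken from the slides.

```python
from typing import Iterable, Tuple


def byte_hit_ratio(deliveries: Iterable[Tuple[int, int]]) -> float:
    """Fraction of all delivered bytes that were served from the cache.

    Each item is (bytes_requested, bytes_served_from_cache); this record
    layout is an assumption made for the example.
    """
    total = 0
    hit = 0
    for requested, from_cache in deliveries:
        total += requested
        hit += from_cache
    # An empty log has no deliveries, so report 0.0 rather than divide by zero.
    return hit / total if total else 0.0


# One fully cached 100-byte delivery and one uncached 300-byte delivery.
example_ratio = byte_hit_ratio([(100, 100), (300, 0)])
```

Because the ratio is weighted by bytes rather than by request count, a partially cached large file contributes in proportion to the cached portion that is actually delivered.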

Page 5: A Hybrid Caching Strategy for Streaming Media Files

Caching for Multicast Streams: Tradeoffs

[Figure: Multicast. Required server bandwidth (0 to 20) vs. client request rate (1 to 1000, log scale).]

• Example: 10 distributed proxy servers, each serving a local region; 100 requests (on average) arrive per region during a given popular video. Need 7 streams per region, or 12 streams at the remote server.
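The arithmetic behind the example can be spelled out in a few lines. This sketch uses only the numbers quoted on the slide; the variable names are illustrative, and it assumes the 12 remote streams serve the requests of all 10 regions together.

```python
# Numbers quoted in the example; names are illustrative.
regions = 10
streams_per_region_if_cached = 7     # each proxy multicasts to its own region
streams_at_remote_if_not_cached = 12  # assumed to cover all regions at once

# Total streams when every proxy caches the video and serves it locally.
total_proxy_streams = regions * streams_per_region_if_cached

print(total_proxy_streams, streams_at_remote_if_not_cached)
```

Serving the video from the proxies removes the load from the remote server but uses more streams in total, which is the tradeoff the slide title refers to.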

Page 6: A Hybrid Caching Strategy for Streaming Media Files

Caching for Multicast Streams: Tradeoffs

• caching popular content reduces the load on the remote server and network

• delivering popular content from the remote server amortizes the cost of a stream over more clients

• earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions

Page 7: A Hybrid Caching Strategy for Streaming Media Files

New Caching Policies Research

• Hypothesis: a popularity-based strategy will outperform a replacement-based strategy
– a significant fraction of requests to uncached files may be for files that are accessed very sporadically

• Assumptions:
– limited disk space implies limited disk bandwidth
– proxy bandwidth for delivering cached streams is equal to the min of proxy disk bandwidth and proxy network bandwidth (call this proxy disk bandwidth)

Page 8: A Hybrid Caching Strategy for Streaming Media Files

Current Web Caching Policies

• Replacement based (cache on each miss)

• Top replacement candidate is an ad-hoc combination of:
– large files
– least recently accessed or lower access frequency
– miss penalty (server latency, bandwidth)

• Cache whole file or none

• Unicast

• Ignore limited disk bandwidth

Page 9: A Hybrid Caching Strategy for Streaming Media Files

Previous SM Caching Policies

• Interval Caching [DaSi93, KaRT95]

• Resource Based Caching (RBC) [TVDS98]

• Least Frequently Used (LFU)

• Block-based insertion and deletion [AcSm00]

• Popularity-based caching for layered encoding [RYHE00]

• Prefix and Segment Caching for smoothing [SeRT99, WZDS98]

Page 10: A Hybrid Caching Strategy for Streaming Media Files

Interval Caching

• Cache smallest intervals

• Target: memory caches (lots of insertions)

[Figure: timeline for file f (positions 0 to T) showing streams S1, S2, and S3 at successive times.]
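A minimal sketch of "cache smallest intervals", taking an interval to be the gap between two consecutive streams reading the same file (the usual description of interval caching; the slide itself does not define it). Positions and the cache budget are both measured in seconds of media, which assumes a uniform delivery rate. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def choose_intervals(positions: Dict[str, List[float]],
                     budget: float) -> List[Tuple[str, float, float]]:
    """Pick the smallest intervals that fit within the budget.

    positions maps a file name to the current playback positions of its
    active streams. Returns (file, follower_position, leader_position).
    """
    intervals = []
    for name, pos in positions.items():
        ordered = sorted(pos)
        # Each pair of neighbouring streams defines one interval.
        for follower, leader in zip(ordered, ordered[1:]):
            intervals.append((leader - follower, name, follower, leader))

    intervals.sort()  # smallest interval first
    chosen, used = [], 0.0
    for size, name, follower, leader in intervals:
        if used + size > budget:
            break  # every remaining interval is at least this large
        chosen.append((name, follower, leader))
        used += size
    return chosen


example = choose_intervals({"f": [0.0, 10.0, 15.0], "g": [100.0, 130.0]}, 20.0)
```

Small intervals are cheap to hold and still let the following stream read entirely from the cache, which is why the policy suits memory caches that can absorb frequent insertions.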

Page 11: A Hybrid Caching Strategy for Streaming Media Files

Resource Based Caching

• Cache entire files and intervals/runs

• Goal: efficiently utilize the limited resource
– limited space: cache smallest space requirement
– limited bandwidth: cache smallest write overhead

• Pre-allocate bandwidth to each cached entity

• Complex algorithm
– complex implementation
– high time complexity

Page 12: A Hybrid Caching Strategy for Streaming Media Files

RBC Algorithm

Step 1: Select entity x ∈ {interval, run, file} of file i
1) If U_bw > U_space + ε: choose the entity with the lowest [expression in W_{i,x} and R_{i,x}]
2) If U_space > U_bw + ε: choose the entity with the minimum space requirement S_{i,x}
3) If U_space − ε < U_bw < U_space + ε: choose the entity with the largest [expression in R_{i,x}, S_{i,x}, and W_{i,x}]

Step 2: Caching decision for entity x
1) If enough unallocated space and unallocated bandwidth: cache entity x
2) If enough unallocated space but bandwidth constrained: use the bandwidth goodness list to select candidates for eviction
3) If enough unallocated bandwidth but space constrained: use the space goodness list to select candidates for eviction
4) If both bandwidth and space constrained: walk both lists; at each step, remove an entity from the bandwidth goodness list or from the space goodness list

Step 3: Allocate space and bandwidth for entity x
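The three-way branch of Step 1 can be sketched as below. This is not the RBC implementation from [TVDS98]: the two scoring expressions are passed in as functions rather than fixed here, the threshold is called eps, and reading W and R as write and read bandwidth is an assumption. The two example scores at the bottom are hypothetical stand-ins chosen only to exercise the branches.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Entity:
    kind: str        # "interval", "run", or "file"
    space: float     # S_{i,x}: space requirement
    write_bw: float  # W_{i,x}: assumed to mean write bandwidth
    read_bw: float   # R_{i,x}: assumed to mean read bandwidth


Score = Callable[[Entity], float]


def select_entity(candidates: Sequence[Entity], u_bw: float, u_space: float,
                  eps: float, bw_score: Score, balanced_score: Score) -> Entity:
    """Step 1: pick which entity of a file to consider for caching."""
    if u_bw > u_space + eps:
        # Bandwidth is the scarcer resource: lowest bandwidth score wins.
        return min(candidates, key=bw_score)
    if u_space > u_bw + eps:
        # Space is the scarcer resource: smallest space requirement wins.
        return min(candidates, key=lambda e: e.space)
    # Utilizations are within eps of each other: largest balanced score wins.
    return max(candidates, key=balanced_score)


# Hypothetical scores, used only to show the three branches.
write_share = lambda e: e.write_bw / (e.read_bw + e.write_bw)
gain_per_space = lambda e: e.read_bw / e.space

cands = [Entity("interval", 2.0, 0.0, 1.0),
         Entity("run", 1.0, 0.5, 1.0),
         Entity("file", 50.0, 1.0, 300.0)]
```

Passing the scores as parameters keeps the branching logic separate from the goodness measures, so either can be swapped without touching the other.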

Page 13: A Hybrid Caching Strategy for Streaming Media Files

Least Frequently Used

• Different implementation options:
– What to do on the first access to an object?
– How to estimate frequency?

• Version studied: Currently Most Popular (CMP)
– Insert only the most frequently accessed (file or segment)
– On-line popularity estimate: future research
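A minimal sketch of the CMP idea, assuming access frequencies are already known (the slides leave on-line estimation to future research). It caches whole files only and stops at the first file that does not fit; both choices are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Dict, List


def cmp_contents(freq: Dict[str, float], size: Dict[str, float],
                 capacity: float) -> List[str]:
    """Return the files to cache, most frequently accessed first."""
    cached: List[str] = []
    used = 0.0
    for name in sorted(freq, key=lambda f: freq[f], reverse=True):
        if used + size[name] > capacity:
            break  # assumption: no partial files, no skipping ahead
        cached.append(name)
        used += size[name]
    return cached


example = cmp_contents({"a": 50, "b": 30, "c": 20},
                       {"a": 1.0, "b": 1.0, "c": 1.0},
                       capacity=2.0)
```

Unlike a replacement-based policy, nothing is inserted on a miss: the cache contents change only when the popularity ranking changes.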

Page 14: A Hybrid Caching Strategy for Streaming Media Files

Previous Comparison: RBC vs. CMP [TVDS98]

• Fixed file access frequencies

• RBC outperforms CMP for all parameter values studied

• Limited design space
– e.g., total cache size 16 GB

• Inconsistent results

Page 15: A Hybrid Caching Strategy for Streaming Media Files

New Performance Comparison

• Re-evaluate byte hit ratio of CMP and RBC
– simulation with synthetic workload
– broad design space

• New Pooled RBC

• New simple hybrid CMP/interval caching (CMP/IC) policy

Page 16: A Hybrid Caching Strategy for Streaming Media Files

System Assumptions

• Arrivals: Poisson(λ)
– extra experiments with Pareto(α, k)

• File access frequency: Zipf(θ)

• Perfect file popularity
– extra experiments with approximate file popularity

• Uniform file size and delivery rate
– extra experiments with variable file size and delivery rate

• Load balanced across multiple disks
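A synthetic workload matching the first two assumptions could be generated roughly as follows. This is a sketch, not the authors' simulator: it assumes the Zipf convention in which the file of rank i is requested with probability proportional to 1 / i^(1 − θ), so that θ = 0 gives a pure Zipf distribution, and it draws Poisson arrivals by summing exponential gaps. The names lam, horizon, and seed are illustrative.

```python
import random
from typing import List, Tuple


def zipf_probs(n: int, theta: float) -> List[float]:
    """Request probability per file rank (1 = most popular).

    Assumed convention: p_i proportional to 1 / i**(1 - theta).
    """
    weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def synthetic_requests(n: int, theta: float, lam: float, horizon: float,
                       seed: int = 0) -> List[Tuple[float, int]]:
    """Return (arrival_time, file_rank) pairs up to the time horizon."""
    rng = random.Random(seed)
    probs = zipf_probs(n, theta)
    ranks = range(1, n + 1)
    now, requests = 0.0, []
    while True:
        now += rng.expovariate(lam)  # exponential gaps give Poisson arrivals
        if now > horizon:
            return requests
        requests.append((now, rng.choices(ranks, weights=probs)[0]))


trace = synthetic_requests(n=100, theta=0.0, lam=5.0, horizon=10.0)
```

Fixing the seed makes a run repeatable, which helps when comparing policies on the same request trace.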

Page 17: A Hybrid Caching Strategy for Streaming Media Files

System Parameters

• n: number of files

• θ: Zipf parameter

• N: arrival rate (avg. number of requests per avg. file duration T)
N = λT

• C: cache size (fraction of media data accessed)

Page 18: A Hybrid Caching Strategy for Streaming Media Files

System Parameters

• B: normalized disk bandwidth (fraction of the average number of simultaneous streams needed to deliver data that is cached by CMP)

• B depends on N, θ, n, C and disk technology

• Relative performance of policies depends mainly on B

• B = 1.0: CMP system is bandwidth balanced
• B < 1.0: CMP system is bandwidth deficient
• B > 1.0: CMP system is bandwidth abundant

Page 19: A Hybrid Caching Strategy for Streaming Media Files

Normalized Disk Bandwidth (B): Example

• Ultrastar 72ZX disk:
– disk space: 116.76 hours of MPEG-1 video (73.4 GB)
– disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)

• Assume: 100 requests/hour for cached files

• If cache contains 2-hour movies:
– need 200 streams
– B = 108/200 = 0.54

• If cache contains 30-minute TV shows:
– need 50 streams for cache content
– B = 108/50 = 2.16
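The two B values in this example follow from one small calculation: the average number of simultaneous streams is the request rate times the file duration, and B is the disk's stream capacity divided by that number. The function and parameter names below are illustrative.

```python
def normalized_disk_bandwidth(disk_streams: float, requests_per_hour: float,
                              duration_hours: float) -> float:
    """B = streams the disk can sustain / streams needed for cached content."""
    # Average number of concurrent streams: arrival rate times duration.
    streams_needed = requests_per_hour * duration_hours
    return disk_streams / streams_needed


b_movies = normalized_disk_bandwidth(108, 100, 2.0)    # 2-hour movies
b_tv_shows = normalized_disk_bandwidth(108, 100, 0.5)  # 30-minute TV shows
```

With the same disk and request rate, shorter files need fewer concurrent streams, so the system moves from bandwidth deficient (B < 1.0) to bandwidth abundant (B > 1.0).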

Page 20: A Hybrid Caching Strategy for Streaming Media Files

RBC vs. CMP

• CMP outperforms RBC if B ≤ 1.0

• RBC slightly outperforms CMP if B > 1.0 and the cache is small

[Figure: byte hit ratio (0 to 1) vs. cache size (0 to 1) for RBC and CMP. One panel shows B = 0.75 and B = 1.0; the other shows B = 2.0. N = 450, n = 100, Zipf parameter θ = 0.]

Page 21: A Hybrid Caching Strategy for Streaming Media Files

Files Cached by RBC

• Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)

[Figure: fraction cached (0 to 1) vs. file index; panels for B = 0.75, B = 1.0, and B = 2.0.]

Page 22: A Hybrid Caching Strategy for Streaming Media Files

Space and Bandwidth Utilization

[Figure: utilization (0 to 1) vs. cache size (0 to 1); panels for B = 0.75, B = 1.0, and B = 2.0. Curves: CMP bandwidth utilization, RBC bandwidth utilization, RBC space utilization, RBC write bandwidth.]

Page 23: A Hybrid Caching Strategy for Streaming Media Files

Pooled RBC

• Three improvements over RBC
– simpler rule to select entity to cache
– can keep cached intervals when deleting a full file
– pool of pre-allocated bandwidth

• Complexity similar to RBC

Page 24: A Hybrid Caching Strategy for Streaming Media Files

Pooled RBC, RBC and LFU

[Figure: byte hit ratio (0 to 1) vs. cache size (0 to 1). Panel for B = 0.75 and B = 1.0: curves labeled "CMP / Pooled RBC" and "RBC". Panel for B = 2.0: curves labeled "RBC / Pooled RBC" and "CMP".]

• Pooled RBC ≥ CMP

• BUT, Pooled RBC is much more complex than CMP

N = 450, n = 100, Zipf parameter θ = 0

Page 25: A Hybrid Caching Strategy for Streaming Media Files

Hybrid CMP/IC Policies

• Do interval caching on a separate (small) cache
– Interval cache in main memory: CMP/ICmem and Pooled RBC/ICmem
– Interval cache on disk: CMP/ICdisk (e.g., 5% of disk cache)

Page 26: A Hybrid Caching Strategy for Streaming Media Files

CMP/ICmem vs. Pooled RBC/ICmem

[Figure: byte hit ratio (0 to 1) vs. cache size (0 to 1) for CMP/ICmem and Pooled RBC/ICmem. One panel shows B = 0.75 and B = 1.0; the other shows B = 2.0. N = 450, n = 100, Zipf parameter θ = 0.]

• Memory cache improves CMP and Pooled RBC

• B > 1.0: greater improvement for CMP

Page 27: A Hybrid Caching Strategy for Streaming Media Files

CMP/ICdisk vs. Pooled RBC

[Figure: byte hit ratio (0 to 1) vs. cache size (0 to 1). Panel for B = 0.75 and B = 1.0: curves labeled "CMP/ICdisk / CMP" and "Pooled RBC". Panel for B = 2.0: curves labeled "CMP" and "CMP/ICdisk / Pooled RBC". N = 450, n = 100, Zipf parameter θ = 0.]

• CMP/ICdisk ≈ Pooled RBC ≥ CMP

Page 28: A Hybrid Caching Strategy for Streaming Media Files

Conclusions

• Simple CMP
– simple to implement
– performance similar to Pooled RBC, CMP/ICdisk (static file popularities)

• Hybrid CMP/IC policy
– performance ≈ Pooled RBC
– simple to implement
– possibly more robust (imperfect and dynamic popularity measures)

Page 29: A Hybrid Caching Strategy for Streaming Media Files

Future Work

• Develop on-line estimate of file popularity

• Server log analysis
– client behavior and workloads (NOSSDAV’01 paper)
– More logs!!!!

• Caching policies for multicast streams
– popular file has greater cost-sharing if not cached
– determine cache content that minimizes per-client cost
– caching principles / on-line policy
– (coming up soon)

• Prototype, experimental (live) workloads