A Hybrid Caching Strategy for Streaming Media Files




Jussara M. Almeida Derek L. Eager Mary K. Vernon

University of Wisconsin-Madison, University of Saskatchewan

November 2001

Outline

• Characteristics of Streaming Media (SM) files

• Delivery of SM files

• Hypothesis and Assumptions

• Previous Caching Policies

• New Policy Performance Comparison

• New Caching Policies

• Conclusions and Future Work

Characteristics of SM Files

• Large file size
– cache on disk

• Sustained I/O bandwidth
– inserting and reading new content

• Clients access partial files
– initial portion
– favored segment
– base + variable number of layers of layered encoding

Delivery of SM Files

• Unicast streaming:
– server bandwidth is linear in client request rate
– goal: maximize byte hit ratio

• Multicast streaming:
– saves bandwidth
– cost sharing introduces new tradeoffs

Multicast

[Figure: required server bandwidth (0 to 20) vs. client request rate (1 to 1000, log scale)]

• Example: 10 distributed proxy servers, each serving a local region; on average, 100 requests per region arrive during the playback of a given popular video. Serving the video locally needs 7 streams per region (70 in total), whereas serving all regions from the remote server needs 12 streams.
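The arithmetic behind the example can be written out as a short Python sketch. It only reuses the slide's stream counts (7 per region, 12 at the remote server); it does not model how multicast bandwidth grows with the request rate, and the variable names are illustrative.

```python
# Figures from the multicast example; 7 and 12 are the slide's numbers,
# not values computed from a multicast delivery model.
regions = 10
streams_per_region = 7   # streams each proxy needs for ~100 requests to the video
streams_at_remote = 12   # streams the remote server needs for all regions combined

# Caching the video at every proxy: each region pays for its own streams.
cached_total = regions * streams_per_region
# Serving from the remote server: one set of streams is shared by every region.
remote_total = streams_at_remote

print(f"cached at proxies: {cached_total} streams; served remotely: {remote_total} streams")
# → cached at proxies: 70 streams; served remotely: 12 streams
```

Caching everywhere costs 70 proxy streams versus 12 shared remote streams; which option is cheaper depends on the relative cost of proxy and remote bandwidth, which the example leaves open.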

Caching for Multicast Streams: Tradeoffs


• caching popular content reduces the load on the remote server and network

• delivering popular content from the remote server amortizes the cost of a stream over more clients

• earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions

New Caching Policies Research

• Hypothesis: a popularity-based strategy will outperform a replacement-based strategy
– a significant fraction of requests to uncached files may be for files that are accessed very sporadically

• Assumptions:
– limited disk space implies limited disk bandwidth
– proxy bandwidth for delivering cached streams is equal to the minimum of proxy disk bandwidth and proxy network bandwidth (call this the proxy disk bandwidth)

Current Web Caching Policies

• Replacement based (cache on each miss)

• Top replacement candidate is an ad-hoc combination of:
– large files
– least recently accessed or lower access frequency
– miss penalty (server latency, bandwidth)

• Cache whole file or none

• Unicast

• Ignore limited disk bandwidth

Previous SM Caching Policies

• Interval Caching [DaSi93, KaRT95]

• Resource Based Caching (RBC) [TVDS98]

• Least Frequently Used (LFU)

• Block-based insertion and deletion [AcSm00]

• Popularity-based caching for layered encoding [RYHE00]

• Prefix and segment caching for smoothing [SeRT99, WZDS98]

Interval Caching

• Cache smallest intervals

• Target: memory caches (lots of insertions)

[Figure: successive streams S1, S2, S3 reading file f (positions 0 to T) over time]
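A minimal Python sketch of "cache smallest intervals", assuming a uniform delivery rate so that an interval's space requirement is proportional to the time gap between two consecutive requests for the same file. It picks intervals greedily from a snapshot of active streams; the function name and data layout are hypothetical, and the online insertion and replacement that a real interval cache performs is left out.

```python
from collections import defaultdict

def select_intervals(arrivals, capacity):
    """Greedy interval caching over a snapshot of request arrival times.

    arrivals: list of (arrival_time, file_id) for streams currently playing.
    capacity: cache space, in seconds of playback (uniform rate assumed).
    Returns cached intervals as (size, file_id, leader_time, follower_time).
    """
    by_file = defaultdict(list)
    for t, f in arrivals:
        by_file[f].append(t)

    # An interval is formed by two consecutive requests (leader, follower)
    # to the same file; its space requirement is the gap between them.
    intervals = []
    for f, times in by_file.items():
        times.sort()
        for leader, follower in zip(times, times[1:]):
            intervals.append((follower - leader, f, leader, follower))

    # Smallest intervals first: each cached interval lets one follower stream
    # read from the cache, so small intervals serve the most streams per unit of space.
    cached, used = [], 0.0
    for iv in sorted(intervals):
        if used + iv[0] <= capacity:
            cached.append(iv)
            used += iv[0]
    return cached

arrivals = [(0, "a"), (5, "a"), (40, "a"), (10, "b"), (12, "b")]
print(select_intervals(arrivals, capacity=10))
# → [(2, 'b', 10, 12), (5, 'a', 0, 5)]
```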

Resource Based Caching

• Cache entire files and intervals/runs

• Goal: efficiently utilize the limited resource
– limited space: cache smallest space requirement
– limited bandwidth: cache smallest write overhead

• Pre-allocate bandwidth to each cached entity

• Complex algorithm
– complex implementation
– high time complexity

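RBC Algorithm

[Slide formulas, not recoverable from the source: criterion (a) in terms of W_{i,x} and R_{i,x}; criterion (b) in terms of S_{i,x}, W_{i,x} and R_{i,x}]

Step 1: Selecting entity x ∈ {interval, run, file} of file i
1) If Ubw > Uspace + ε: choose the entity with the lowest criterion (a)
2) If Uspace > Ubw + ε: choose the entity with the minimum space requirement S_{i,x}
3) If Uspace − ε < Ubw < Uspace + ε: choose the entity with the largest criterion (b)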

Step 2: Caching decision for entity x
1) If enough unallocated space and unallocated bandwidth: cache entity x
2) If enough unallocated space but bandwidth constrained: use the bandwidth goodness list to select candidates for eviction
3) If enough unallocated bandwidth but space constrained: use the space goodness list to select candidates for eviction
4) If both bandwidth and space constrained: walk both lists; at each step, remove an entity from either the bandwidth goodness list or the space goodness list

Step 3: Allocate space and bandwidth for entity x
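Step 1 of RBC can be sketched in Python as a dispatch on the two utilizations. Rules 1 and 3 rank entities by slide formulas in W_{i,x}, R_{i,x} and S_{i,x} whose exact form is not recoverable here, so this sketch takes them as caller-supplied functions. The Entity class, its fields, the eps tolerance and the placeholder criterion are all hypothetical; Steps 2 and 3 (eviction through the goodness lists, then allocation) are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

@dataclass
class Entity:
    kind: str       # "interval", "run" or "file"
    file_id: int
    space: float    # S_{i,x}: space requirement of entity x of file i
    metrics: Dict[str, float] = field(default_factory=dict)  # other per-entity quantities

Criterion = Callable[[Entity], float]

def select_entity(candidates: Sequence[Entity], u_bw: float, u_space: float, eps: float,
                  bw_criterion: Criterion, balanced_criterion: Criterion) -> Entity:
    """RBC Step 1: choose the next entity to consider for caching."""
    if u_bw > u_space + eps:
        # Rule 1: bandwidth is noticeably more utilized than space.
        return min(candidates, key=bw_criterion)
    if u_space > u_bw + eps:
        # Rule 2: space is noticeably more utilized than bandwidth.
        return min(candidates, key=lambda e: e.space)
    # Rule 3: utilizations within eps of each other (boundary cases also land here).
    return max(candidates, key=balanced_criterion)

def by_w(e: Entity) -> float:
    """Placeholder criterion for illustration only, not RBC's formula."""
    return e.metrics["W"]

cands = [Entity("file", 1, 100.0, {"W": 1.0}), Entity("interval", 2, 5.0, {"W": 3.0})]
print(select_entity(cands, 0.9, 0.5, 0.05, by_w, by_w).file_id)  # 1 (rule 1)
print(select_entity(cands, 0.5, 0.9, 0.05, by_w, by_w).file_id)  # 2 (rule 2)
```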

Least Frequently Used

• Different implementation options:
– What to do on the first access to an object?
– How to estimate frequency?

• Version studied: Currently Most Popular (CMP)
– insert only the most frequently accessed (file or segment)
– on-line popularity estimate: future research
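If file popularities are known exactly, CMP's cache contents are simply the most popular files. The Python sketch below estimates CMP's byte hit ratio under that assumption. The Zipf form p_i ∝ 1/i^s with exponent s is an assumption of this example (how the slides' Zipf parameter maps onto s is not stated), and the min(1, B) factor is only a rough average-load bound, not a result from the simulations.

```python
from __future__ import annotations
import math

def zipf_popularities(n: int, s: float = 1.0) -> list[float]:
    """Access probability of the file with popularity rank i = 1..n, assuming p_i ∝ 1 / i**s."""
    weights = [1.0 / i ** s for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def cmp_byte_hit_ratio(popularity: list[float], cache_fraction: float,
                       b: float = math.inf) -> float:
    """Static CMP estimate: cache the most popular files until the cache is full.

    With uniform file sizes and delivery rates, a cache holding fraction C of the
    data holds floor(C * n) whole files, and the byte hit ratio equals their total
    access probability. Multiplying by min(1, B) gives an average-load upper bound
    when disk bandwidth is the bottleneck; it ignores fluctuations in the number
    of concurrent streams.
    """
    ranked = sorted(popularity, reverse=True)
    k = math.floor(cache_fraction * len(ranked) + 1e-9)  # tolerance for float error
    return min(1.0, b) * sum(ranked[:k])

pop = zipf_popularities(100)
print(cmp_byte_hit_ratio(pop, 0.25))          # space-limited estimate
print(cmp_byte_hit_ratio(pop, 0.25, b=0.75))  # capped by normalized disk bandwidth
```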

Previous Comparison: RBC vs. CMP [TVDS98]

• Fixed file access frequencies

• RBC outperforms CMP for all parameter values studied

• Limited design space
– e.g., total cache size 16 GB

• Inconsistent results

New Performance Comparison

• Re-evaluate byte hit ratio of CMP and RBC
– simulation with synthetic workload
– broad design space

• New Pooled RBC

• New simple hybrid CMP/interval caching (CMP/IC) policy

System Assumptions

• Arrivals: Poisson(λ)
– extra experiments with Pareto(α, k)

• File access frequency: Zipf

• Perfect file popularity
– extra experiments with approximate file popularity

• Uniform file size and delivery rate
– extra experiments with variable file size and delivery rate

• Load balanced across multiple disks
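A synthetic workload of this kind can be sketched in Python with the standard `random` module: exponential inter-arrival times give Poisson arrivals, and each request picks a file by Zipf-like weights. The exact Zipf form (p_i ∝ 1/i^s), the function name and its parameters are assumptions of this sketch; N is interpreted as requests per average file duration T, so the arrival rate is N/T.

```python
import random
from itertools import accumulate
from typing import List, Optional, Tuple

def generate_requests(n_files: int, N: float, T: float, horizon: float,
                      zipf_s: float = 1.0,
                      seed: Optional[int] = None) -> List[Tuple[float, int]]:
    """Synthetic request stream: Poisson arrivals, Zipf-like choice of file.

    N is the number of requests per average file duration T, so the Poisson
    arrival rate is N / T. File 0 is the most popular.
    """
    rng = random.Random(seed)
    rate = N / T
    # Cumulative weights for p_i proportional to 1 / i**zipf_s (an assumed Zipf form).
    cum_weights = list(accumulate(1.0 / i ** zipf_s for i in range(1, n_files + 1)))
    files = range(n_files)
    requests = []
    t = rng.expovariate(rate)  # exponential inter-arrival times give a Poisson process
    while t < horizon:
        f = rng.choices(files, cum_weights=cum_weights)[0]
        requests.append((t, f))
        t += rng.expovariate(rate)
    return requests

reqs = generate_requests(n_files=100, N=450, T=1.0, horizon=10.0, seed=42)
print(len(reqs))  # close to N * horizon / T = 4500 on average
```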

System Parameters

• n: number of files

• Zipf parameter of the file access frequencies

• N: arrival rate (avg. number of requests per avg. file duration T)
– N = λT, where λ is the request arrival rate per unit time

• C: cache size (fraction of media data accessed)

• B: normalized disk bandwidth (fraction of the average number of simultaneous streams needed to deliver data that is cached by CMP)

• B depends on N, the Zipf parameter, n, C, and disk technology

• Relative performance of policies depends mainly on B

• B = 1.0: CMP system is bandwidth balanced
• B < 1.0: CMP system is bandwidth deficient
• B > 1.0: CMP system is bandwidth abundant
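Spelling out the definition of B shows why it depends on N, the file popularities, n, C and the disks. This Python sketch assumes uniform file durations T, so by Little's law the average number of concurrent streams for CMP's cached files is N times their total access probability. The function and parameter names are hypothetical, and `disk_streams` stands for the stream capacity of the disks holding the cache.

```python
from __future__ import annotations

def cmp_normalized_bandwidth(disk_streams: float, N: float,
                             popularity: list[float], cache_fraction: float) -> float:
    """Normalized disk bandwidth B for a CMP cache.

    disk_streams: concurrent streams the cache disks can deliver (disk technology).
    N: requests per average file duration T.
    popularity: access probability of each of the n files (any order).
    cache_fraction: C, the fraction of the (uniform-size) files that fit in the cache.
    """
    ranked = sorted(popularity, reverse=True)
    k = int(cache_fraction * len(ranked) + 1e-9)  # number of files cached by CMP
    p_cached = sum(ranked[:k])                    # fraction of requests to cached files
    # Little's law: requests to cached files arrive at p_cached * N per T and each
    # lasts T, so on average p_cached * N streams are needed for the cached content.
    needed = p_cached * N
    if needed <= 0:
        raise ValueError("CMP caches no content, so B is undefined")
    return disk_streams / needed

print(round(cmp_normalized_bandwidth(90, 450, [0.01] * 100, 0.25), 2))  # 0.8
```

With uniform popularity, N = 450 and C = 0.25, CMP's content needs 112.5 streams on average, so 90 disk streams give a bandwidth-deficient system (B < 1.0).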

Normalized Disk Bandwidth (B): Example

• Ultrastar 72ZX disk:
– disk space: 116.76 hours of MPEG-1 video (73.4 GB)
– disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)

• Assume: 100 requests/hour for cached files

• If the cache contains 2-hour movies:
– need 200 streams
– B = 108/200 = 0.54

• If the cache contains 30-minute TV shows:
– need 50 streams for cached content
– B = 108/50 = 2.16
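The two cases above follow from Little's law: the average number of concurrent streams is the request rate times the stream duration. A small Python sketch reproducing them (the function name is illustrative):

```python
def normalized_disk_bandwidth(disk_streams: float, requests_per_hour: float,
                              duration_hours: float) -> float:
    """B = available disk streams / average concurrent streams needed (Little's law)."""
    needed = requests_per_hour * duration_hours
    return disk_streams / needed

ULTRASTAR_72ZX_STREAMS = 108  # MPEG-1 streams per disk, from the example above

for label, hours in [("2-hour movies", 2.0), ("30-minute TV shows", 0.5)]:
    b = normalized_disk_bandwidth(ULTRASTAR_72ZX_STREAMS, 100, hours)
    print(f"{label}: B = {b:.2f}")
# → 2-hour movies: B = 0.54
# → 30-minute TV shows: B = 2.16
```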

RBC vs. CMP

• CMP outperforms RBC if B ≤ 1.0

• RBC slightly outperforms CMP if B > 1.0 and caches are small

[Figure: byte hit ratio vs. cache size (0 to 1) for RBC and CMP; one panel with B = 0.75 and B = 1.0, one with B = 2.0; N = 450, n = 100, Zipf parameter = 0]

Files Cached by RBC

• Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)

[Figure: fraction of each file cached vs. file (1 to 100); panels B = 0.75, B = 1.0 and B = 2.0]

Space and Bandwidth Utilization

[Figure: utilization vs. cache size (0 to 1); panels B = 0.75, B = 1.0 and B = 2.0; curves: CMP bandwidth utilization, RBC bandwidth utilization, RBC space utilization, RBC write bandwidth]

Pooled RBC

• Three improvements over RBC
– simpler rule to select the entity to cache
– can keep cached intervals when deleting a full file
– pool of pre-allocated bandwidth

• Similar complexity to RBC

Pooled RBC, RBC and LFU

[Figure: byte hit ratio vs. cache size (0 to 1); curves labeled "CMP / Pooled RBC" and "RBC" for B = 0.75 and B = 1.0, and "RBC / Pooled RBC" and "CMP" for B = 2.0; N = 450, n = 100, Zipf parameter = 0]

• Pooled RBC ≥ CMP

• BUT, Pooled RBC is much more complex than CMP

Hybrid CMP/IC Policies

• Do interval caching on a separate (small) cache
– interval cache in main memory: CMP/ICmem and Pooled RBC/ICmem
– interval cache on disk: CMP/ICdisk

• e.g., 5% of disk cache
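The hybrid idea can be sketched in Python as a request classifier: files chosen by CMP are served whole from the main cache, and other requests can be served from a small, separate interval cache (for instance sized at about 5% of the disk cache) when the previous stream of the same file is recent enough. This is an idealized sketch with hypothetical names: it admits intervals first come, first served instead of evicting larger intervals for smaller ones, and it counts an admitted follower as fully served by the interval cache.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Set, Tuple

def simulate_cmp_ic(requests: Iterable[Tuple[float, Hashable]], cmp_files: Set[Hashable],
                    ic_capacity: float, T: float) -> Dict[str, int]:
    """Count where each request is served in an idealized CMP/IC proxy.

    requests: (arrival_time, file_id) pairs in time order.
    cmp_files: files held whole in the main (CMP) cache, i.e. the most popular ones.
    ic_capacity: size of the separate interval cache, in seconds of playback.
    T: playback duration of every file (uniform file size and delivery rate).
    """
    last_arrival: Dict[Hashable, float] = {}
    releases: List[Tuple[float, float]] = []  # min-heap of (release_time, interval_size)
    used = 0.0
    counts = {"cmp": 0, "interval": 0, "remote": 0}
    for t, f in requests:
        # Free the space of intervals whose follower stream has finished.
        while releases and releases[0][0] <= t:
            used -= heapq.heappop(releases)[1]
        prev = last_arrival.get(f)
        last_arrival[f] = t
        if f in cmp_files:
            counts["cmp"] += 1
            continue
        gap = None if prev is None else t - prev
        # Use the interval cache if the previous stream of the same file is still
        # playing and the interval fits in the free space.
        if gap is not None and gap < T and used + gap <= ic_capacity:
            used += gap
            heapq.heappush(releases, (t + T, gap))
            counts["interval"] += 1
        else:
            counts["remote"] += 1
    return counts

reqs = [(0, "a"), (1, "a"), (2, "x"), (3, "b"), (50, "b"), (60, "b")]
print(simulate_cmp_ic(reqs, cmp_files={"x"}, ic_capacity=5, T=100))
# → {'cmp': 1, 'interval': 1, 'remote': 4}
```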

CMP/ICmem vs. Pooled RBC/ICmem

[Figure: byte hit ratio vs. cache size (0 to 1) for CMP/ICmem and Pooled RBC/ICmem; one panel with B = 0.75 and B = 1.0, one with B = 2.0; N = 450, n = 100, Zipf parameter = 0]

• Memory cache improves CMP and Pooled RBC

• B ≤ 1.0: greater improvement for CMP

CMP/ICdisk vs. Pooled RBC

[Figure: byte hit ratio vs. cache size (0 to 1); curves labeled "CMP/ICdisk / CMP" and "Pooled RBC" for B = 0.75 and B = 1.0, and "CMP" and "CMP/ICdisk / Pooled RBC" for B = 2.0; N = 450, n = 100, Zipf parameter = 0]

• CMP/ICdisk ≈ Pooled RBC ≥ CMP

Conclusions

• Simple CMP
– simple to implement
– performance similar to Pooled RBC and CMP/ICdisk (static file popularities)

• Hybrid CMP/IC policy
– performance ≈ Pooled RBC
– simple to implement
– possibly more robust (imperfect and dynamic popularity measures)

Future Work

• Develop on-line estimate of file popularity

• Server log analysis
– client behavior and workloads (NOSSDAV'01 paper)
– More logs!!!!

• Caching policies for multicast streams
– a popular file has greater cost sharing if not cached
– determine cache content that minimizes per-client cost
– caching principles / on-line policy
– (coming up soon)

• Prototype, experimental (live) workloads