[IEEE 2009 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) - Bilbao, Spain (2009.05.13-2009.05.15)] 2009 IEEE International Symposium on Broadband

mm09-01 1

Abstract—In an IPTV network, Video on Demand and other

video services generate a large amount of unicast traffic from the Video Hub Office (VHO) to subscribers and, therefore, require additional bandwidth and equipment resources in the network. To reduce this traffic and the overall network cost, a portion of the video content (the most popular titles) may be stored in caches closer to subscribers, e.g., in a Digital Subscriber Line Access Multiplexer (DSLAM), a Central Office (CO), or an Intermediate Office (IO). The problem is where, and how much, cache memory should be allocated in order to achieve maximum cost effectiveness. In this paper, we consider two approaches to solving this problem and analyze the factors that affect the solution. The analysis shows that hierarchical distributed caching can save significant network cost.

Index Terms—Cache memories, modeling, optimization methods, video on demand

I. INTRODUCTION

The growth of video on the Internet has been extraordinary by any measure; forecasts of video traffic suggest we are entering the exabyte-per-month (10^18 bytes per month) era. This trend is expected to continue as content providers push higher-quality content and cost-effective download models emerge from Xbox, TiVo, Apple, and many others. Network operators are starting to feel the strain on network resources, in particular their transit links. Comcast's much-publicized shaping of P2P traffic, which drew a class-action lawsuit, and, more recently, its imposition of limits on how much each subscriber can download per month, are evidence of the network operators' unhappiness. While ISPs' operational expenses are rising due to increased online video traffic, no suitable compensation mechanism is in place to help offset them.

We suggest that the solution for over-burdened ISPs and the high cost of content delivery lies in deploying highly decentralized content delivery networks, or content distribution networks (CDNs). Today, CDNs are typically limited to the edge of the network, or to the ingress of an ISP. We will show, based on current video-related market trends, that creating decentralized hierarchical CDNs will provide significant cost savings. Traditional CDNs leverage thousands of edge servers, private backbone facilities, and peering connections to Tier 1 ISPs. Akamai claims that its customers are only one hop from 90% of Internet subscribers globally. In reality, most CDNs are limited to the ingress of the ISP (the network edge), so that CDNs are at best one hop from any ISP. In larger metropolitan areas, ISPs will have multiple levels, or network hops, between the subscriber and the egress to the Internet. This is the area for optimization that we will investigate.

Manuscript received April 7, 2009. Lev B. Sofman is with Bell Labs, Alcatel-Lucent, Plano, TX 75075 USA (corresponding author; phone: 972-477-2835; fax: 972-477-2460; e-mail: Lev.Sofman@alcatel-lucent.com). Bill Krogfoss is with Bell Labs, Alcatel-Lucent, Plano, TX 75075 USA (e-mail: Bill.Krogfoss@alcatel-lucent.com). Anshul Agrawal is with Bell Labs, Alcatel-Lucent, Plano, TX 75075 USA (e-mail: Anshul.Agrawal@alcatel-lucent.com).

In the following, we apply the concept of CDN to Internet Protocol television (IPTV) networks and consider a typical IPTV architecture with a hierarchical (tree) topology. In this network architecture, the following layers are present in the tree topology:

- Several subscribers are connected to a Digital Subscriber Line Access Multiplexer (DSLAM);
- Several DSLAMs are connected to a Central Office (CO);
- Several COs are connected to an Intermediate Office (IO);
- Several IOs are connected to a Video Hub Office (VHO).

In an IPTV network, Video on Demand (VoD) and other video services generate a large amount of unicast traffic from the Video Hub Office (VHO) to subscribers and, therefore, require additional bandwidth and equipment resources in the network. To reduce this traffic (and the overall network cost), part of the video content (the most popular titles) may be stored in caches closer to subscribers (e.g., in DSLAMs, COs, and/or IOs; see Fig. 1). The problem is how to determine the optimal size and locations of the cache memory in IPTV networks, and which titles and services should be cached at which locations, in order to achieve the maximum cost effectiveness.

Hierarchical caching architectures that include DSLAMs, COs, IOs, and the VHO, as well as optimization strategies for IPTV networks, have been studied in [1]. Reviews of caching techniques for multimedia services can be found in a number of publications [2]-[6]. In particular, various caching techniques for different hit ratio metrics are discussed in [2]. An analytical model for hierarchical cache optimization in an IPTV network is described in [7]. The impact of evolving content popularities on caching has been studied in [8] and [9]. An algorithm that optimally partitions a cache between several video services with different traffic characteristics and content sizes is described in [10].

Hierarchical Cache Optimization in IPTV Networks

Bill Krogfoss, Lev B. Sofman, and Anshul Agrawal

Fig. 1. Hierarchical caching in IPTV network

In Section II, the relationship between cache hit rates and the popularity distribution of VoD titles is briefly examined to set the stage. Then two modeling approaches for hierarchical cache optimization in IPTV networks are considered.

In the first model, described in Section III, for a given topology, service traffic characteristics, configuration, and cost parameters, we perform network and equipment dimensioning at every level of the hierarchy and find the optimal cache architecture using a heuristic algorithm. This model (which we will call the heuristic model) is suitable when we need to optimize the caching architecture in a particular IPTV network with a given topology and given traffic and equipment characteristics. However, because of multiple levels of cost modularity in the heuristic model, it is difficult to analyze the factors that affect the solution using this approach.

The second modeling approach, described in Section IV, uses simpler cost structures and some reasonable assumptions, which allow us to obtain an analytically optimal solution for the problem and to analyze the factors that affect this solution. This model (which we call the analytical model) is suitable for sensitivity analysis and allows us to determine the impact of various parameters on the optimal solution.

Results of the sensitivity analysis are presented in Section V. Section VI summarizes our contribution and presents some concluding remarks regarding the importance of hierarchical caching for network cost optimization.

II. HIT RATE IN HIERARCHICAL NETWORKS

Cache effectiveness may be described by hit rate, i.e., the percentage of all requests that are satisfied by the data in the cache. The discrete version of hit rate, H(n), represents the portion of service requests that may be served by the n "most popular" titles stored in the cache. The continuous version of hit rate, H(m), is a function of cache memory size m. Hit rate depends on the statistical characteristics of traffic (long- and short-term title popularity) and on the effectiveness of the

caching algorithm to update the cache content [11].

Several service popularity models can be found in the research literature; we chose the Zipf-Mandelbrot (ZM) distribution [12], [13] for our purposes, although any alternative distribution could be used as well. The ZM probability mass function is p(k) = C / (k + q)^α, where C is a normalization constant, k is the rank of the object, q is the shift factor, and α is a power parameter that determines the steepness of the curve. In the ideal case, when the caching algorithm has complete information about the statistical characteristics of the traffic, the hit rate is equal to the cumulative popularity distribution. Fig. 2 shows the hit rate corresponding to various occasions (e.g., weekday, typical Friday, blockbuster) for 500 movie titles during different times of the week and year. Caching 10% of the most popular files in this example can result in hit rates anywhere from 15% up to 90%. Note that the hit rate increases with the power parameter (i.e., the steepness of the curve). The curves in this figure are used for illustration purposes only, but they are consistent with customer behavior trends.
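The ideal-case relationship between ZM popularity and hit rate can be sketched as follows; the 500-title catalog and the alpha values mirror Fig. 2, while q = 0 is an illustrative assumption:

```python
# Ideal cache hit rate as cumulative Zipf-Mandelbrot (ZM) popularity.
# Catalog size and alpha values follow Fig. 2; q = 0 is an assumption.

def zm_pmf(n_titles, alpha, q=0.0):
    """ZM probabilities p(k) = C / (k + q)^alpha for ranks k = 1..n_titles."""
    weights = [1.0 / (k + q) ** alpha for k in range(1, n_titles + 1)]
    total = sum(weights)  # the normalization constant C is 1 / total
    return [w / total for w in weights]

def hit_rate(n_cached, n_titles, alpha, q=0.0):
    """H(n): fraction of requests served when the n most popular titles are cached."""
    return sum(zm_pmf(n_titles, alpha, q)[:n_cached])

# Caching 10% of a 500-title catalog, as in the text:
for a in (0.2, 0.5, 1.0, 1.5):
    print(f"alpha = {a}: H(50) = {hit_rate(50, 500, a):.1%}")
```

Steeper curves (larger alpha) concentrate requests on the top-ranked titles, so the same 10% cache yields hit rates from roughly 15% (alpha = 0.2) up to about 90% (alpha = 1.5), consistent with the range quoted above.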

Fig. 2. Cache hit rate and popularity distributions

In the case of multiple services, the hit rate depends on the popularity distribution and other characteristics of individual

[Fig. 1 graphic: a chain of DSLAM, CO, IO, and VHO nodes, each holding cache memory (from 1 GB up to 3 TB per node) for services such as ICC, PLTV, VoD, and NPVR.]

[Fig. 2 graphic: "Variability of VoD Popularity by days of week and seasonal events": cache hit rate (0%-100%) vs. title rank (1-500) for alpha = 0.2, 0.5, 1, and 1.5, with curves labeled Blockbuster, Typical Friday, and Weekday Off Season.]

services as described in [10]. In the following, we assume traffic symmetry for nodes at

each level (i.e. the hit rate is the same at each node of every level). We also assume no redundant caching, i.e. if some title is cached at a certain level (e.g. IO), this title is not cached again in downstream nodes (e.g., CO and DSLAM). The concept of “cumulative memory effect” of hierarchical caching, or “virtual” cache [14], is shown in Fig. 3. The “virtual” cache in any node of the tree is the actual cache at that node augmented by the caches in deeper nodes (downstream) of the tree. Video content residing in the “virtual” cache of the node (i.e., in the node itself or in a node downstream of it) reduces the unicast traffic on the upstream link of the node (and all links further upstream right up to the root of the tree). For example, if the cache size per DSLAM is m1, the cache size per service switch at CO is m2, and the cache size per service router at IO is m3, then the caching related traffic reduction (hit rate) at the DSLAM level is H(m1), at the CO level is H(m1+m2), and at the IO level is H(m1+m2+m3).
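The virtual-cache computation above can be sketched directly; the hit-rate curve H and the numeric cache sizes below are illustrative assumptions, not the paper's data:

```python
# "Virtual cache" effect of Fig. 3: the cache seen from the upstream link of
# level i is m_1 + ... + m_i, so the residual unicast traffic on that link
# is T * (1 - H(m_1 + ... + m_i)).

def link_traffic(T, caches, H):
    """Residual unicast traffic on the link above each level.

    caches: per-node cache sizes [m_1, ..., m_N], lowest level (DSLAM) first.
    H: hit rate as a function of (virtual) cache memory size.
    """
    traffic, virtual = [], 0.0
    for m in caches:
        virtual += m  # cumulative ("virtual") cache below this link
        traffic.append(T * (1.0 - H(virtual)))
    return traffic

# Toy concave hit-rate curve and cache sizes (GB); illustrative only.
H = lambda m: m / (m + 2000.0)
print(link_traffic(300.0, [4.0, 1000.0, 9000.0], H))  # CO->DSLAM, IO->CO, VHO->IO
```

Because each level adds its memory to the virtual cache below it, the residual traffic strictly decreases at every step toward the VHO.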

Fig. 3. Traffic flow in hierarchical network with cache memory.

III. HEURISTIC MODEL

A. Assumptions

The cache optimization model described below may be applied to any type of tree topology. However, in the following we assume that the tree is symmetrical, and the network topology is defined by the following parameters:
- Number of subscribers per DSLAM
- Number of DSLAMs per CO
- Number of COs per IO
- Number of IOs per VHO

There is an option in our model to dual-home COs, i.e., to connect every CO to two IOs. In some small IPTV networks, COs are connected directly to the VHO, and there is no IO level; the model can support this network topology as well.

We consider one multicast and one or several unicast services. Parameters of the multicast service (for the busy hour) are:
- Number of offered High Definition (HD) and Standard Definition (SD) channels
- Bandwidth per HD and per SD channel
- Percentage of multicast viewers that view HD channels
- Percentage of set-top boxes (STBs) tuned to multicast channels

Parameters of every unicast service are:
- Number of titles in the service
- Average memory size per title
- Average traffic per title
- Hit rate

In our model, the cache may be located at any combination of the following layers: DSLAM, CO, and/or IO. We assume that there is one equipment shelf per DSLAM, and one or several (depending on traffic volume) equipment shelves per CO and IO. Typical equipment at a DSLAM may be the Alcatel-Lucent 7330 or 7342 ISAM; CO equipment may be the Alcatel-Lucent 7450 Ethernet Service Switch or the 7750 Service Router; and IO equipment may be the 7750 Service Router. Note that our model is equipment flexible, and new equipment types may be used.

We assume equipment uniformity: the same equipment type is used within each layer (e.g., the 7330 at all DSLAMs, the 7450 at all COs, the 7750 at all IOs); the same number of shelves, cache modules, and cache memory per shelf at every CO location; and the same number of shelves, cache modules, and cache memory per shelf at every IO location.

Equipment configuration is characterized by the number of Input/Output Modules (IOMs, or slots) per shelf, the number of Media Dependent Adapters (MDAs) per IOM, and the number and type (bandwidth) of ports per MDA. Equipment cost comprises a common cost per shelf plus the cost of the IOMs and MDAs.

The cache in each location comprises one or several cache modules; each cache module occupies one slot of the corresponding equipment shelf. Each cache module can store a limited amount of data (e.g., up to 3,000 GB) and can support a limited amount of traffic throughput (e.g., up to 20 Gbps). The amount of memory per cache module is a multiple of the memory granularity parameter (e.g., 100 GB). Cache cost includes a cost per cache module and a cost per unit of memory.
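The module arithmetic can be illustrated with a short sketch; the capacity, throughput, and granularity limits are the examples quoted above, while the dollar figures are hypothetical:

```python
import math

# Cache modules needed for a target cache size and traffic load, under the
# example limits quoted in the text: 3,000 GB and 20 Gbps per module, 100 GB
# memory granularity. The dollar figures below are hypothetical.

MODULE_CAPACITY_GB = 3000
MODULE_THROUGHPUT_GBPS = 20
GRANULARITY_GB = 100

def cache_config(memory_gb, traffic_gbps, module_cost, cost_per_gb):
    """Return (modules, provisioned memory in GB, total cache cost)."""
    # round the requested memory up to the granularity
    memory_gb = math.ceil(memory_gb / GRANULARITY_GB) * GRANULARITY_GB
    # enough modules for both the stored data and the served traffic
    modules = max(math.ceil(memory_gb / MODULE_CAPACITY_GB),
                  math.ceil(traffic_gbps / MODULE_THROUGHPUT_GBPS))
    cost = modules * module_cost + memory_gb * cost_per_gb
    return modules, memory_gb, cost

# A ~9 TB cache serving 30 Gbps needs 3 modules (capacity-bound):
print(cache_config(8950, 30, module_cost=5000, cost_per_gb=22))  # -> (3, 9000, 213000)
```

The `max` over the capacity and throughput bounds is what produces the modularity effects discussed next: a small increase in either demand can add a whole module.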

Note that the equipment configuration and cost structure can cause modularity effects: relatively small variations in traffic volume may cause a significant change in the number of network elements (e.g., ports, MDAs, IOMs, and even shelves) and, therefore, a significant change in network cost.

The more cache modules and total cache memory per shelf, the more titles may be stored in the cache and the more unicast traffic requests will be served from this cache; therefore, fewer resources (bandwidth, ports, equipment, etc.) will be required upstream from this cache location. On the other hand, there are a limited number of slots in the equipment, so the more slots are used for cache, the fewer slots are available

[Fig. 3 graphic: all subscribers attach below the 1st level (e.g., DSLAM, memory m1); the 2nd level (e.g., CO, memory m2) and higher levels up to the N-th level (memory mN) lead to the VHO. The traffic above level k is k_k*T, with k1 = 1 - H(m1), k2 = 1 - H(m1 + m2), ..., kN = 1 - H(Σ mi).]

for ports. The goal is to pick the optimal cache memory size and content distribution at every layer to reduce the overall network cost (i.e., transport, equipment, and cache cost).

Fig. 4. Main sheet of Cache Optimization Tool.

B. Cache Optimization Modes and Heuristics

In the following, we consider three optimization modes:
- Adhoc optimization,
- Layered optimization, and
- Global optimization.

In Adhoc optimization, we assume that the cache configuration (the number of cache modules and the cache memory per shelf at every layer: DSLAM, CO, IO) is given. The goal of Adhoc optimization is to find the optimal (in terms of network cost) distribution of content between caches, i.e., how many titles of each service should be cached at each layer. Adhoc optimization also allows us to compute the cost of the network without any cache.

The other two modes of optimization – Layered optimization and Global optimization – allow us to optimize simultaneously both cache configuration and distribution of the content. These two modes of optimization are built on top of the Adhoc optimization. These two modes of optimization also allow us to constrain the layers for cache deployment, e.g., DSLAM only, CO only, IO only, DSLAM and CO only, etc. This may be useful for instances when caches can only be deployed at certain layers of the network.

Note that because of memory granularity (which is one of the model’s parameters) and the finite number of cache

modules per shelf, there are a finite (but possibly very large) number of different cache configurations we could consider. In the case of Global optimization, all possible cache configurations are enumerated, and Adhoc optimization is carried out for every cache configuration. The cache configuration that gives the best Adhoc optimization result will be the solution of the Global optimization method. Usually, Global optimization requires long processing times. Layered optimization gives a “good enough” solution much more quickly.

The basic building block of Layered optimization is optimizing the cache for one particular layer (e.g., CO) while keeping the cache configuration for the other layers (e.g., DSLAM and IO) fixed.

In Layered optimization, an ordered subset of layers is first selected (in one particular case, all three layers – DSLAM, CO and IO – could be selected). Cache optimization is done for the 1st selected layer and the optimal cache configuration for this layer is fixed. Then, cache optimization is done for the 2nd selected layer, and so on. After cache optimization has been done for the last selected layer we return back to the 1st selected layer and repeat the process. This process stops when no more cost improvement results from the optimization of any of the selected layers.
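The layered procedure is essentially coordinate descent over per-layer cache configurations. A minimal sketch, in which `cost` and the candidate grids stand in for the full Adhoc evaluation:

```python
# Layered optimization as coordinate descent: optimize one layer's cache
# size with the others fixed, cycle through the layers, and stop when no
# layer improves the cost. `cost` and the grids stand in for the Adhoc model.

def layered_optimization(cost, grids):
    """grids[i]: feasible cache sizes for layer i (granularity multiples
    up to the per-shelf maximum). Returns (best config, best cost)."""
    config = [g[0] for g in grids]  # start, e.g., with no cache anywhere
    best = cost(config)
    improved = True
    while improved:  # repeat passes until a full pass yields no gain
        improved = False
        for layer, grid in enumerate(grids):
            for size in grid:  # one "Adhoc" evaluation per candidate
                trial = list(config)
                trial[layer] = size
                c = cost(trial)
                if c < best:
                    best, config, improved = c, trial, True
    return config, best

# Toy cost: cache memory cost plus residual-traffic cost with a concave H.
H = lambda m: m / (m + 500.0)
def toy_cost(cfg):  # cfg = [DSLAM, CO, IO] cache sizes in GB
    return 10 * cfg[0] + 2 * cfg[1] + 1 * cfg[2] + 10000 * (1 - H(sum(cfg)))

grids = [range(0, 101, 25), range(0, 1001, 250), range(0, 2001, 500)]
print(layered_optimization(toy_cost, grids))
```

Unlike Global optimization, which enumerates every configuration, this local search evaluates only one layer's grid per pass, which is why it runs much faster but may stop at a local optimum.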

Unlike Global optimization, the Layered optimization

solution is a local optimum. However, in all the cases we considered, the results of Global and Layered optimization were close or identical.

C. Cache Optimization Tool

A Cache Optimization Tool based on the models described in the previous sections has been developed. A mathematical model representing the problem was built as an MS Excel module, and Visual Basic for Applications (VBA) is used to interface this module with a heuristic optimization algorithm. The Excel module consists of four sheets: Main, Multicast, Unicast, and Sensitivity.

The Main sheet (Fig. 4) includes the main input data (green colored cells), intermediate results, control buttons ("Reset Cache", "Adhoc Run", "Optimize by Layers", "Optimize Globally"), and the main results of optimization (yellow colored cells). The sheet is protected with the exception of the "green" cells (input data). The input data on this sheet includes the network topology, check boxes that control the mode of optimization, and, for every level (DSLAM, CO, IO, and VHO), the cache size, equipment configuration, and cost parameters. The output data includes the cache, traffic, and cost configuration for the optimal solution, as well as the unicast content distribution between caches at each level.

The Multicast sheet is for entering multicast traffic-related information (number of multicast channels, their bandwidth, hit rate distribution parameters, etc.), generating the corresponding distribution and calculating the multicast traffic on the feeder link (i.e. the link between CO and DSLAM).

The Unicast sheet is for entering unicast traffic-related information (number of unicast services, and, for every unicast service, the traffic volume per subscriber, the number of titles, average size per title, and hit rate distribution parameters).

For hit rate distributions, either the Zipf-Mandelbrot distribution or custom distributions defined by tables are used.

The Sensitivity sheet allows selection of any one or combination of two input parameters of the system to run a sensitivity analysis against those parameters.

IV. ANALYTICAL MODEL

A. Assumptions

In order to solve the cache optimization problem analytically, we need to make some simplifying assumptions.

First, in this study we ignore the granularity factors mentioned above. In particular, we assume that the cost of cache memory is proportional to the size of the cache and that equipment cost is proportional to the amount of traffic that traverses the equipment. More specifically, we estimate the equipment cost based on the amount of traffic received by the equipment from higher levels of the network hierarchy (r-traffic) and the amount of traffic sent by the equipment to lower levels of the network hierarchy (s-traffic). For example, the cost of a CO node may be estimated based on (1) the amount of traffic this CO node sends to DSLAMs (s-traffic) and the cost per unit of this traffic, and (2) the amount of traffic that this CO node

receives from IO (or r-traffic), and the cost per unit of this traffic. In order to calculate a total network cost, we define for each level of the network hierarchy (DSLAM, CO, IO) a cost per unit of cache memory and a cost per unit of s- and r-traffic (see more details in the next section).

We assume that the tree topology structure is perfectly symmetric, i.e. the "fan-out" at each level is the same. A similar assumption is made about the traffic (the demand is the same for all populations attached to a DSLAM) and popularity distribution. We assume that all contents are downloaded only once to the caches during off-peak time, so we can ignore the network cost associated with these downloads.

Our next assumption is about the hit rate, H(m), as a function of cache memory m. It is obvious that the hit rate, H(m), increases with memory m. Additionally, we assume that H(m) has a continuous derivative H’(m) and this derivative is strictly decreasing (effect of diminishing returns for hit rate). This condition means that the function H(m) is strictly concave. To justify this assumption we note that the function H’(m) in some ideal cases (when the caching algorithm has complete information about the statistical characteristics of the traffic) is equivalent to a popularity curve, which, indeed, decreases as a function of a title’s rank.

In the following, we assume that out of the two cache resources – cache size and cache throughput – cache size is a limiting factor and the only resource to be considered. We can justify this assumption for the cache in DSLAMs by setting a maximum cache size that guarantees that traffic from the cache does not exceed the cache throughput.

In the analytical model we consider unicast traffic only. Because of replication, multicast traffic is a relatively small portion of the total traffic between the VHO, IO, CO and DSLAM levels and, therefore, does not make a big impact on equipment costs at those levels.

B. Mathematical Formulation

In the case of a tree topology with K levels of hierarchy, the total network cost, NtwkCost, may be calculated as

$$\mathrm{NtwkCost}(m_1, m_2, \ldots, m_K) = \sum_{k=1}^{K} N_k c_k^m m_k + c_{1s}^t T + \sum_{k=1}^{K} \left( c_{kr}^t + c_{(k+1)s}^t \right) T \left( 1 - H\!\left( \sum_{j=1}^{k} m_j \right) \right) \qquad (1)$$

We use the following parameters in (1):
- Decision variables: m_k, the cache memory size per node at the k-th level, 1 ≤ k ≤ K (GB)
- T: total amount of traffic requested by subscribers (Mbps)
- H(m): hit rate as a function of cache memory m
- N_k: number of nodes at the k-th level of the hierarchy, 1 ≤ k ≤ K
- M_k: maximum cache size per node at the k-th level, 1 ≤ k ≤ K (GB)
- c_k^m: cost of cache memory at the k-th level, 1 ≤ k ≤ K ($/GB)
- c_{ks}^t and c_{kr}^t: cost of traffic at the k-th level sent to the (k-1)-th level (s-traffic) and received from the (k+1)-th level (r-traffic), 1 ≤ k ≤ K+1 ($/Mbps)

The goal is to minimize network cost subject to constraints on cache memory size:

$$\mathrm{NtwkCost}(m_1, m_2, \ldots, m_K) \rightarrow \min \qquad (2)$$

such that

$$0 \le m_k \le M_k, \quad 1 \le k \le K. \qquad (3)$$

Our further analysis will leverage an analytical solution of this problem that has been obtained (for K = 3) in [7].
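A numeric sketch of the minimization for K = 3 is given below, using the reference-scenario constants from Section V; the hit-rate curve, the assumed $4/Mbps VHO send cost, and the simple coordinate search are illustrative stand-ins for the closed-form solution of [7], and the constant DSLAM-to-subscriber traffic term is omitted since it does not affect the optimum:

```python
# Coordinate-search sketch of the K = 3 cache optimization, with the
# Section V reference numbers: 9,600 DSLAMs, 100 CO switches, 16 IO routers,
# $22/GB memory. ASSUMPTIONS: the concave hit-rate curve H below and the
# $4/Mbps VHO send cost are illustrative; results will not match the paper.

N = [9600, 100, 16]              # nodes per level: DSLAM, CO, IO
C_MEM = 22.0                     # cache memory cost, $/GB (all levels)
LINK = [1.5 + 2.5, 2.5 + 4.0, 4.0 + 4.0]   # (c_kr + c_(k+1)s) per link, $/Mbps
M_MAX = [100.0, 12000.0, 24000.0]          # per-node cache limits, GB
T = 300.0 * 9600                 # total demand at 300 Mbps per DSLAM

def H(m):                        # assumed concave hit rate vs. cache GB
    return m / (m + 3000.0)

def cost(m):
    total = sum(N[k] * C_MEM * m[k] for k in range(3))  # cache memory cost
    cum = 0.0
    for k in range(3):           # traffic cost on the link above each level
        cum += m[k]              # "virtual" cache below that link
        total += LINK[k] * T * (1.0 - H(cum))
    return total

def optimize(step):
    m, best = [0.0, 0.0, 0.0], cost([0.0, 0.0, 0.0])
    moved = True
    while moved:                 # projected coordinate search over (3)
        moved = False
        for k in range(3):
            for d in (step, -step):
                trial = list(m)
                trial[k] = min(max(trial[k] + d, 0.0), M_MAX[k])
                c = cost(trial)
                if c < best - 1e-9:
                    m, best, moved = trial, c, True
    return m, best

m_opt, c_opt = optimize(step=10.0)
print("caches (GB):", m_opt, "cost gain:", 1.0 - c_opt / cost([0.0, 0.0, 0.0]))
```

Under these assumptions the optimum comes out hierarchical: no DSLAM cache, a moderate CO cache, and the largest cache at the IO, matching the qualitative behavior reported in Section V.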

V. MODELING RESULTS

A. Reference Scenario Definition

A large metropolitan DSL-based ISP network is used in our reference scenario. A 4-level network is assumed, with DSLAMs at the lowest level aggregated at COs by routers. In large metros there are often intermediate aggregation points, known as intermediate offices (IOs), that aggregate several COs. The IOs all terminate at a VHO that can be collocated with a Point of Presence (PoP).

Our topology assumptions are:
- The total number of DSLAMs in the network is 9,600
- The total number of service switches in all COs is 100
- The total number of service routers in all IOs is 16

The following maximum storage limits per cache location are assumed:
- The maximum cache size per DSLAM is 100 GB
- The maximum cache size per service switch at a CO is 12,000 GB (12 TB)
- The maximum cache size per service router at an IO is 24,000 GB (24 TB)

Our cost assumptions are:
- c_k^m, k = 1, 2, 3: the cost of flash memory is $22/GB
- c_{1r}^t: the cost of traffic that a DSLAM receives from a CO is $1.5/Mbps
- c_{2s}^t and c_{2r}^t: the cost of traffic that a CO sends to a DSLAM and receives from an IO, respectively, is $2.5/Mbps
- c_{3s}^t and c_{3r}^t: the cost of traffic that an IO sends to a CO and receives from the VHO, respectively, is $4/Mbps

The total traffic T is varied to investigate the impact of increasing traffic on different caching solutions. Finally, we assume a Zipf-Mandelbrot distribution for popularity with a power parameter alpha = 1 for the reference scenario.

These numbers were chosen based on empirical data and industry averages; nevertheless, a variety of sensitivity analyses was done to investigate the degree to which the

results and conclusions would depend on specific values of these parameters. In the following sections, all parameters (unless mentioned specifically) have values from this reference scenario.

Fig. 5. Optimal cache solution for varying traffic

B. Sensitivity to Traffic Variation

The modeling results of the reference scenario are shown in Fig. 5. The graph shows the optimal cache solution in terms of memory required per location as traffic volume is varied. The graph also shows the relative cost savings achievable due to cache deployment.

According to the graph, for a traffic volume of 300 Mbps at the DSLAM, the optimal cache solution requires 9TB of storage at the IO, 1TB at the CO, and 4GB at the DSLAM, and this provides a cost gain of over 55%. Note that the maximum storage limit is not reached at any location.

We observe that the solution becomes hierarchical (as opposed to single level caching) with increase in traffic volume. As traffic volume increases, caches are first deployed at the IO, then at the CO, and finally at the DSLAM too.

In the next scenario the popularity assumption is changed, which impacts the hit ratio for a given amount of memory. Fig. 6 shows the optimal caching solutions for alpha values of 0.75 and 1.2. For the lower alpha value, the optimal solution does not include DSLAM caches until traffic exceeds 600 Mbps per DSLAM, whereas in the original scenario (alpha = 1) they appear at slightly over 100 Mbps per DSLAM (see Fig. 4). Also, the relative cost gain drops from 40-60% in the original scenario to 20-40% for the lower alpha value. We deduce that for the flatter popularity curve, more traffic is required to justify DSLAM caching. For alpha = 1.2, the trends are much like the original scenario, but the cost gain curve now lies between 60% and 80% savings. In other words, as the slope of the popularity curve increases, greater cost savings can be expected from deploying caches.

Fig. 6. Optimal cache solution for the reference scenario with varying traffic, for different alphas.

Fig. 7. Flash crowds impact both CDN providers and ISPs.

Fig. 8. Optimal cache solution for varying service popularities.

We conclude from our traffic sensitivity analysis that in all cases the optimal solution starts out "centralized" (cache only at the IO level) and gradually becomes "decentralized" as traffic volume increases. Further, the optimal solution is always hierarchical, with relatively few files cached at the DSLAM; the DSLAM cache size remains below 10 GB, a few orders of magnitude less than at the CO and IO. In the next section, the optimal cache behavior for varying popularities is explored in more detail.

C. Flash Crowds

In the age of instant communication via blogs, email/SMS, etc., a popular video file can trigger a network effect called a flash crowd, in which many subscribers request the same file within a short period of time. This can congest network operators' transit links and CDNs' edge servers as bandwidth demands and request volumes overload facilities (see Fig. 7). The operator must either over-dimension the network for these extraordinary events, which would be inefficient, or accept unhappy subscribers who are subjected to delays or disconnections during congestion, which would be just as undesirable.

This sociological phenomenon of flash crowds is quite


suitably addressed by distributed caching networks. In the next sensitivity analysis, the popularity distribution power parameter (alpha) is varied to investigate its impact on caching solutions.

Fig. 8 shows the effect of varying the popularity distribution parameter alpha on the optimal caching solutions for two traffic profiles: 100 Mbps per DSLAM and 500 Mbps per DSLAM. The graphs show the amount of storage required per location as before, but the right-hand vertical axis represents cache hit rate, not cost gain.

As the popularity distribution parameter value is increased, the caching solution becomes increasingly hierarchical and decentralized, similar to the trend that was observed as traffic volume was increased. As alpha increases, cache storage is initially deployed only at the IOs, then gradually deployed at

the COs, and ultimately deployed at the DSLAMs too. Also, unlike the sensitivity to traffic volume, as alpha increases, the cache storage requirements decrease dramatically (note the log scale), with those at the IO decreasing by more than 2 orders of magnitude.

Flash crowds are characterized by dramatic traffic growth and a few highly popular files, which is exactly what the preceding two graphs depict. The graph on the right carries five times the traffic of the graph on the left; in addition, high alpha values (alpha > 2) characterize flash crowds. Therefore, as traffic volume and alpha increase, the most popular files should be stored closer to the subscriber. In other words, flash crowds lead to caching solutions that are more decentralized.


Fig. 9. Cost benefit of decentralized cache solution for flash crowd scenarios.

Finally, the impact of not using a decentralized caching solution, particularly in the context of a flash crowd event, is investigated. Fig. 9 illustrates the benefit of a decentralized (or hierarchical) caching solution over a centralized one. The storage per location as well as the cost gain curves are shown. In this case, there are two cost gain curves: one in which caching is constrained to the IOs only (i.e., no caching permitted at COs or DSLAMs), and another in which caching is allowed at any or all of the three levels (i.e., decentralized hierarchical caching). The three storage curves (DSLAM, CO and IO) are for the latter case; the IO-only storage curve is not shown.

We note the significant advantage of cache decentralization for flash crowd scenarios: a cost savings of 40-45%, depending on traffic. Also, large amounts of memory need not be deployed close to the subscriber; rather, only the most popular file(s) need be stored as close to the subscriber as possible. Thus pushing the most popular content very close to the subscriber can greatly reduce scalability requirements and congestion. Moreover, the higher power parameter alpha during a flash crowd increases the cache hit rate, which means that traffic upstream of the cache is reduced, alleviating the flash crowd problem.

D. Falling Memory Prices

Due to the boom in consumer electronics such as smart phones, digital cameras, and the like, the price of flash memory is falling at a rate greater than 50% a year. It recently became cheaper than DRAM, and it continues to drop precipitously, as seen in Fig. 10.

The storage for an HD movie cost $2,500 only a few years ago, whereas it costs about $80 today, and in two years it will likely cost $24. Due to this trend, flash memory is being used by vendors for distributed VoD streaming, allowing operators to move video streamers closer to the subscriber, in some cases into the CO. The impact of flash memory prices on caching solutions is examined next.
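The arithmetic behind the two-year projection can be checked in a few lines. The $80 figure is from the text; the effective annual decline of about 45% is our own assumption chosen to connect the $80 and $24 data points (slightly below the ">50% a year" headline trend).

```python
# Quick check of the price-erosion arithmetic: an HD title's storage
# cost of ~$80 today falls to ~$24 after two years at an assumed ~45%
# annual price decline.
price = 80.0            # $ to store one HD movie today (from the text)
annual_decline = 0.45   # assumed effective yearly price drop
for year in (1, 2):
    price *= (1 - annual_decline)
print(f"projected cost in two years: ${price:.0f}")
```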


Fig. 10. Flash memory pricing trend.

Fig. 11 shows the impact of the memory price falling from $1,000/GB to $1/GB (note the log scale). For a "normal" popularity distribution curve, or even a "flat" one, the trend is to use more storage as the memory price falls. Additionally, network savings (or cost gain) increase every year, as network operators can store twice as much for the same price with each passing year, thus increasing the hit rate and reducing transport costs.

Finally, we see that the trend for caching solutions is the same as before; as the incentives for caching grow, cache decentralization occurs and savings increase.

Fig. 11. Falling memory pricing impact on optimal caching solutions.

E. Market Trends

In the previous sections, the sensitivity of optimal caching solutions to increases in traffic volume, the steepness of the popularity distribution curve, the flash crowd phenomenon, and falling memory prices was analyzed. The trend observed in all cases was that these variations led to increased savings, greater decentralization, and optimal cache solutions that were hierarchical. The variation of the input parameters used in the

sensitivity analyses reflects today’s market trends; as consumers watch more and higher quality video on the Internet, the significantly higher bandwidth per subscriber will lead to higher traffic requirements per DSLAM.

Flash crowds are a persistent problem, and we have shown that decentralized storage provides the best solution. Memory prices really are falling at dramatic rates, which facilitates storing content closer to the subscriber. While our analysis examined these events individually, in reality they are occurring simultaneously, and these three factors have a multiplicative effect toward decentralization. Thus we conclude that there is strong evidence for extending CDNs ever closer to subscribers over time.

F. Network Topology Impact

Finally, the impact of network topology on caching solutions is considered. Globally, network operator topologies vary due to differences in loop lengths, the number of COs per region, and broadband strategy (VDSL, ADSL, GPON, etc.). For a given number of COs, longer-loop networks require distributed (and smaller) DSLAMs and more of them per CO, while shorter-loop networks allow centralized (and larger) DSLAMs and fewer of them per CO.

Looking back at our analytical solution, consider the case where the optimal cache has a moderate (or "non-boundary") solution, i.e., 0 < m_i < M_i for each location i = 1, 2, 3. According to eq. (12)-(14) in [7],

H_o'(m_1) = (N_1 c_{m,1} - N_2 c_{m,2}) / (s_1 T)            (4)

H_o'(m_1 + m_2) = (N_2 c_{m,2} - N_3 c_{m,3}) / (s_2 T)      (5)

H_o'(m_1 + m_2 + m_3) = N_3 c_{m,3} / (s_3 T)                (6)

where s_1, s_2, and s_3 are traffic cost parameters:

s_1 = c^r_{t,1} + c^s_{t,2}
s_2 = c^r_{t,2} + c^s_{t,3}
s_3 = c^r_{t,3} + c^s_{t,4}                                  (7)

From equation (4), as N_1 (the number of DSLAMs) increases for a given amount of traffic T, the total memory at the DSLAM, m_1, must decrease. From equation (5), we see that the total storage m_1 + m_2 does not depend on the number of DSLAMs N_1; thus if N_1 increases and m_1 decreases, m_2 must increase by a sufficient amount to satisfy both equations. Therefore, if the number of COs is fixed and the number of DSLAMs changes, all other things being equal, whatever storage is removed from the DSLAMs is added to the COs.
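A small numeric example makes this concrete. The sketch below solves eqs. (4)-(6) in closed form under an assumed concave hit-rate curve H_o(m) = ln(1 + m/b)/ln(1 + M/b); the node counts, the shape parameters b and M, and the VHO send cost folded into s_3 are our own illustrative assumptions.

```python
# Interior solution of eqs. (4)-(6) for an assumed hit-rate curve
# H_o(m) = ln(1 + m/b) / ln(1 + M/b), so H_o'(m) = 1/((m + b) ln(1 + M/b)).
# Node counts, b, M, and the VHO send cost inside s3 are assumptions.
import math

b, M = 5.0, 20_000.0
norm = math.log(1 + M / b)

def inv_h_prime(y):
    """Solve H_o'(x) = y for x (valid while the result stays in (0, M))."""
    return 1.0 / (y * norm) - b

def optimal_caches(N1, N2, N3, cm, s1, s2, s3, T):
    """Return (m1, m2, m3) from eqs. (4)-(6), taking c_{m,k} = cm for all k."""
    m1 = inv_h_prime((N1 - N2) * cm / (s1 * T))      # eq. (4)
    m12 = inv_h_prime((N2 - N3) * cm / (s2 * T))     # eq. (5)
    m123 = inv_h_prime(N3 * cm / (s3 * T))           # eq. (6)
    return m1, m12 - m1, m123 - m12

# s1 = 1.5 + 2.5 and s2 = 2.5 + 4.0 follow the reference traffic costs;
# s3 = 8.0 assumes a $4/Mbps VHO send cost on top of the IO receive cost.
base = optimal_caches(1000, 50, 5, cm=22, s1=4.0, s2=6.5, s3=8.0, T=300_000)
more = optimal_caches(1200, 50, 5, cm=22, s1=4.0, s2=6.5, s3=8.0, T=300_000)
print(base, more)
```

With more DSLAMs (N_1 = 1200), the per-DSLAM cache m_1 shrinks while m_1 + m_2 is unchanged, since eq. (5) does not involve N_1: exactly the storage shift from DSLAMs to COs described above.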

[Fig. 10 plots price erosion for flash (NAND) and DRAM memory ($/GB, log scale, 2003-2011), annotated with the storage cost per HD/SD movie falling from $2,900/$720 to a projected $24/$6. Source: Objective Analysis, August 2007.]

[Fig. 11 plots optimal cache size (GB, log scale) per location (DSLAM, CO, IO) and cost gain against memory cost from $1,000/GB down to $1/GB.]


Fig. 12. Optimal Cache Solution for varying topology.

Fig. 12 shows the result of varying the number of DSLAMs.

The first graph shows that as the number of DSLAMs increases, the total amount of memory decreases. Additionally, while not visible on this log scale, the combined memory across the DSLAMs and the CO (m_1 + m_2) remains constant: whatever is removed from the DSLAMs is added to the COs. Lastly, in the second graph we have limited the solution to DSLAM-only caching, and we can see more clearly that the savings decrease as the number of DSLAMs increases. We can also see that the fewer the DSLAMs (and, therefore, the larger each DSLAM), the more likely a cache solution exists there.

VI. CONCLUSIONS

In this paper we describe two modeling approaches –

heuristic and analytical – for hierarchical cache optimization in an IPTV network. Both models use several key parameters – network topology, traffic volume, hit rate and cost – to calculate optimal cache sizes in DSLAM, CO and IO nodes. The heuristic model takes into account more detailed information about equipment configuration and cost; this model is suitable when we need to optimize caching architectures in a particular IPTV network. However, because of multiple levels of cost modularity in the heuristic model, it is difficult to analyze factors that affect the solution using this approach. The analytical approach uses a simpler cost structure and some reasonable assumptions, which allow us to identify fundamental factors that affect this solution. A sensitivity study based on the analytical model allowed us to estimate the impact of various parameters on the optimal cache configuration and demonstrated that for many typical scenarios the optimal cache configuration includes caches at two (CO and IO) or all three levels of the network hierarchy.

Based on three key market trends – increasing bandwidth per subscriber, flash crowds, and falling memory prices – our analysis shows that hierarchical and fully distributed CDNs can save significantly over traditional edge caching techniques. Given that we are still in the infancy of the Internet video age, we can expect much higher subscriber bandwidth, larger flash crowds, and much lower flash memory prices, and as such, we anticipate the need to decentralize CDNs further.

REFERENCES

[1] B. Krogfoss, L. Sofman, and A. Agrawal, "Caching architecture and optimization strategies for IPTV networks," Bell Labs Tech. J., vol. 13, no. 3, pp. 13-28, Fall 2008.

[2] S. Ghandeharizadeh, and S. Shayandeh, “Greedy Cache Management Technique for mobile Devices”, Data Engineering Workshop, 2007 IEEE 23rd International Conference , pp. 39–48, April 2007.

[3] S. Ghandeharizadeh, T. Helmi, T. Jung, S. Kapadia, and S. Shayandeh, “An Evaluation of Two Policies for Simple Placement of Continuous Media in Multi-hop Wireless Networks”, Twelfth International Conference on Distributed Multimedia Systems (DMS), August 2006.

[4] H. Chen, H. Jin, J. Sun, X. Liao, and D. Deng, “A new proxy caching scheme for parallel video servers”, Computer Networks and Mobile Computing, pp.438–441, Oct. 2003.

[5] C. Cobarzan and L. Boszormenyi, “Further Developments of a Dynamic Distributed Video Proxy-Cache System”, Parallel, Distributed and Network-Based Processing, 15th EUROMICRO International Conference, Feb. 2007, pp.349–357.

[6] J. P. Lee and S. H. Park, “A cache management policy in proxy server for an efficient multimedia streaming service”, in Proceedings of the Ninth International Symposium on Consumer Electronics, June 2005, pp.64–68.

[7] L. Sofman and B. Krogfoss, “Analytical Model for Hierarchical Cache Optimization in IPTV Network”, IEEE Transactions on Broadcasting, vol. 55, No. 1, pp.62-70, March 2009.

[8] M. Verhoeyen, D. De Vleeschauwer, and D. Robinson, “Content storage architectures for boosted IPTV service”, Bell Labs Tech. J., vol.13, N3, pp. 29-43, Fall 2008.

[9] D. De Vleeschauwer and K. Laevens, "Performance of caching algorithms for IPTV on-demand services," accepted for the Special Issue on IPTV in Multimedia Broadcasting, IEEE Transactions on Broadcasting, 2008.

[10] L. Sofman, B. Krogfoss, and A. Agrawal, “Optimal Cache Partitioning in IPTV Network”, In Proceedings of 11th Communications and Networking Simulation Symposium, CNS’08, April 14-17, 2008, Ottawa, Canada, pp. 79-84, 2008.

[11] S. Vanichpun and A.M. Makowski, “Comparing strength of locality of reference - popularity, majorization, and some folk theorems”, INFOCOM 2004. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 2, 7-11, pp. 838 - 849, March 2004.

[12] Zipf-Mandelbrot Law (online document). Available at http://en.wikipedia.org/wiki/Zipf-Mandelbrot_law.

[13] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications," Proc. of IEEE INFOCOM, pp. 126-134, 1999.

[14] D. De Vleeschauwer and K. Laevens, "Caching to reduce the peak rate," Alcatel-Lucent internal report, 2007.
