When to kill your siblings: cache mesh relation analysis


Computer Networks and ISDN Systems 30 (1998) 2105–2111


Ingrid Melve 1

UNINETT, 7034 Trondheim, Norway

Abstract

Currently, no documented evaluation procedure exists for analysing the most economical relationship between cache proxy servers in terms of cost, latency, and bandwidth use. This article seeks to discover the threshold value for relations between servers, based on bandwidth saved, latency reduction and local cost for Internet access. This threshold value will be particularly relevant to work on auto-configuration of large multi-server proxy cache systems. This article presents a formula for calculating the efficiency of inter-cache relations, which can be used to determine when to terminate a relationship. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Web proxy cache; Inter-cache analysis

1. Motivation

1.1. Goals

This work intends to examine co-operating Web cache systems, as typified by Squid meshes like COM-MESH, with three major goals:
(1) Evaluate relations.
(2) Find components:
  • delay,
  • byte gain (traffic volume),
  • cost factors.
(3) Use a threshold to terminate relationships.

Although our work started with a concrete problem tied to a large-scale Squid mesh, this work presents a generic framework for inter-cache communication analysis. The results are based on operational experience and analysis of log files. Some of the questions that were raised during our test of COM-MESH:
• What is the value of adding a sibling in COM-MESH?
• What are we doing when playing with large-scale WAN meshes?
• Cost is important; how do we find the break-even point?
• Finding routing metrics for our application-level routing.
• Homogeneous siblings: is this a problem?

1 E-mail: [email protected].

The Web cache proxy software used is Squid [1]. Co-operating Web proxy cache servers using sibling mode are used to illustrate the principles. (See [2] for more information on Web cache meshes.) The analysis can easily be adapted to parent relations. Local proxy performance is not considered, only traffic into the proxy.

Gerhard Winkler of ACOnet has found that hit rates drop over time, since cache content becomes homogenized. One solution to this is to use the proxy-only mode. Another is to establish a round-robin solution and change relations to the next on the list once a cache drops below the threshold [3]. Such a scheme needs a threshold value to start with.

0169-7552/98/$ – see front matter © 1998 Elsevier Science B.V. All rights reserved. PII: S0169-7552(98)00252-9


1.2. Why cache

The two primary reasons for running a Web cache are (1) saving bandwidth (at bottlenecks), and (2) reducing latency. Other gains from Web proxy caching, such as anonymization and access control, are not examined in this article.

Latency reduction is important for clients located far from the Web servers, as setting up a TCP connection takes time. Typical delays are around 3 s for UNINETT users, and even US users experience 1.5 s delays [4]. Most popular Web servers are located in the US, a considerable distance from non-American users.

The time spent determining whether the sibling cache has the object is negligible compared with the download time. With Squid's implementation of ICP, it is typically two orders of magnitude smaller.

2. Values

Basic components for evaluating Web cache relations are
(1) a factor representing the delay (compared with the general delay for the object domain),
(2) traffic volume: traffic flow into the proxy from the sibling, and traffic flow into the proxy from origin Web servers,
(3) two cost factors representing the cost of getting objects directly as opposed to via the cooperating cache, measured in traffic volume and delay.

2.1. Delay

The ideal latency measurement would be the median response time for hits and the median response time for misses, both measured during peak load. Different response types should be separated, with emphasis on HTTP '200 OK' responses.

Computing a real-time median is costly. Experience with the DePStat analysis package suggests that a weighted mean (throwing away values larger than 3 times the standard deviation) yields a good enough result.
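The "weighted mean" above can be sketched in Python as follows. The text does not say whether the three-standard-deviation cut-off is measured from zero or from the mean. This sketch assumes samples are discarded when they lie more than three standard deviations above the mean. The function name `weighted_mean` and the sample values are hypothetical.

```python
import statistics

def weighted_mean(delays, k=3.0):
    """Mean delay after discarding outliers (the text's 'weighted mean').

    Assumption: a sample is discarded when it lies more than k standard
    deviations above the mean of all samples; the text uses k = 3.
    """
    if not delays:
        raise ValueError("need at least one delay sample")
    mean = statistics.fmean(delays)
    sd = statistics.pstdev(delays)  # population standard deviation
    kept = [d for d in delays if d <= mean + k * sd]
    return statistics.fmean(kept)

# Hypothetical per-request delays in seconds: twenty fast responses and one stall.
samples = [1.0] * 20 + [60.0]
print(weighted_mean(samples))  # 1.0 (the 60 s stall is discarded)
```

For comparison, `statistics.median(samples)` gives the exact median. The point of the mean-based variant is only that it avoids sorting every sample.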

Experience in network measurements indicates that Internet traffic is cyclic over the week. Weekend traffic is significantly lower in UNINETT, which indicates that measurements should be taken Monday through Friday. This may be different for other network environments.

Dimensioning should be done using the extreme conditions where Web traffic is at its high-water mark (when your feet get wet and you feel uncomfortable unless you move on to a better place, or higher bandwidth).
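The measurement window suggested above is weekdays only, at the traffic high-water mark. It can be selected from timestamped log records, as sketched below. The sketch assumes the log has already been parsed into (Unix timestamp, delay in seconds) pairs. The helper names `peak_hour` and `delays_in_hour` are hypothetical. UTC bucketing is a simplification.

```python
from collections import Counter
from datetime import datetime, timezone

def peak_hour(records):
    """Return (date, hour) with the most requests, counting Monday-Friday only.

    records: iterable of (unix_timestamp, delay_seconds) pairs, assumed to be
    already parsed from the access log. UTC is used here for simplicity;
    a real deployment would bucket by the cache's local time.
    """
    counts = Counter()
    for ts, _delay in records:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        if t.weekday() < 5:  # Monday=0 ... Friday=4; weekend traffic is skipped
            counts[(t.date(), t.hour)] += 1
    if not counts:
        raise ValueError("no weekday records")
    return counts.most_common(1)[0][0]

def delays_in_hour(records, day, hour):
    """Delays of the requests that fall inside the given (date, hour) bucket."""
    selected = []
    for ts, delay in records:
        t = datetime.fromtimestamp(ts, tz=timezone.utc)
        if t.date() == day and t.hour == hour:
            selected.append(delay)
    return selected

# Hypothetical records: a busy hour on Thursday 28.08.97 and a busier Saturday.
thu = datetime(1997, 8, 28, 14, tzinfo=timezone.utc).timestamp()
sat = datetime(1997, 8, 30, 14, tzinfo=timezone.utc).timestamp()
records = [(thu + 60 * i, 2.0) for i in range(10)]   # ten requests, 14:00-14:09
records.append((thu + 3600, 3.0))                    # one request at 15:00
records += [(sat + i, 1.0) for i in range(50)]       # Saturday: ignored
day, hour = peak_hour(records)
print(day, hour)  # 1997-08-28 14
```

The delays returned for the peak hour can then be averaged with the outlier-trimmed mean described above.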

The total delay when a sibling cache is in use is calculated by adding the delay for hits and the delay for misses. Dividing this sum by the number of requests gives the mean delay per request. The delay for hits is obtained by multiplying the weighted mean delay for sibling hits by the number of hits. The delay for misses is calculated by multiplying the weighted mean delay for direct fetches by the number of misses.

The suggested formula for calculating delay when using a Web proxy cache sibling:

$$D = D_s T_h + D_w T_m$$

where $D$ is delay, $T_c$ is the number of object domain HTTP connections, $T_h$ is the number of hits in the object domain, $T_m = T_c - T_h$, $D_s$ is the delay for sibling hits, and $D_w$ is the delay for an object fetched directly from the Web origin server. $D$ is measured in the peak hour.

The delay without a sibling cache is estimated by multiplying the delay for direct fetches by the number of requests in the object domain.

Suggested formula for delay without a Web proxy cache sibling:

$$D = D_w T_c$$

where $D$ is delay, $T_c$ is the number of object domain HTTP connections, and $D_w$ is the delay for an object fetched directly from the Web origin server.
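The two delay formulas translate directly into Python. Below they are applied to the cache.nic.surfnet.nl figures reported in Table 1 (Section 5.4). The function and parameter names are illustrative, and all delays are in seconds.

```python
def delay_with_sibling(d_s, d_w, t_c, t_h):
    """Total delay D = Ds*Th + Dw*Tm, with Tm = Tc - Th (seconds)."""
    t_m = t_c - t_h
    return d_s * t_h + d_w * t_m

def delay_without_sibling(d_w, t_c):
    """Total delay D = Dw*Tc when every request goes to the origin server."""
    return d_w * t_c

# cache.nic.surfnet.nl, .com domain, 28.08.97 (Table 1)
with_s = delay_with_sibling(d_s=6.68, d_w=9.1, t_c=202267, t_h=87788)
without_s = delay_without_sibling(d_w=9.1, t_c=202267)
print(round(with_s), round(without_s))
# Mean delay per request, for comparison with Dw
print(round(with_s / 202267, 2), round(without_s / 202267, 2))
```

The difference between the two totals, about 2.12 × 10^5 s, equals $(D_w - D_s)\,T_h$, the simplification derived in Section 4.2.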

A high document hit ratio gives low latency (there is a trade-off between latency and a high byte hit ratio in the current Squid implementation). The document hit ratio is implicitly handled by considering latency.

2.2. Traffic volume

ICP traffic is used here to indicate non-HTTP inter-cache communication traffic.

The traffic volume to the proxy is calculated both for traffic from Web servers directly to the proxy (misses) and for traffic from the sibling (HTTP hits and ICP responses). The ratio between direct traffic and sibling HTTP traffic is the net byte gain. The ICP traffic is added to compute the total traffic flow from sibling to proxy. If the cost of traffic from the sibling is low, the ICP traffic may be ignored.

The total traffic volume is the sum of ICP traffic from the sibling, HTTP traffic from the sibling (hits) and HTTP traffic from Web servers (misses).

ICP traffic from a sibling is sent in response to ICP requests. Every document request results in an ICP request to the sibling and a response from it. (The exception is a hit at the proxy itself, but that case falls outside the scope of this document.) Each ICP request/response uses 64 B per message, on average. If the average object size is 3 kB, an object hit rate of at least 3.3% is needed before the HTTP traffic from the sibling exceeds the ICP traffic on the link to the sibling (the break-even point).

Traffic need not be symmetric. Only the traffic volume flowing into the Web cache is calculated here. Similar calculations may be done for traffic flowing towards the sibling cache, but this is left out to simplify calculations. Most Internet connections have a lopsided traffic profile, importing more than they export. Normally, connections are full duplex, and the dimensioning factor is traffic flowing into the network. When this is the case, outflowing traffic does not need to be considered.

2.3. Cost

The important cost factor for an ISP may be the cost of an international connection as opposed to the local cost of bandwidth, or it may be the cost per byte transferred out of the cache. For a local sysadmin, the crucial cost factor may be the access line cost compared with the cost of local traffic exchange. Determining which factor is the most important is not necessarily easy. Some costs are not paid at the same organizational level at which the cache servers are run.

The cost/benefit analysis from the DESIRE project [5] gives examples of costs for international versus national bandwidth.

$C_d$ is a cost factor for delay, for example the cost of man-hours. Reduced delay on Web page downloads may lead to fewer man-hours being used in a company. In that case, the gain may be calculated using the average cost of a man-hour.

Cost factors for traffic volume:

$C_{ts}$ = cost of traffic from the sibling,
$C_{tw}$ = cost of traffic direct from Web origin servers.

3. Proposed formula

3.1. Components

The basic components of the threshold value for breaking even are:
(1) Delay: a factor representing the delay caused by sibling caching (compared with the general delay for the same documents).
(2) Byte gain: a factor representing the total traffic in the object domain and the gain from the sibling. It may include calculations of ICP overhead and traffic from the sibling.
(3) Cost: cost factors representing the cost of getting objects directly as opposed to via the cooperating cache, or the cost of delay.

The first two can be computed from Squid logs; the last is local to each site.

3.2. Cost formula

Suggested formula for cost when using a Web proxy cache sibling:

$$(D_s T_h + D_w T_m)\,C_d + (T_b - H_b)\,C_{tw} + H_b C_{ts} + I_b C_{ts}.$$

Suggested formula for cost without sibling:

$$D_w T_c C_d + T_b C_{tw}.$$
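Both cost formulas can be written as Python functions, one term per line. In the usage example, $C_d$ is set to zero, so only byte cost is compared. The example takes cache.nic.surfnet.nl as a single sibling of www-cache.uninett.no. It uses the per-sibling byte hits from Table 1 and reads that row as the traffic served by the named sibling; this reading is an interpretation of the table. The function names are illustrative, and traffic is in kB.

```python
def cost_with_sibling(d_s, d_w, t_c, t_h, t_b, h_b, i_b, c_d, c_tw, c_ts):
    """(Ds*Th + Dw*Tm)*Cd + (Tb - Hb)*Ctw + Hb*Cts + Ib*Cts, with Tm = Tc - Th."""
    t_m = t_c - t_h
    delay_cost = (d_s * t_h + d_w * t_m) * c_d
    miss_cost = (t_b - h_b) * c_tw  # bytes fetched directly from origin servers
    hit_cost = h_b * c_ts           # bytes served as hits by the sibling
    icp_cost = i_b * c_ts           # inter-cache (ICP) overhead from the sibling
    return delay_cost + miss_cost + hit_cost + icp_cost

def cost_without_sibling(d_w, t_c, t_b, c_d, c_tw):
    """Dw*Tc*Cd + Tb*Ctw: every object is fetched from the origin server."""
    return d_w * t_c * c_d + t_b * c_tw

# www-cache.uninett.no with sibling cache.nic.surfnet.nl, .com domain (Table 1);
# Cd = 0, so only byte cost is compared.
with_s = cost_with_sibling(d_s=5.94, d_w=5.25, t_c=78364, t_h=32194,
                           t_b=867258, h_b=9416, i_b=5015,
                           c_d=0, c_tw=250, c_ts=1)
without_s = cost_without_sibling(d_w=5.25, t_c=78364, t_b=867258, c_d=0, c_tw=250)
print(with_s, without_s)
```

Setting `c_d=0` reproduces the bandwidth-only simplification of Section 4.1. Passing a non-zero `c_d` folds the delay term back in.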

3.3. Variables (summary)

Results common for object domain:

$T_c$ = number of object domain HTTP connections,
$T_h$ = number of hits in object domain,
$T_m = T_c - T_h$, number of misses in object domain,
$D_w$ = delay for object direct from Web origin server,
$T_b$ = total object domain HTTP traffic in bytes,
$I_b$ = inter-cache communication overhead, in the case of Squid ICP traffic in bytes, $I_b = 64\,\mathrm{B} \cdot T_c$ (64 bytes per connection).

Various cost factors:

$C_{ts}$ = cost of traffic from sibling (individual per sibling),
$C_{tw}$ = cost of traffic direct from Web origin servers,
$C_d$ = cost of delay.

Results per sibling:

$D_s$ = delay for sibling hits,
$H_b$ = byte hits (volume of HTTP traffic served as hits by the sibling).
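The per-domain variables can be grouped in a small Python record, with $T_m$ and $I_b$ derived from the logged counts. Taking 1 kB = 1000 B reproduces the $I_b$ values given in Table 1. The class and field names are illustrative.

```python
from dataclasses import dataclass

ICP_BYTES_PER_REQUEST = 64  # average ICP message size quoted in Section 2.2

@dataclass
class ObjectDomainStats:
    t_c: int      # object domain HTTP connections
    t_h: int      # hits in object domain
    d_w: float    # delay, direct from origin server (s)
    t_b: float    # total object domain HTTP traffic (kB)

    @property
    def t_m(self) -> int:
        """Misses: Tm = Tc - Th."""
        return self.t_c - self.t_h

    @property
    def i_b_kb(self) -> float:
        """ICP overhead Ib = 64 B * Tc, in kB (1 kB = 1000 B)."""
        return ICP_BYTES_PER_REQUEST * self.t_c / 1000

uninett = ObjectDomainStats(t_c=78364, t_h=32194, d_w=5.25, t_b=867258)
surfnet = ObjectDomainStats(t_c=202267, t_h=87788, d_w=9.1, t_b=2563825)
print(uninett.t_m, round(uninett.i_b_kb))   # 46170 5015
print(surfnet.t_m, round(surfnet.i_b_kb))   # 114479 12945
```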

4. Approximations and choices

4.1. Volume (simplify)

When considering only bandwidth savings, the cost factor $C_d$ is set to zero. The simplified result is

$$(T_b - H_b)\,C_{tw} + H_b C_{ts} + I_b C_{ts} \quad \text{with sibling},$$
$$T_b C_{tw} \quad \text{without sibling}.$$

If the cost of the sibling is negligible (e.g. both caches are on the same LAN), $C_{ts}$ may be set to zero, which results in

$$(T_b - H_b)\,C_{tw} \quad \text{with sibling},$$
$$T_b C_{tw} \quad \text{without sibling}.$$

Subtracting these two to find the total saving yields

$$H_b C_{tw},$$

that is, the byte hits from the sibling multiplied by the cost of fetching those bytes directly when there is no hit.
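If $C_{ts}$ is kept rather than set to zero, the same subtraction gives a general break-even condition for a sibling relation (assuming $C_{ts} > 0$). The symbol $S$ for the byte-cost saving is introduced here only for convenience. With $C_{ts} = 0$ it reduces to $H_b C_{tw}$.

```latex
\begin{align*}
S &= T_b C_{tw} - \bigl[(T_b - H_b)\,C_{tw} + H_b C_{ts} + I_b C_{ts}\bigr] \\
  &= H_b C_{tw} - (H_b + I_b)\,C_{ts}, \\
S > 0 \quad &\iff \quad \frac{C_{tw}}{C_{ts}} > \frac{H_b + I_b}{H_b}.
\end{align*}
```

With the per-sibling Table 1 figures, this ratio gives, for example, $(9416 + 5015)/9416 \approx 1.53$. That is the threshold quoted for www-cache.uninett.no in Section 5.5.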

4.2. Delay (simplify)

Neglecting traffic volume and setting $C_d = 1$ gives

$$D_s T_h + D_w T_m \quad \text{with sibling},$$
$$D_w T_c \quad \text{without sibling}.$$

Subtracting these two to find the savings results in

$$(D_w - D_s)\,T_h,$$

that is, the number of hits multiplied by the difference between the delay for a direct fetch and the delay for a sibling hit.
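The subtraction step is written out below. It uses only $T_m = T_c - T_h$:

```latex
\begin{align*}
D_w T_c - (D_s T_h + D_w T_m) &= D_w (T_c - T_m) - D_s T_h \\
                              &= D_w T_h - D_s T_h \\
                              &= (D_w - D_s)\,T_h.
\end{align*}
```

The saving is positive only when sibling hits are faster than direct fetches, $D_s < D_w$. Otherwise the relation adds delay, however many hits it produces.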

5. Squid example

5.1. COM-MESH

COM-MESH [6] must be analysed to allow us to distinguish between good and bad cache relationships. If we want to use asymmetrical relations, we need to be able to compare the results with symmetric relations.

The simplest solution is to look at bytes saved per relation. This has been done by giving access to Calamaris [7] output from the participants. More complex questions, involving cost, are not answered by this solution.

5.2. Cost factors

Estimate of cost for traffic volume. If we set the national infrastructure cost at 1:
• European infrastructure cost is 162,
• cross-Atlantic infrastructure cost is 2580.

The relative figures are from SURFnet 1997, as reported in the DESIRE report on the costs and benefits of operating caching services [5].

These cost figures show that, in a European context, it makes sense to ignore national and European costs when setting up a mesh for non-European Web content.

5.3. Non-persistent connections

We assume non-persistent connections, as used by HTTP/1.0. Even though some browsers implement Keep-Alive, Squid/1.1 does not support it.

5.4. Relation, 28.08.97

An example from COM-MESH, using an ordinary Thursday as a typical day (Fig. 1).

We used Calamaris to extract figures per day per object domain (Table 1). (Thanks to Ernst Heiri for the patch.) We examine results for the *.com domain. This date was chosen because results were also gathered from other Web caches, which may be used for comparison at a later stage.


Table 1

                                    www-cache.uninett.no   cache.nic.surfnet.nl
Results common for object domain
  Tc                                78364                  202267
  Th                                32194                  87788
  Tm                                46170                  114479
  Dw                                5.25 s                 9.1 s
  Tb                                867258 kB              2563825 kB
  Ib                                5015 kB                12945 kB
Various cost factors
  Cts                               1                      1
  Ctw                               250                    250
  Cd                                0.05 NOK/s             0.06 NOK/s
Total for all siblings
  Ds                                5.19 s                 6.68 s
  Hb                                56686 kB               303856 kB

Per sibling, .com domain            Ds                     Hb
  www-cache.uninett.no              3.31 s                 34899 kB
  cache.nic.surfnet.nl              5.94 s                 9416 kB

Fig. 1. COM-MESH, relations August 1997.

5.5. www-cache.uninett.no 28.08.97

Fig. 2 shows the gains experienced by www-cache.uninett.no in the week of 28.08.97, given the cost factors from Section 5.2. The sibling located in the US costs more than it gives back in terms of traffic. The delay is not analyzed in this graph.

The first question is whether there is any overall benefit from the sibling relations of www-cache.uninett.no. Computing the byte cost with siblings (using $I_b$ as the ICP size multiplied by the number of siblings) gives a byte cost of $(T_b - H_b)\,C_{tw} + H_b C_{ts} + I_b C_{ts}$ with siblings. Without siblings, the byte cost is $T_b C_{tw}$. Assume that all siblings have the same cost factors. (This is not entirely correct, as one sibling is located in the US and the rest in Europe.) The cost gain from the siblings is 14074694/216814500 ≈ 6.5%. Despite the extra byte cost they introduce, the siblings yield a net gain for values of $C_{tw} > 1.7\,C_{ts}$.
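The overall byte-cost comparison can be reproduced in Python from the "Total for all siblings" figures in Table 1. The text does not state how many siblings www-cache.uninett.no had. Assuming n = 8 reproduces the quoted gain of 14074694 exactly, so n = 8 is used below as an assumption. The function names are illustrative, and traffic is in kB.

```python
def byte_costs(t_b, h_b, i_b, c_tw, c_ts, n_siblings=1):
    """Byte cost with and without siblings (Cd = 0).

    The ICP overhead Ib is multiplied by the number of siblings, as in the text.
    """
    with_sib = (t_b - h_b) * c_tw + h_b * c_ts + n_siblings * i_b * c_ts
    without_sib = t_b * c_tw
    return with_sib, without_sib

def break_even_ratio(h_b, i_b, n_siblings=1):
    """Smallest Ctw/Cts for which the siblings reduce byte cost."""
    return (h_b + n_siblings * i_b) / h_b

# www-cache.uninett.no, .com domain (Table 1); n_siblings = 8 is an assumption
with_sib, without_sib = byte_costs(867258, 56686, 5015, c_tw=250, c_ts=1, n_siblings=8)
gain = without_sib - with_sib
print(gain, round(100 * gain / without_sib, 1))                 # 14074694 6.5
print(round(break_even_ratio(56686, 5015, n_siblings=8), 2))   # 1.71
```

The break-even ratio of about 1.71 matches the quoted threshold of $C_{tw} > 1.7\,C_{ts}$.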

If the cost factors are $C_{tw} = 10$, $C_{ts} = 1$, the byte gain is significantly lower, around 0.2%. The complexity of the configuration does not justify such a small gain, but this is a choice that the cache manager needs to make.

The cost of delay for the overall configuration is given by $(D_s T_h + D_w T_m)\,C_d$. The cost of delay without siblings may be estimated by $D_w T_c C_d$. The savings are 1933 ks, which amounts to some 96650 NOK.
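The delay saving can be recomputed in Python with the simplified form $(D_w - D_s)\,T_h$ from Section 4.2 and the rounded Table 1 values for www-cache.uninett.no. This gives about 1.93 × 10^3 s, or about 96.6 NOK at $C_d$ = 0.05 NOK/s. The figures quoted above (1933 ks, 96650 NOK) are a factor of 1000 larger, so the unit is worth checking against the original logs. The function name is illustrative.

```python
def delay_saving(d_w, d_s, t_h):
    """Delay saved by sibling hits, (Dw - Ds) * Th, in seconds."""
    return (d_w - d_s) * t_h

# www-cache.uninett.no, .com domain, all siblings (Table 1)
saved_s = delay_saving(d_w=5.25, d_s=5.19, t_h=32194)
cost_nok = saved_s * 0.05   # Cd = 0.05 NOK/s
print(round(saved_s), round(cost_nok, 1))
```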


Fig. 2. Sibling cost/gain as seen from UNINETT, byte gain.

Next, consider the results per sibling, using the sibling cache.nic.surfnet.nl. For values of $C_{tw} > 1.53\,C_{ts}$ there is a gain in byte cost for www-cache.uninett.no.

Analyzing the relation of cache.nic.surfnet.nl with www-cache.uninett.no shows a reduction in both delay and cost. For values of $C_{tw} > 1.37\,C_{ts}$ there is a gain in terms of byte cost. The gain is higher for this relation than for the opposite one: the advantage is not symmetrical.
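Both per-sibling thresholds follow from comparing byte costs for a single relation. A relation pays off when $H_b C_{tw} > (H_b + I_b)\,C_{ts}$. The per-sibling $H_b$ rows of Table 1 can be read as the traffic each named sibling served to the other cache, paired with the asking cache's own $I_b$. With that reading, the Python sketch below reproduces both quoted ratios. The reading of the table is an interpretation, not stated in the text, and the function name is illustrative.

```python
def break_even_ratio(h_b_kb, i_b_kb):
    """Smallest Ctw/Cts for which one sibling relation reduces byte cost.

    Follows from Hb*Ctw > (Hb + Ib)*Cts with Cd = 0.
    """
    return (h_b_kb + i_b_kb) / h_b_kb

# Per-sibling .com figures from Table 1 (kB). Hb: traffic served by the sibling
# (interpretation); Ib: ICP overhead of the cache doing the asking.
uninett_via_surfnet = break_even_ratio(h_b_kb=9416, i_b_kb=5015)
surfnet_via_uninett = break_even_ratio(h_b_kb=34899, i_b_kb=12945)
print(round(uninett_via_surfnet, 2), round(surfnet_via_uninett, 2))  # 1.53 1.37
```

The lower ratio for cache.nic.surfnet.nl reflects the asymmetry noted above. It receives more hit traffic from its sibling relative to its own ICP overhead.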

6. Conclusion

The framework for evaluating inter-cache communication consists of cost factors for the individual cache, together with delays and traffic flow measured in bytes. By assigning cost factors, it is possible to find one single measurement that gives the net gain created by a relationship with another cache.

This single measurement may be used in the operation of a cache, to determine cache configuration and relations with other caches in a mesh.

7. Further work

More work is needed on all relationships involved in COM-MESH in order to properly evaluate the gain from such a large-scale mesh. All relationships for a given day should be calculated, if the data is available.

Much work remains in testing various time scales for evaluating cache relations. It remains to be seen whether the best approach is to evaluate during peak hours, or day by day.

Calculations need to be tested in real time, to see whether this framework is applicable to real-time 'best sibling' selection mechanisms. This may be one step on the road to mesh auto-configuration.

There is still work to be done to evaluate whether continuous calculation is feasible. This may prove difficult if the calculations are complex and require knowledge available outside the normal logging procedures. Any process for evaluation should not add significant load to the cache.

Comparing Web cache systems and the efficiency of inter-cache communication protocols is another issue that remains to be studied.


8. Glossary

proxy: An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Proxies are often used in firewalls and as gateways for handling requests via protocols not implemented by the user agent.

cache: A cache stores cacheable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client, server or proxy may include a cache.

sibling: A caching server participating in a caching mesh, which sends requests to and receives requests from other cache servers.

parent: A neighbor caching server through which misses are resolved.

relation: The object exchange between two caches.

object domain: All the objects that the cache asks a sibling for.

hit rate: Web objects served from cache, as a proportion of the total number of requests.

latency: Time before a request is served.

Acknowledgements

This work is based on the work done by TF-CACHE in the COM-MESH experiment. Ernst Heiri extended Calamaris and collected logs for the analysis, and Pal Løberg helped with the final analysis. Gerhard Winkler and Andreas Papst presented the problems and questions that lay behind this presentation.

References

[1] Squid, Web proxy cache software from NLANR, http://squid.nlanr.net/

[2] I. Melve, L. Slettjord, H. Bekker, T. Verschuren, Web caching architecture, DESIRE report, March 1997, http://www.uninett.no/prosjekt/desire/arneberg/

[3] A. Papst, G. Winkler, ACOnet, Automatic sibling configuration, Squid, work in progress (unpublished), http://www.aco.net/TF/

[4] R. Caceres, F. Douglis, A. Feldmann, G. Glass, M. Rabinovich, Web proxy caching: the Devil is in the details, in: Proc. ACM SIGMETRICS Workshop on Internet Server Performance, June 1998, http://www.research.att.com/~ramon/papers/wisp98.ps.gz

[5] A. Jong, H. Bekker, T. Verschuren, I. Melve, Cost/benefit analysis, DESIRE report, November 1997, http://www.surfnet.nl/surfnet/projects/desire/deliver/WP4/D4-2.html

[6] COM-MESH, Experiment in TF-CACHE, meshing national academic networks in Europe, http://www.terena.nl/tech/projects/choc/com-mesh.html

[7] C. Beermann, Calamaris, Squid log analyzer, http://www.detmold.netsurf.de/homepages/cord/tools/squid/Welcome.html.en

Ingrid Melve is the Communication Service Manager of UNINETT, the Norwegian academic network. She holds an MSc in electronic engineering from the Norwegian Institute of Technology. Her work focuses on electronic information systems and their interactions with network technology. Web caching is her primary interest; she is the convener of TF-CACHE, TERENA's task force for Web caching.