Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
![Page 1: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/1.jpg)
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol
By Abuzafor Rasal and Vinoth Rayappan
![Page 2: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/2.jpg)
Web caching
[Figure: Client1, Client2, and Client3 connected to a cache server; arrows labeled 1 are HTTP requests and arrows labeled 2 are HTTP responses.]
![Page 3: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/3.jpg)
Web Cache Sharing
[Figure: users connect to proxy caches on a regional network; the link to the rest of the Internet is the bottleneck.]
![Page 4: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/4.jpg)
Web Cache Sharing: Internet Cache Protocol (ICP)
• Internet Cache Protocol (ICP) is the currently implemented technique for web cache sharing.
• With ICP, a proxy multicasts a query message to all other proxies whenever a cache miss occurs.
![Page 5: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/5.jpg)
Internet Cache Protocol
[Figure: a client, several cooperating proxy caches, and the Internet.]
![Page 6: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/6.jpg)
Internet Cache Protocol
First request: document is available in local proxy.
[Figure: clients 1 to n, proxies 1 to N, and the Internet; the client's HTTP request is answered by its local proxy with a hit.]
![Page 7: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/7.jpg)
Internet Cache Protocol
Second request: document is not available in local proxy.
[Figure: clients 1 to n, proxies 1 to N, and the Internet; the client sends an HTTP request to its local proxy, which sends ICP queries to the other proxies.]
![Page 8: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/8.jpg)
Problem of ICP
• As the number of collaborating proxies increases, the overhead increases dramatically, so ICP is not scalable.
  – A proxy multicasts a query message to all other proxies whenever a cache miss occurs.
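A rough back-of-the-envelope sketch of why this does not scale. It assumes every local miss triggers one query to each of the other proxies and that each of them replies; the function name and the example numbers are illustrative and are not taken from the slides.

```python
def icp_messages(num_proxies: int, local_misses_per_proxy: int) -> int:
    """Estimate ICP query + reply messages across the whole proxy group."""
    peers = num_proxies - 1
    # Every proxy sends one query per miss to each of its peers.
    queries = num_proxies * local_misses_per_proxy * peers
    # Assumption: each peer answers every query it receives.
    replies = queries
    return queries + replies


# The message count grows roughly with the square of the group size.
for n in (2, 4, 8, 16):
    print(n, icp_messages(n, local_misses_per_proxy=1000))
```

Doubling the number of proxies roughly quadruples the total message count under these assumptions.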
![Page 9: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/9.jpg)
Problem of ICP
• UDP = ICP query and reply messages
• TCP = HTTP traffic between proxies, servers, and clients
• Total packets (IP) = UDP + TCP
![Page 10: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/10.jpg)
Problem of ICP

| Metric | No ICP | ICP | SC-ICP |
|---|---|---|---|
| Client latency | 2.75 | 3.07 | 2.85 |
| UDP messages | 615 | 54774 | 10790 |
| TCP messages | 334000 | 328000 | 330000 |
| Total packets | 355000 | 402000 | 351000 |
![Page 11: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/11.jpg)
Summary Cache
• Each proxy maintains a Bloom filter (a compressed representation) of its local cache.
• It also holds Bloom filters representing the caches of the other proxies.
• Updates to the Bloom filters are exchanged periodically, or after a certain percentage of the documents in the cache has been replaced.
• A request is sent only to the proxies that most likely hold the requested document.
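The lookup described above can be sketched as follows. This is a minimal illustration, not the protocol's actual implementation: plain Python sets stand in for the summaries (any object supporting `in` would do, including a Bloom filter), and the function and proxy names are hypothetical.

```python
from typing import Container, Dict, List


def peers_to_query(url: str,
                   local_cache: Container[str],
                   peer_summaries: Dict[str, Container[str]]) -> List[str]:
    """Return the peers worth querying for `url`.

    An empty list means either a local hit or that no summary claims
    the document, in which case the request goes to the origin server.
    """
    if url in local_cache:
        return []
    # Only peers whose summary indicates a likely hit are contacted.
    return sorted(peer for peer, summary in peer_summaries.items()
                  if url in summary)


# Hypothetical example data.
local_cache = {"http://wayne.edu/"}
peer_summaries = {
    "proxy2": {"http://cnn.com/index.html"},
    "proxy3": {"http://wayne.edu/"},
}
targets = peers_to_query("http://cnn.com/index.html", local_cache, peer_summaries)
print(targets)
```

Unlike plain ICP, the number of queries here depends on how many summaries match, not on how many proxies exist.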
![Page 12: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/12.jpg)
Summary Cache
First request: document is in other proxy
[Figure: a client, four proxy caches, and the Internet.]
![Page 13: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/13.jpg)
Summary Cache
Second request: the document is not in any proxy
[Figure: a client, four proxy caches, and the Internet.]
![Page 14: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/14.jpg)
Summary Cache
Third request: summary gives false hit
[Figure: a client, four proxy caches, and the Internet.]
![Page 15: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/15.jpg)
Summary Cache
• Two key design parameters of the Summary Cache protocol:
  – The frequency of summary updates (inter-proxy traffic, overhead).
  – The representation of the summary (memory).
• Solutions:
  – Delay summary updates until a fixed percentage (e.g. 1%) of the cached documents are new.
    • Positive: reduces overhead (traffic)
    • Negative: introduces “false miss” errors
  – Store summaries as a “Bloom filter”, an efficient hash-based probabilistic scheme that represents the URLs of cached documents.
    • Positive: reduces memory requirement
    • Negative: introduces “false hit” errors
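A minimal sketch of the delayed-update rule, assuming the threshold is measured as new documents divided by the current number of cached documents. The class and method names are hypothetical; the slides give only the 1% figure.

```python
class SummaryUpdateTracker:
    """Decide when a proxy should send a summary update to its peers."""

    def __init__(self, threshold: float = 0.01):
        self.threshold = threshold  # fraction of cached documents that are new
        self.new_docs = 0           # documents added since the last update

    def record_new_document(self, cached_docs: int) -> bool:
        """Count one new document; return True when an update is due."""
        self.new_docs += 1
        if cached_docs > 0 and self.new_docs / cached_docs >= self.threshold:
            self.new_docs = 0       # the update covers everything so far
            return True
        return False


tracker = SummaryUpdateTracker(threshold=0.01)
decisions = [tracker.record_new_document(cached_docs=1000) for _ in range(10)]
print(decisions)
```

Until the threshold is reached, peers keep working from a stale summary, which is where false misses come from.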
![Page 16: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/16.jpg)
Summary Cache
• False misses:
  – Definition: the requested document is cached at some other proxy, but that proxy's summary does not reflect the fact.
  – Effect: a remote cache hit is lost, and the total hit ratio within the collection of caches is reduced.
  – Improvement: can be reduced by updating summaries more frequently.
• False hits:
  – Definition: the requested document is not cached at some other proxy, but its summary indicates that it is. The proxy sends a query message to the other proxy, only to be told that the document is not cached there.
  – Effect: a query message is wasted.
  – Improvement: can be reduced by increasing the size of the Bloom filter bit vector, i.e. the memory devoted to the representation.
![Page 17: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/17.jpg)
Summary Cache
• Remote Stale Hits: document is cached at another proxy but the cached copy is stale. (Not because of update delay)
– Delta compression can be used to transfer the new document. Delta compression transfers only the difference between the old and the new document instead of downloading the whole document.
![Page 18: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/18.jpg)
Summary Cache
• Two factors limit scalability:
  – Network overhead (inter-proxy communication).
    • Determined by update frequency, false hits, and remote hits.
  – Memory required to store the summaries.
    • Determined by the size of an individual summary and the number of proxies.
![Page 19: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/19.jpg)
Impact of Update Delay: Explanation of the Graph
ICP = hit ratio when no update delay is introduced
exact_dir = hit ratio with update delay introduced
false_hit = (no delay) - (delay) = ICP - exact_dir
stale-hit = remote stale hit: the cached copy is stale (outdated), but this is not reflected in the summary
![Page 20: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/20.jpg)
Impact of Update Delay: Observation of the Graph
exact_dir = hit ratio decreases linearly as the threshold increases.
stale-hit = not affected by the threshold, because stale-hit errors exist for both ICP and Summary Cache.
false-hit = increases as the threshold increases, because a document deleted from the cache may still be shown as present in the summary.
![Page 21: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/21.jpg)
Summary Representations
• Summary representation = how the summaries are stored in proxies.
• Summaries need to be stored in DRAM (main memory):
  – Disk arms become bottlenecks in a proxy cache
  – DRAM prices continue to drop
  – DRAM is faster
![Page 22: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/22.jpg)
Summary Representations: Naïve approach
• Exact-directory = the summary is essentially the list of URLs of cached documents, with each URL represented by its 16-byte MD5 signature.
  – Positive: fewer errors
  – Negative: consumes too much memory
• Server-name = the summary is the list of web server names in the URLs of cached documents.
  – Positive: cuts the memory requirement by a factor of 10
  – Negative: introduces errors; generates too many false hits, which increases network traffic
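The two naïve representations can be illustrated with a short sketch. The helper names and the example URLs are assumptions chosen for illustration; only the 16-byte MD5 signature and the use of server names come from the slide.

```python
import hashlib
from urllib.parse import urlsplit


def exact_directory_entry(url: str) -> bytes:
    """16-byte MD5 signature of the full URL."""
    return hashlib.md5(url.encode("utf-8")).digest()


def server_name_entry(url: str) -> str:
    """Only the web server name from the URL."""
    return urlsplit(url).hostname or ""


cached_urls = [
    "http://cnn.com/index.html",
    "http://cnn.com/world.html",
    "http://wayne.edu/",
]
exact_dir = {exact_directory_entry(u) for u in cached_urls}
server_names = {server_name_entry(u) for u in cached_urls}

# A document that is NOT cached but lives on a cached server:
probe = "http://cnn.com/not-cached.html"
exact_says_hit = exact_directory_entry(probe) in exact_dir
server_says_hit = server_name_entry(probe) in server_names
print(exact_says_hit, server_says_hit)
```

The server-name summary is smaller because many URLs collapse onto one name, and that same collapse is what produces its false hits.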
![Page 23: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/23.jpg)
Summary Representations: Bloom Filters
• Process
  – Step 1: Feed each URL into four different hash functions.
  – Step 2: Map each hash function's output (32 bits) to one bit position in a bit vector.
  – Step 3: Set the four bits at those positions in the vector to 1.
• Positive: consumes much less memory
• Negative: introduces a small probability of error (false hits)
![Page 24: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/24.jpg)
Summary Representations
• Server name produces too much network traffic because a request is sent to every proxy whose summary contains the server name.
![Page 25: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/25.jpg)
Bloom filter
• A Bloom filter is a technique for representing a set compactly, saving memory at the cost of a small probability of false hits.
• Summary Cache uses Bloom filters to compress its summaries.
• It is a method of representing a set A of n elements so as to support membership queries.
• It is a mechanism for identifying which pages have associated comments stored within a common knowledge server.
![Page 26: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/26.jpg)
Problem?
[Figure: place A holds a set of URLs (cnn.com/index.html, wayne.edu/, ...) and sends a compact representation (a Bloom filter) to place B, which asks whether an arbitrary URI is in the set.]
![Page 27: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/27.jpg)
How does the Bloom filter work?
• Start with a large bit array initialized to all 0s.
• Pick a number of independent hash functions; in this case we have four (4).
• For every URL in the bag (the proxy's cache), apply the four hash functions to get four integers.
• Use the four integers as indices into the bit array.
• Set the bits at those positions to 1.
• Repeat this for every URL in the proxy's cache.
• The above is the encoding (insertion) process.
• To check a URL, apply the same hash functions and test whether all four bits are 1. A Bloom filter cannot be decoded back into the list of URLs.
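The steps above can be sketched as a small class. This is an illustrative sketch, not the authors' code: the class name is hypothetical, and the four hash functions are simulated by hashing the URL with SHA-256 under four different prefixes, which is an assumption made only to keep the example short.

```python
import hashlib


class BloomFilter:
    """Bit-vector summary supporting insertion and membership tests."""

    def __init__(self, num_bits: int, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, url: str):
        # Assumption: derive each hash function by prefixing its index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, url: str) -> None:
        """Set the bit at every hash position to 1."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        """True if all hash positions are 1 (may be a false positive)."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


summary = BloomFilter(num_bits=1024)
summary.add("http://cnn.com/index.html")
print("http://cnn.com/index.html" in summary)
```

An inserted URL always tests positive; a URL that was never inserted usually tests negative, but can test positive if other URLs happen to have set all of its bits.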
![Page 28: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/28.jpg)
How does a hash function work?
A hash function turns data into a relatively small number that may serve as a digital "fingerprint" of the data.
![Page 29: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/29.jpg)
Bloom filter
• A hashing technique
• A vector of m bits
• k independent hash functions
• Many-to-one mapping
• “False positives”
![Page 30: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/30.jpg)
Bloom filter
• False positive
  – Given a query for b, we check the bits at positions h1(b), h2(b), ..., hk(b). If any of them is 0, b is not in the set A.
  – Otherwise we assume b is in the set A, although there is a certain probability that we are wrong.
• If false positives increase, the number of wasted accesses goes up; if false negatives increase, more documents that are actually cached remotely are missed.
• The salient feature of a Bloom filter is the trade-off between memory size (the bit array) and the false positive rate.
![Page 31: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/31.jpg)
Probability of false positive
upper graph: for 4 hash functions
lower graph: optimal integral number of hash functions(5 hash function)
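The two curves can be approximated numerically. The sketch below uses the standard approximation for a Bloom filter with m bits, n entries, and k hash functions, p ≈ (1 - e^(-kn/m))^k; this formula is a well-known result and is not stated on the slides, and the function names are hypothetical.

```python
import math


def false_positive_probability(bits_per_entry: float, num_hashes: int) -> float:
    """Approximate false positive rate for m/n = bits_per_entry and k hashes."""
    return (1.0 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes


def optimal_num_hashes(bits_per_entry: float) -> int:
    """Integer k minimising the rate; the real-valued optimum is ln(2) * m/n."""
    k = bits_per_entry * math.log(2)
    candidates = {max(1, math.floor(k)), max(1, math.ceil(k))}
    return min(candidates,
               key=lambda c: false_positive_probability(bits_per_entry, c))


for ratio in (8, 16, 32):
    k_opt = optimal_num_hashes(ratio)
    print(ratio,
          false_positive_probability(ratio, 4),
          k_opt,
          false_positive_probability(ratio, k_opt))
```

For each bits-per-entry ratio the loop prints the rate with four hash functions next to the rate with the best integer number of hash functions.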
![Page 32: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/32.jpg)
Bloom filter as summaries
• Provides a straightforward mechanism to build summaries
• Each proxy builds a Bloom filter from the URLs of its cached documents
• Increasing the memory decreases false positives, and vice versa
• Provides a clear trade-off between the two
![Page 33: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/33.jpg)
How are the hash functions built?
[Figure: a URL (www.abc.com) is hashed with MD5 into a 128-bit signature, which is divided into four 32-bit hashes.]
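A short sketch of this construction: one MD5 digest of the URL is split into four 32-bit words, and each word is reduced to a bit position. The function name, the big-endian byte order, and the modulo reduction to the vector size are assumptions made for illustration.

```python
import hashlib
import struct
from typing import List


def md5_hash_positions(url: str, num_bits: int) -> List[int]:
    """Derive four bit positions from one 128-bit MD5 signature."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes = 128 bits
    words = struct.unpack(">4I", digest)                # four 32-bit integers
    return [word % num_bits for word in words]


positions = md5_hash_positions("www.abc.com", num_bits=1 << 16)
print(positions)
```

Computing one signature and slicing it is cheaper than running four unrelated hash functions over the same URL.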
![Page 34: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/34.jpg)
Hit ratio
![Page 35: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/35.jpg)
Observations of the cache hit ratio
• Exact_dir and bloom_filter_8, _16, and _32 have virtually the same hit ratio, compared with server name.
• Exact_dir gives the same hit ratio as the Bloom filters, but it consumes more memory because it stores the information for every URL.
• Bloom filter_8, _16, and _32 consume less memory than exact_dir because the URLs are hashed into a compact bit vector.
![Page 36: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/36.jpg)
False hit ratio under different summary representations
![Page 37: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/37.jpg)
Observation of false hit ratio
• Server name has a much higher false hit ratio. Why?
• Because the summary records only the server name, not the specific URL requested.
• So the request is sent to every proxy whose summary lists that server, but at most a few of them hold the document, so the false hit ratio is high.
• Exact_dir has the lowest false hit ratio of all, but it needs a large amount of memory.
![Page 38: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/38.jpg)
Message per request
![Page 39: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/39.jpg)
Observations on messages per request
• ICP is included for comparison.
• With ICP (without the summary cache), the request is sent to all proxies to find the requested URL, so the number of messages per client request is high compared with the others.
• At the other extreme, bloom_8, _16, _32 and exact_dir need far fewer messages per client request to find the URL, which makes them the economical choice.
• Server name falls between the two, because its higher false hit ratio leads to more messages per client request.
![Page 40: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/40.jpg)
Bytes of Msg size per request
![Page 41: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/41.jpg)
Observations on size of inter-network messages in bytes
• We consider this issue because update messages are larger than query messages.
• Summary Cache sends occasional bursts of large update messages in between the small query messages. This significantly reduces CPU overhead and the number of network interface packets (results in Tables 2 and 4).

For query messages (sizes in bytes):

| | Header size | Average URL |
|---|---|---|
| ICP and others | 20 | 50 |

For summary updates (sizes in bytes):

| | Header size | Bytes/change |
|---|---|---|
| Exact directory | 20 | 16 |
| Server name | 20 | 16 |
| Bloom filter based summaries | 32 | 4 |
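A small worked example using the header and per-change sizes listed above. It assumes a query message is one header plus one average-length URL, and an update message is one header plus a fixed cost per changed entry; the function names and the count of 100 changes are illustrative.

```python
QUERY_HEADER_BYTES = 20
AVERAGE_URL_BYTES = 50

# representation -> (header bytes, bytes per change), from the figures above
UPDATE_FORMATS = {
    "exact_dir": (20, 16),
    "server_name": (20, 16),
    "bloom": (32, 4),
}


def query_message_bytes() -> int:
    """Size of one query: header plus an average-length URL."""
    return QUERY_HEADER_BYTES + AVERAGE_URL_BYTES


def update_message_bytes(representation: str, changes: int) -> int:
    """Size of one summary update carrying `changes` changed entries."""
    header, per_change = UPDATE_FORMATS[representation]
    return header + per_change * changes


print(query_message_bytes())
for name in UPDATE_FORMATS:
    print(name, update_message_bytes(name, changes=100))
```

Under these assumptions a Bloom filter update with 100 changes is about a quarter the size of the equivalent exact-directory update.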
![Page 42: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/42.jpg)
Memory requirements as a percentage of proxy cache size: NLANR, 4 proxies
[Figure: bar chart of storage requirement as a percentage of proxy cache size for the NLANR trace. X axis: approach (Exact_dir, server_name, Bloom_8, Bloom_16, Bloom_32); Y axis: % of memory size, 0.00% to 0.70%.]
![Page 43: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/43.jpg)
Memory requirements as a percentage of proxy cache size: DEC, 16 proxies
[Figure: bar chart of storage requirement as a percentage of proxy cache size for the DEC traces. X axis: approach (Exact_dir, server_name, Bloom_8, Bloom_16, Bloom_32); Y axis: % of memory size, 0.00% to 3.00%.]
![Page 44: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/44.jpg)
Summary
• Web caching is an active research area.
• Directory server: this approach uses a central server to keep track of the cache directories of all the proxies; proxies query the server for cache hits in other proxies.
• This approach falls short because the centralized server has to serve every request, so the network overhead is high.
• To overcome this, we have the summary-cache enhanced ICP web cache sharing protocol.
• Our inspection of the Questnet traces showed that child-to-parent ICP queries can be a significant portion of the messages that the parent proxy has to process. In this case, applying the summary cache will significantly reduce the number of queries and the overhead.
![Page 45: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/45.jpg)
Future work
• Plan to investigate the impact of the protocol on parent-child proxy cooperation and the optimal hierarchy configuration for a given workload
• Plan to investigate the application of summary cache to various web cache consistency protocols
• Plan to design new methods for implementing summary cache in a proxy to speed up lookups
![Page 46: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol](https://reader036.fdocuments.us/reader036/viewer/2022062309/56815177550346895dbfb181/html5/thumbnails/46.jpg)
Conclusion
• We proposed summary-cache enhanced ICP, a scalable World Wide Web cache sharing protocol, and showed that it compares favorably with the other techniques considered.
• Our study addresses two key issues: the effect of delayed summary updates, and the representation of the summary.
• For the first, updates can be delayed until 1% to 10% of the cached documents are new (based on trace-driven simulation); this causes errors, but they are tolerable.
• For the second, we introduced the Bloom filter technique for representing summaries.
• We achieve over 50% reduction in bandwidth and reduce the number of inter-proxy communication messages by a factor of 25 to 60.