Hybrid Prefetching for WWW Proxy Servers
Yui-Wen Horng, Wen-Jou Lin, Hsing Mei
Department of Computer Science and Information Engineering
Fu Jen Catholic University, Taiwan, R.O.C.
International Conference on Parallel and Distributed Systems, 1998
Mikt Tien
Syslab Yan Zen
Outline
1.Introduction
2.Related Work
3.Prefetching Mechanism
4.Experiment Results
5.Conclusion and Future Work
1.Introduction
Depending on the location of the cache, we can classify caches into three types: client cache, server cache, and proxy cache.
Some studies show that the maximum possible hit rate of a proxy cache is about 30%-50%. Prefetching is a clear way to overcome this limit.
Likewise, we classify prefetchers into three types: client prefetcher, server prefetcher, and proxy prefetcher.
A client prefetcher can analyze one user's requests to predict future requests, while a proxy prefetcher can gather information from the requests that many clients send to many servers.
2.Related Work
Interactive Prefetching Proxy Server (Wcol) (Content Parsing)
-- Gets linked documents (including images) by parsing HTML pages.
-- advantage: The hit rate of the cache is more than 60%.
-- disadvantage: The traffic is 4.12 times larger than that of a normal caching proxy, and the task of parsing HTML also adds overhead to the server.
Related Work(cont.)
Top-10 Approach
-- Requires cooperation between web server, proxy, and client browser. The higher-level servers know which documents are popular among their lower-level clients.
-- advantage: The hit rate is more than 40%, and the traffic increase is no more than 10% in most cases.
-- disadvantage: To achieve good prediction, every proxy and server needs to follow the same policy. That is the major problem in implementation.
Related Work(cont.)
Predictive Prefetching
-- The prefetcher is installed in the client but
communicates with a prediction engine which is
part of the web server. This engine tracks client
request sequences and builds a dependency
graph which contains probability information, so the
prefetcher can prefetch files with high probability.
-- disadvantage: Requires a specially designed
protocol or a modification to HTTP.
Related Work(cont.)
Prefetching Files System for WWW Servers
-- It uses the “Referer” information contained in HTTP request messages to build an access probability graph. “Referer” is a header in the HTTP request message that indicates which URL the requested URL was linked from.
-- advantage: The response time can be reduced by more than 20%.
-- disadvantage: Not all requests contain this information, and it takes time to accumulate enough data to build the graph.
Related Work(cont.)
Our approach
-- A hybrid prefetcher that both parses HTML and builds
an access probability graph. To make prefetching more
intelligent, both access popularity and access probability
are considered.
3.Prefetching Mechanism
3.1 Problem 1: How to find more documents that may be requested in the near future?
Prefetch by Parsing HTML
-- It does not need information from past request
history and can find related URLs even if the requested
URL was never retrieved before.
-- But it increases the overhead of the server and increases
the traffic.
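The parsing step can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not the authors' implementation; the names `LinkExtractor` and `prefetch_candidates` are hypothetical, and the choice to follow only `<a href>` and `<img src>` is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect URLs of linked pages (<a href>) and inline images (<img src>)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            target = attrs["href"]
        elif tag == "img" and attrs.get("src"):
            target = attrs["src"]
        else:
            return
        # Resolve relative references against the URL of the page being parsed.
        self.links.append(urljoin(self.base_url, target))


def prefetch_candidates(base_url, html_text):
    """Return the absolute URLs referenced by one HTML page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links
```

Every returned URL is only a candidate. Fetching all of them is what drives up traffic, which is why the later slides add constraints.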
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.)
Prefetch by Referer
-- Building a “Referer link graph”.
-- The accumulated weight value of each node and edge can
also be used to calculate access probability, which is useful
for prefetching.
-- disadvantage: Maintaining the graph increases memory overhead, and not
all requests contain referer information.
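One way to picture the referer link graph is the sketch below. The slides do not give the exact formula, so the estimate used here (edge weight divided by the weight of the referring node) is an assumption, and `RefererGraph` with its methods is a hypothetical name.

```python
from collections import defaultdict


class RefererGraph:
    """Weighted link graph built from (requested URL, Referer) pairs."""

    def __init__(self):
        self.node_weight = defaultdict(int)  # URL -> number of requests seen
        self.edge_weight = defaultdict(int)  # (referer, URL) -> number of follows

    def record(self, url, referer=None):
        self.node_weight[url] += 1
        if referer is not None:
            self.edge_weight[(referer, url)] += 1

    def probability(self, referer, url):
        """Assumed estimate: share of visits to `referer` that led to `url`."""
        visits = self.node_weight.get(referer, 0)
        if visits == 0:
            return 0.0
        # Cap at 1.0: the proxy may miss visits served from a client cache.
        return min(1.0, self.edge_weight.get((referer, url), 0) / visits)
```

Both dictionaries grow with the number of distinct URLs and links observed, which is the memory overhead noted above.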
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.)
Hybrid Prefetch
-- If the referer exists, use it to build the “referer link
graph”; otherwise, parse the HTML file to build the link
graph.
-- Fewer HTML files require parsing than in the first
approach, so the CPU overhead is smaller.
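The hybrid rule might look like the following sketch. It reflects one reading of the slide (the referer is used when the request carries one, and the response body is parsed only otherwise); `HybridLinkGraph`, `observe`, and the `parsed_pages` counter are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs treated as links (an assumption for this sketch).
LINK_ATTRS = {("a", "href"), ("img", "src")}


class _Links(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in LINK_ATTRS:
                self.found.append(value)


class HybridLinkGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # page URL -> URLs linked from it
        self.parsed_pages = 0          # how often the parsing fallback ran

    def observe(self, url, referer=None, body=None):
        if referer:
            # Cheap path: the request itself tells us which page links here.
            self.edges[referer].add(url)
        elif body is not None:
            # Fallback: no Referer header, so parse the HTML for its links.
            parser = _Links()
            parser.feed(body)
            self.parsed_pages += 1
            for link in parser.found:
                self.edges[url].add(urljoin(url, link))
```

Comparing `parsed_pages` with the total number of observed requests gives a rough idea of how much parsing work the referer path saves.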
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.)
Prefetch by Directory
-- Assumption: related documents are usually put in
the same directory on the web server.
-- If the directory structure of the web site does not
agree with our assumption, the ratio of successful
prefetching may be low.
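As a rough illustration of the directory assumption, the sketch below selects already known URLs that share the host and directory of the requested URL. The function names are hypothetical, and how the prefetcher learns the set of known URLs is not stated in the slides.

```python
import posixpath
from urllib.parse import urlsplit


def directory_of(url):
    """Return (host, directory path) for a URL."""
    parts = urlsplit(url)
    return parts.netloc, posixpath.dirname(parts.path)


def same_directory_candidates(requested_url, known_urls):
    """Known URLs in the same server directory as the requested one."""
    target = directory_of(requested_url)
    return [
        u for u in known_urls
        if u != requested_url and directory_of(u) == target
    ]
```

Documents in subdirectories are deliberately excluded here; a looser match would find more candidates at the cost of more wasted prefetches.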
3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested?
Popularity Constraint
-- Build a table to track the popularity of each
requested document. The table is updated when
a new request arrives.
Probability Constraint --
3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested? (cont.)
Combined Constraint
-- Combine both constraints by “OR”ing them.
That is, prefetch a document if it passes either
constraint.
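The combined constraint can be sketched as below. The slides do not say how popularity is measured or which threshold values are used, so the request counter and both thresholds are assumptions, and `PrefetchFilter` is a hypothetical name. The access probability is passed in by the caller, for example from a referer link graph.

```python
from collections import Counter


class PrefetchFilter:
    def __init__(self, popularity_threshold, probability_threshold):
        self.popularity = Counter()  # URL -> requests seen so far (assumed metric)
        self.popularity_threshold = popularity_threshold
        self.probability_threshold = probability_threshold

    def record_request(self, url):
        """Update the popularity table when a new request arrives."""
        self.popularity[url] += 1

    def passes_popularity(self, url):
        return self.popularity[url] >= self.popularity_threshold

    def passes_probability(self, probability):
        return probability >= self.probability_threshold

    def should_prefetch(self, url, probability):
        # "OR" of the two constraints: passing either one is enough.
        return self.passes_popularity(url) or self.passes_probability(probability)
```

With thresholds of 3 requests and 0.5, a document requested three times is prefetched even at low probability, and a rarely requested document is still prefetched when its probability reaches 0.5.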
4.Experiment Results
Experiment A
Experiment B - Popularity Constraint (threshold)
prefetch level = 2, cache size = 10MB
Experiment B - Probability Constraint
5.Conclusion and Future Work
The hybrid prefetching technique is effective in improving the hit rate of a caching proxy, and its prediction accuracy is higher than that of other methods.
It can achieve a cache hit rate of more than 70% while the increase in traffic is below 40%.
Our experiments also show that separate caches are better than one common cache if the total size is small.