Storage Allocation in Prefetching Techniques of Web Caches
D. Zeng, F. Wang, S. Ram
Appeared in proceedings of ACM conference in Electronic commerce (EC’03) San Diego June 9-12, 2003
Presented by Laura D. Goadrich
The Web
Large-scale distributed information system where data objects are published and accessible by users
Problems caused by the demand for increased Web capacity:
Network traffic congestion
Web server overloads
Solution: Web caching

Web Caching Benefits
Improves Web performance (reduces access latency)
Increases Web capacity
Alleviates traffic congestion (reduces network bandwidth consumption)
Reduces the number of client requests (server workload)
Can improve the failure tolerance and robustness of the Web (cached copies of Web objects remain available when networks are unreachable)
Prefetching: anticipate users’ future needs
This research:
Focuses on making cache-related storage capacity decisions (storage capacity limits the number of prefetched Web objects)
Therefore: how to allocate cache storage in prefetching
The authors state this focus has not been previously researched
Ideas
Current research: predicts user Web accesses without considering the cache storage limit
This research: optimization-based models
Maximize hit rate
Maximize byte hit rate
Minimize access latency
(the first two are the primary goals of Web caching)
Benefit of this research: guides the operation of a prefetching system
Web Prefetching Techniques
Client-initiated policies
User A is likely to access URL U2 right after URL U1
Patterns learned via Markov algorithms
Server-initiated policies
Anticipate future requests based on server logs and proactively send the corresponding Web objects to participating cache servers or client browsers
Top-n algorithm
Hybrid policies
Combine user access patterns from clients with general statistics from servers to improve the quality of prediction
Shortcoming of these policies: they do not decide which Web objects to prefetch under a storage capacity constraint
Assumptions/Notation
C: maximum amount of storage space available to store prefetched Web objects
i: URL of potential interest
P_i: predicted probability with which URL i will be visited
(i, P_i): prediction of users’ future accesses
N: set of all URLs of potential interest
S_i: size of the Web object referred to by i (S_i < C)
Hit Rate (HR) Model

max  Z_HR = Σ_{i∈N} P_i X_i        (1)
s.t.  Σ_{i∈N} S_i X_i ≤ C          (2)
      X_i ∈ {0, 1},  i ∈ N         (3)
Byte Hit Rate (BHR) Model

max  Z_BHR = (Σ_{i∈N} P_i S_i X_i) / (Σ_{i∈N} P_i S_i)        (4)
subject to constraints (2) and (3)
Access Latency (AL) Model

max  Z_AL = (Σ_{i∈N} P_i (α_i + β_i S_i) X_i) / (Σ_{i∈N} P_i (α_i + β_i S_i))        (7)
subject to constraints (2) and (3)

(Z_AL is the fraction of expected access latency saved by prefetching.)
αi # of seconds to establish the network connection between the client machine and the Web server hosting i
βi # of seconds per byte to transmit i over the network
Transforming HR, BHR & AL into the Knapsack Problem
Benefits of the knapsack formulation:
Well studied; the “easiest” NP-hard problem
Can be solved optimally by a pseudo-polynomial algorithm based on dynamic programming
A fully polynomial-time approximation scheme is possible
Focus on the greedy algorithm (due to paper length limits)
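As a concrete illustration of the pseudo-polynomial dynamic program mentioned above, here is a minimal Python sketch that solves the HR model exactly; the function name and the (P_i, S_i) tuple format are assumptions for illustration, not from the paper.

```python
def knapsack_hr(items, C):
    """Exact DP for the HR model: maximize the sum of P_i over chosen URLs
    subject to the sum of S_i <= C (0/1 knapsack).
    items: list of (P_i, S_i) pairs with integer sizes; C: integer capacity.
    Returns (best_total_probability, list_of_chosen_item_indices)."""
    n = len(items)
    # dp[i][c] = best total probability using the first i items with capacity c
    dp = [[0.0] * (C + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        p, s = items[i - 1]
        for c in range(C + 1):
            dp[i][c] = dp[i - 1][c]                  # skip item i-1
            if s <= c and dp[i - 1][c - s] + p > dp[i][c]:
                dp[i][c] = dp[i - 1][c - s] + p      # take item i-1
    # Backtrack to recover which URLs were selected
    chosen, c = [], C
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= items[i - 1][1]
    return dp[n][C], chosen[::-1]
```

For |N| URLs and capacity C this runs in O(|N|·C) time, which is practical only for moderate C; hence the interest in fast heuristics such as the greedy algorithm.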
Greedy Algorithm
1. Sort all URLs into a sequence i_1, i_2, ..., i_|N| such that
   P_{i_1}/S_{i_1} ≥ P_{i_2}/S_{i_2} ≥ ... ≥ P_{i_|N|}/S_{i_|N|}
2. Determine a threshold k defined as:
   k = max{ j ∈ {1, 2, ..., |N|} : Σ_{l=1..j} S_{i_l} ≤ C }
3. Prefetch the Web objects referred to by URLs i_1, i_2, ..., i_k, i.e.
   X_i = 1 if i ∈ {i_1, i_2, ..., i_k}, and X_i = 0 otherwise
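The greedy steps above can be sketched in Python as follows; the (url, P_i, S_i) tuple format and the function name are illustrative assumptions.

```python
def greedy_hr(items, C):
    """Greedy allocation: sort URLs by P_i / S_i (step 1), then prefetch
    down the sorted list until the next object no longer fits (steps 2-3).
    items: list of (url, P_i, S_i) tuples; C: storage capacity.
    Returns the list of URLs to prefetch."""
    order = sorted(items, key=lambda t: t[1] / t[2], reverse=True)
    chosen, used = [], 0
    for url, p, s in order:
        if used + s > C:
            break          # threshold k reached: cumulative size would exceed C
        chosen.append(url)
        used += s
    return chosen
```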
Other Allocation Policies Tested
Optimal policy using CPLEX
Disadvantages: complex, increased implementation time, difficult to implement
Top-n
Developed for Web usage prediction
Used to regulate storage allocation by appropriately setting n
Equivalent to greedy BHR relying only on P_i
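For comparison, the Top-n policy described above can be sketched as follows (a hypothetical sketch; the tuple format and function name are assumed):

```python
def top_n(items, n):
    """Top-n policy: prefetch the n URLs with the highest predicted
    probability P_i, ignoring object sizes S_i.
    items: list of (url, P_i, S_i) tuples."""
    ranked = sorted(items, key=lambda t: t[1], reverse=True)
    return [url for url, p, s in ranked[:n]]
```

Because it ignores S_i, Top-n regulates storage usage only indirectly through the choice of n, which is why it can trail the greedy policies when object sizes vary widely.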
Simulations

Setting   Small          Large
|N|       50             200
rep       text           multimedia
C         100,000        100,000
α/β       5,000 (slow)   30,000 (fast)

LN(μ,σ) = lognormal distribution with mean e^μ and shape σ
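The simulated object sizes drawn from LN(μ, σ) in the setup above could be generated as follows; this is a sketch assuming Python's standard lognormal parameterization, and the function name is invented.

```python
import random

def sample_sizes(n, mu, sigma, seed=0):
    """Draw n Web-object sizes from a lognormal distribution LN(mu, sigma),
    as in the simulation setup (e.g. n=50, mu=10, sigma=0.05 or 1).
    random.lognormvariate takes the mean and shape of the underlying normal."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```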
Performance Comparison

Experimental Condition             Hit Rate           Byte Hit Rate      % Savings in Access Latency
                                   Opt  G-HR  Top-n   Opt  G-HR  Top-n   Opt  G-HR  Top-n
a=50,  LN(10, .05), b=5000         .47  .45   .44     .45  .44   .44     .45  .44   .44
a=50,  LN(10, .05), b=30000        .47  .45   .44     .45  .44   .44     .46  .44   .44
a=50,  LN(10, 1),   b=5000         .47  .44   .35     .32  .28   .28     .33  .29   .29
a=50,  LN(10, 1),   b=30000        .47  .44   .35     .32  .28   .28     .38  .34   .32
a=200, LN(10, .05), b=5000         .36  .34   .33     .34  .33   .33     .34  .33   .33
a=200, LN(10, .05), b=30000        .36  .34   .33     .34  .33   .33     .35  .34   .33
a=200, LN(10, 1),   b=5000         .36  .34   .25     .20  .17   .17     .22  .18   .18
a=200, LN(10, 1),   b=30000        .36  .34   .25     .20  .17   .17     .27  .23   .21
Results
Greedy algorithms and Top-n generally achieve reasonable performance
Greedy algorithms outperform Top-n with respect to hit rate and access latency
A relatively large performance gap exists between the optimal approach and the fast heuristic methods when Web objects vary greatly in size
This suggests the need for more sophisticated allocation policies, such as a dynamic programming-based approach
Contributions
Focus: stress the importance of effective storage allocation in prefetching

Paper contributions:
1. Provide new formulations for prefetching storage allocation
2. Create computationally efficient allocation policies based on storage allocations solved via the knapsack problem
3. The models created lead to a more precise understanding of the applicability and effectiveness of the Top-n policy
Future Work
Trace-based simulation: actual Web access logs for a more realistic environment
Modeling: integrate the allocation models with caching storage management models, e.g. cache replacement
Changes / Recommendations
Avoid renaming the same constraints across models
Cite more resources (5 articles, 2 books)
Discuss feasible solve times for the optimal (CPLEX) approach
Test or hypothesize implementation strategies for a real application