Investigating Distributed Caching Mechanisms for Hadoop Gurmeet Singh Puneet Chandra Rashid Tahir.
Investigating Distributed Caching Mechanisms for Hadoop
Gurmeet Singh
Puneet Chandra
Rashid Tahir
GOAL
• Explore the feasibility of a distributed caching mechanism inside Hadoop
Presentation Overview
• Motivation
• Design
• Experimental Results
• Future Work
Motivation
• Disk Access Times are a bottleneck in cluster computing
• Large amount of data is read from disk
• DARE
• RAMClouds
• PACMan – Coordinated Cache Replacement
We want to strike a balance between RAM and Disk Storage
Our Approach
• Integrate Memcached with Hadoop
• Used Quickcached and Spymemcached
• Reserve a portion of the main memory at each node to serve as local cache
• Local caches aggregate to abstract a distributed caching mechanism governed by Memcached
• Greedy caching strategy
• Least Recently Used (LRU) cache eviction policy
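The idea of per-node local caches aggregating into one distributed cache can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DistributedCache`, the node names, and the simple modulo hashing are assumptions made for illustration, and real Memcached clients often use consistent hashing instead.

```python
import hashlib


class DistributedCache:
    """Toy stand-in for a Memcached client (hypothetical, for illustration).

    Each node contributes one local cache; a key is hashed to pick the
    node that stores it, so the local caches behave like one large cache.
    """

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One in-memory dict per node models that node's reserved RAM.
        self.local_caches = {name: {} for name in self.node_names}

    def _node_for(self, key):
        # Simple modulo hashing (an assumption, chosen for brevity).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def set(self, key, value):
        self.local_caches[self._node_for(key)][key] = value

    def get(self, key):
        # Returns None on a cache miss.
        return self.local_caches[self._node_for(key)].get(key)
```

A caller only sees `get` and `set`; which node holds a given block is decided by the hash of its key.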
Design Overview
Memcached
Design Choice 1
• Simultaneous requests to Namenode and Memcached
Minimizes access latency at the cost of additional network overhead
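A rough sketch of this choice, assuming the cache and the Namenode can be modelled as plain dictionaries: both lookups are issued at the same time, and the cached copy is preferred when it exists. The function names and the dictionary stand-ins are hypothetical and not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def query_cache(cache, block_id):
    # Stand-in for a Memcached get; returns None on a miss.
    return cache.get(block_id)


def query_namenode(block_locations, block_id):
    # Stand-in for asking the Namenode where the block's replicas live.
    return block_locations[block_id]


def locate_block_parallel(cache, block_locations, block_id):
    """Issue both requests at once; prefer the cached copy if present."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cache_future = pool.submit(query_cache, cache, block_id)
        namenode_future = pool.submit(query_namenode, block_locations, block_id)
        cached = cache_future.result()
        replicas = namenode_future.result()
    if cached is not None:
        return ("cache", cached)
    return ("disk", replicas)
```

The Namenode request is always sent, even on a hit, which is the extra network overhead this design accepts in exchange for not waiting on a miss.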
Design Choice 2
• Send request to Namenode only in the case of a cache miss
Minimizes network overhead at the cost of increased latency
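This sequential variant can be sketched as follows, again with hypothetical names. `CountingNamenode` is an illustrative stand-in that counts requests so the saving in Namenode traffic is visible.

```python
class CountingNamenode:
    """Hypothetical Namenode stand-in that counts how often it is asked."""

    def __init__(self, block_locations):
        self.block_locations = block_locations
        self.requests = 0

    def get_replicas(self, block_id):
        self.requests += 1
        return self.block_locations[block_id]


def locate_block_on_miss(cache, namenode, block_id):
    """Check the cache first; contact the Namenode only on a miss."""
    cached = cache.get(block_id)
    if cached is not None:
        return ("cache", cached)
    # The miss is only known after the cache round trip, so this
    # request starts later than it would in a parallel design.
    return ("disk", namenode.get_replicas(block_id))
```

On a hit the Namenode is never contacted; on a miss the two lookups run back to back, which is where the added latency comes from.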
Design Choice 3
• Datanodes send requests only to Memcached
• Memcached checks for cached blocks
• If a cache miss occurs, Memcached contacts the Namenode and returns the replicas’ addresses to the Datanodes
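One way to picture this flow is a cache front end that handles the Namenode lookup itself on a miss. The sketch below is an assumption about the control flow only. The slides do not say how it was implemented, and `CacheFrontEnd` and its methods are hypothetical names.

```python
class CacheFrontEnd:
    """Hypothetical cache layer that is the only contact point for Datanodes."""

    def __init__(self, namenode_lookup):
        self.blocks = {}
        # namenode_lookup is any callable mapping a block id to replica addresses.
        self.namenode_lookup = namenode_lookup

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def request(self, block_id):
        # Hit: serve the cached block directly.
        if block_id in self.blocks:
            return {"hit": True, "data": self.blocks[block_id]}
        # Miss: the cache layer, not the Datanode, asks the Namenode.
        return {"hit": False, "replicas": self.namenode_lookup(block_id)}
```

The Datanode sends a single request in either case and receives either the block or the addresses to read it from.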
Global Cache Replacement
• LRU based Global Cache Eviction Scheme
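For reference, LRU eviction itself can be sketched in a few lines. This shows the policy on a single in-memory cache only. How the eviction is coordinated globally across nodes is not described in the slides, and the class name `LRUCache` is an illustrative choice.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.entries:
            return None
        # A read counts as a use, so move the entry to the newest position.
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Drop the least recently used entry (front of the ordering).
            self.entries.popitem(last=False)
```

With capacity 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was used more recently.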
Prefetching
Simulation Results
• Test data ranging from 2GB to 24GB
• Word Count and Grep
Word Count
[Figure: Network Overhead vs Cache Size. x-axis: Cache Size (GB); y-axis: % Overhead]
Word Count
[Figure: Hit Ratio vs Cache Size. x-axis: Cache Size (GB); y-axis: Cache Hit Ratio]
Grep
[Figure: Network Overhead vs Cache Size. x-axis: Cache Size (GB); y-axis: % Overhead]
Grep
[Figure: Hit Ratio vs Cache Size. x-axis: Cache Size (GB); y-axis: Hit Ratio]
Future Work
• Implement a pre-fetching mechanism
• Customized caching policies based on access patterns
• Compare and contrast caching with locality-aware scheduling
Conclusion
• Caching can improve the performance of cluster-based systems, depending on the access patterns of the workload being executed