MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
![Page 1: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/1.jpg)
MemC3: Compact and Concurrent MemCache with Dumber
Caching and Smarter Hashing
Bin Fan, David G. Andersen, Michael Kaminsky
Presenter: Son Nguyen
![Page 2: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/2.jpg)
Memcached internals
• LRU caching using a chaining hash table and a doubly linked list
![Page 3: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/3.jpg)
Goals
• Reduce space overhead (bytes/key)
• Improve throughput (queries/sec)
• Target read-intensive workloads with small objects
• Result: 3X throughput, 30% more objects
![Page 4: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/4.jpg)
Doubly-linked-list’s problems
• At least two pointers per item -> expensive
• Both reads and writes change the list’s structure -> need locking between threads (no concurrency)
![Page 5: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/5.jpg)
Solution: CLOCK-based LRU
• Approximate LRU
• Multiple readers/single writer
• Circular queue instead of linked list -> less space overhead
![Page 6: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/6.jpg)
CLOCK example
Originally:
entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve)
recency 1 0 1 1 0
Read(kd):
entry (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve)
recency 1 0 1 0 0
Write(kf, vf):
entry (ka, va) (kb, vb) (kf, vf) (kd, vd) (ke, ve)
recency 1 1 0 0 0
Write(kg, vg):
entry (kg, vg) (kb, vb) (kf, vf) (kd, vd) (ke, ve)
recency 0 1 0 1 1
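The CLOCK idea can be sketched in a few lines of Python. This is a minimal illustration of the textbook CLOCK policy, not MemC3's actual code: the class name `ClockCache`, the `index` dictionary, and the choice to give newly written entries a recency bit of 0 are all assumptions made for the sketch, and it does not try to reproduce the exact bit values in the slide's example.

```python
class ClockCache:
    """Fixed-size cache with CLOCK (approximate LRU) eviction. Illustrative only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity   # circular buffer of (key, value) or None
        self.recency = [0] * capacity    # one recency bit per slot
        self.index = {}                  # key -> slot position (hypothetical helper)
        self.hand = 0                    # clock hand: next eviction candidate

    def read(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None                  # cache miss
        self.recency[pos] = 1            # a read only sets a bit; nothing is relinked
        return self.slots[pos][1]

    def write(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            self.slots[pos] = (key, value)
            self.recency[pos] = 1        # assumption: an update counts as a use
            return
        pos = self._find_victim()
        old = self.slots[pos]
        if old is not None:
            del self.index[old[0]]       # evict the previous occupant
        self.slots[pos] = (key, value)
        self.recency[pos] = 0            # assumption: new entries start at 0
        self.index[key] = pos

    def _find_victim(self):
        # Sweep the hand; entries with bit 1 get a second chance (bit reset to 0).
        while True:
            pos = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.slots[pos] is None or self.recency[pos] == 0:
                return pos
            self.recency[pos] = 0


cache = ClockCache(3)
for k in ("ka", "kb", "kc"):
    cache.write(k, "v" + k[1])
cache.read("ka")            # sets ka's recency bit
cache.write("kd", "vd")     # hand skips ka (bit 1), evicts kb (bit 0)
```

The per-entry cost is one bit instead of two list pointers, and a read touches only that bit, which is what makes the multiple-reader/single-writer arrangement workable.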
![Page 7: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/7.jpg)
Chaining Hashtable’s problems
• Uses linked lists -> costly space overhead for pointers
• Pointer dereferencing is slow (no advantage from CPU cache)
• Reads are not constant time (chains can be long)
![Page 8: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/8.jpg)
Solution: Cuckoo Hashing
• Use 2 hashtables
• Each bucket has exactly 4 slots (fits in CPU cache)
• Each (key, value) object can therefore reside in one of 8 possible slots
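To make the "8 possible slots" concrete, here is a small Python sketch that follows the slide's description (two tables, 4-slot buckets). The helper names `bucket_index`, `candidate_slots`, and `lookup` are hypothetical, and salted BLAKE2b merely stands in for HASH1 and HASH2, which the slides do not specify.

```python
import hashlib

SLOTS_PER_BUCKET = 4


def bucket_index(key, salt, num_buckets):
    # Stand-in hash function (assumption): salted BLAKE2b reduced modulo table size.
    digest = hashlib.blake2b(key.encode(), salt=salt, digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets


def make_table(num_buckets):
    # A table is a list of buckets; each bucket has 4 slots holding (key, value) or None.
    return [[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]


def candidate_slots(key, num_buckets):
    """Return the 8 (table, bucket, slot) positions where key may reside."""
    positions = []
    for table_id, salt in enumerate((b"hash1", b"hash2")):
        b = bucket_index(key, salt, num_buckets)
        positions.extend((table_id, b, s) for s in range(SLOTS_PER_BUCKET))
    return positions


def lookup(tables, key):
    # A read checks only the 8 candidate slots, so its cost is bounded.
    for t, b, s in candidate_slots(key, len(tables[0])):
        entry = tables[t][b][s]
        if entry is not None and entry[0] == key:
            return entry[1]
    return None


tables = [make_table(16), make_table(16)]
t, b, s = candidate_slots("ka", 16)[5]   # any of the 8 positions is valid
tables[t][b][s] = ("ka", "va")
```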
![Page 9: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/9.jpg)
Cuckoo Hashing
[Figure: (ka, va) maps to two candidate buckets, HASH1(ka) and HASH2(ka)]
![Page 10: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/10.jpg)
Cuckoo Hashing
• Read: always 8 lookups (constant, fast)
• Write: write(ka, va)
– Find an empty slot among the 8 possible slots of ka
– If all are full, randomly kick some (kb, vb) out
– Now find an empty slot for (kb, vb)
– Repeat 500 times or until an empty slot is found
– If still not found, do table expansion
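The read and write steps above can be sketched as follows. This is an illustrative single-threaded Python version under stated assumptions: the class `CuckooTable`, the salted BLAKE2b stand-ins for the two hash functions, and the convention of returning the still-homeless item when 500 kicks are exhausted (so the caller can expand the table and retry) are choices made for the sketch, not details taken from MemC3.

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4
MAX_KICKS = 500  # retry limit quoted on the slide


class CuckooTable:
    def __init__(self, num_buckets, seed=0):
        self.num_buckets = num_buckets
        # Two tables of buckets; each bucket has 4 slots of (key, value) or None.
        self.tables = [[[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]
                       for _ in range(2)]
        self.rng = random.Random(seed)

    def _bucket(self, table_id, key):
        # Stand-in hash functions (assumption): salted BLAKE2b.
        salt = b"hash1" if table_id == 0 else b"hash2"
        digest = hashlib.blake2b(key.encode(), salt=salt, digest_size=8).digest()
        return self.tables[table_id][int.from_bytes(digest, "little") % self.num_buckets]

    def read(self, key):
        for table_id in (0, 1):
            for entry in self._bucket(table_id, key):
                if entry is not None and entry[0] == key:
                    return entry[1]
        return None

    def write(self, key, value):
        """Return None on success, or the displaced item still needing a slot."""
        item = (key, value)
        for table_id in (0, 1):                      # overwrite an existing key
            bucket = self._bucket(table_id, key)
            for s, entry in enumerate(bucket):
                if entry is not None and entry[0] == key:
                    bucket[s] = item
                    return None
        for _ in range(MAX_KICKS):
            buckets = [self._bucket(t, item[0]) for t in (0, 1)]
            for bucket in buckets:                   # try the 8 candidate slots
                for s in range(SLOTS_PER_BUCKET):
                    if bucket[s] is None:
                        bucket[s] = item
                        return None
            # All 8 are full: kick out a random occupant and re-home it next round.
            bucket = self.rng.choice(buckets)
            s = self.rng.randrange(SLOTS_PER_BUCKET)
            item, bucket[s] = bucket[s], item
        return item                                  # caller should expand the table


table = CuckooTable(num_buckets=64)
leftovers = [table.write(f"k{i}", f"v{i}") for i in range(100)]
```

Note that in this naive version the kicked-out item is briefly absent from the table, which is exactly the window in which a concurrent reader would see a false miss.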
![Page 11: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/11.jpg)
Cuckoo Hashing
Insert a:
[Figure: hash table grid of occupied slots (X); (ka, va) maps to buckets HASH1(ka) and HASH2(ka). Both buckets are full, so a takes the slot held by b and b is kicked out.]
![Page 12: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/12.jpg)
Cuckoo Hashing
Insert b:
[Figure: hash table grid with a now in place; (kb, vb) maps to buckets HASH1(kb) and HASH2(kb). Both buckets are full, so b takes the slot held by c and c is kicked out.]
![Page 13: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/13.jpg)
Cuckoo Hashing
Insert c:
[Figure: hash table grid with a and b in place; (kc, vc) maps to buckets HASH1(kc) and HASH2(kc), and c goes into an empty slot.]
Done !!!
![Page 14: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/14.jpg)
Cuckoo Hashing
• Problem: after (kb, vb) is kicked out, a reader might attempt to read (kb, vb) and get a false cache miss
• Solution: Compute the kick out path (Cuckoo path) first, then move items backward
• Before: (b,c,Null)->(a,c,Null)->(a,b,Null)->(a,b,c)
• Fixed: (b,c,Null)->(b,c,c)->(b,b,c)->(a,b,c)
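The difference between the two orderings can be replayed with a short Python sketch. The three-slot list and the function names `insert_forward` and `insert_backward` are hypothetical simplifications; a real table would move (key, value) entries between buckets, but the order of writes is the point here.

```python
def insert_forward(slots, path, new_key):
    """Naive order: place the new key first and carry each victim forward."""
    carried = new_key
    for pos in path:
        carried, slots[pos] = slots[pos], carried
        yield tuple(slots)


def insert_backward(slots, path, new_key):
    """Cuckoo-path order: the path is known up front, so move items from the
    empty end backward. Every existing key stays readable at every step."""
    assert slots[path[-1]] is None            # the path must end at an empty slot
    for i in range(len(path) - 1, 0, -1):
        slots[path[i]] = slots[path[i - 1]]   # copy into the next slot first
        yield tuple(slots)
    slots[path[0]] = new_key                  # the new key is written last
    yield tuple(slots)


path = [0, 1, 2]   # a -> slot 0, b -> slot 1, c -> slot 2 (empty)
before = list(insert_forward(["b", "c", None], path, "a"))
fixed = list(insert_backward(["b", "c", None], path, "a"))
# before: ('a','c',None), ('a','b',None), ('a','b','c')  -> b, then c, briefly missing
# fixed:  ('b','c','c'),  ('b','b','c'),  ('a','b','c')  -> b and c always present
```

In the backward order an item may be visible in two slots for a moment, which is harmless to a reader, whereas the forward order leaves a window in which a key is in no slot at all.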
![Page 15: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/15.jpg)
Cuckoo path
Insert a:
[Figure: hash table grid with b and c in occupied slots; (ka, va) maps to buckets HASH1(ka) and HASH2(ka).]
![Page 16: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/16.jpg)
Cuckoo path backward insert
Insert a:
[Figure: hash table grid; (ka, va) maps to buckets HASH1(ka) and HASH2(ka). Items move backward along the path: c moves first, then b, and a is written last.]
![Page 17: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/17.jpg)
Cuckoo’s advantages
• Concurrency: multiple readers/single writer
• Read optimized (entries fit in CPU cache)
• Still O(1) amortized time for write
• 30% less space overhead
• 95% table occupancy
![Page 18: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/18.jpg)
Evaluation
68% throughput improvement in the all-hit case, 235% in the all-miss case
![Page 19: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/19.jpg)
Evaluation
3x throughput on “real” workload
![Page 20: MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062410/5681620b550346895dd235b0/html5/thumbnails/20.jpg)
Discussion
• Write is slower than chaining Hashtable
– Chaining Hashtable: 14.38 million keys/sec
– Cuckoo: 7 million keys/sec
• Idea: finding cuckoo path in parallel
– Benchmark doesn’t show much improvement
• Can we make it write-concurrent?