Geiger: Monitoring the Buffer Cache in a Virtual Machine Environment
Stephen T. Jones, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
Department of Computer Sciences
Buffer Cache
• In modern OSes, the file system buffer cache and the virtual memory system are unified
  – When a file is first accessed, its data is buffered in a memory page
  – Under memory pressure, a page is evicted
    • If the page is dirty, it is first written to swap space or the file system
    • Then the page can be reused
    • Later, if the data is needed again, a page fault occurs
      – A free page is allocated and the data is reloaded from disk into it
Useful Information About Buffer Cache
• If the VMM knows about eviction/promotion events, it can:
  – Tell whether the guest OS is thrashing and how much more memory it needs to prevent it
  – Guide eviction-based cache placement
    • Exclusive cache: on a hit, the data item is removed from the cache
    • A transparent secondary cache may be desirable
      – E.g. a 32-bit OS running on a host with 16 GB of memory
• Why does an exclusive cache work?
  – Normally, once a page is read from disk, the OS will not read it again without evicting it first
  – Exclusivity increases cache utilization
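The exclusive-cache idea above can be illustrated with a toy model. This is a minimal sketch, not the talk's implementation: the class name `ExclusiveCache`, its methods, and the oldest-first replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class ExclusiveCache:
    """Toy secondary cache filled by client evictions (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> data, oldest first

    def on_client_eviction(self, block, data):
        # Placement happens when the client evicts, not when it reads.
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the oldest block

    def read(self, block):
        # On a hit the block moves up to the client, so it is removed here.
        return self.blocks.pop(block, None)


cache = ExclusiveCache(capacity=2)
cache.on_client_eviction("A", b"a-data")
first = cache.read("A")   # hit: returned and removed
second = cache.read("A")  # miss: the client now holds the only copy
```

Because a block lives in either the client cache or the secondary cache but not both, the two caches together hold more distinct blocks than an inclusive arrangement would.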
Services in a VMM
• The VMM layer is an attractive development target
  – Security (isolation from OS and apps)
  – Portability (transparent to OS)
• Our target services
  – VMM-driven eviction-based cache placement
    • Increase hit ratio for remote storage caches
    • Transparent to guest OS
  – Working set size estimation for thrashing VMs
    • Complements the ESX Server technique
VMM Services Need Information
• Information about guest operating systems
• For our target services
  – Information about the OS buffer cache
• Hidden from the VMM
  – Layered design approach
  – Narrow interface (virtual architecture)
Geiger Monitors Buffer Cache
• Virtual machine monitor extension
• Implicitly observes buffer cache events
  – Uses only information intrinsically available to the VMM
  – An explicit approach is possible, but has drawbacks
• No guest OS modifications required
• Applicable to closed and legacy OSes
• Accurate (usually less than 5% error)
• Low cost (usually less than 3% overhead)
• Enables service implementation in the VMM
Buffer Cache Events
• Cache promotion
  – Disk block inserted into buffer cache
• Cache demotion
  – Disk block removed from cache
Detecting Promotion
• Block read
• Block write
• Disk reads and writes are visible to the VMM
• Associated Disk Location (ADL)
[Figure: user process, buffer cache, and disk holding blocks A, B, C; cache pages are tagged with their ADL]
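A minimal sketch of promotion tracking, assuming the VMM sees each disk request as a (guest page, disk block) pair. The class `PromotionMonitor` and its method names are hypothetical, and counting a promotion only when a page's ADL changes is a simplification chosen here, not something the slides specify.

```python
class PromotionMonitor:
    """Tracks the associated disk location (ADL) of each guest page."""

    def __init__(self):
        self.adl = {}          # guest page number -> disk block number
        self.promotions = []   # disk blocks inferred to have entered the cache

    def on_disk_io(self, page, disk_block):
        # A read fills the page from disk_block; a write flushes the page to
        # it. Either way the page now caches that block.
        if self.adl.get(page) != disk_block:
            self.promotions.append(disk_block)
        self.adl[page] = disk_block


monitor = PromotionMonitor()
monitor.on_disk_io(page=7, disk_block=100)   # read block 100 into page 7
monitor.on_disk_io(page=7, disk_block=100)   # write-back of the same block
monitor.on_disk_io(page=8, disk_block=200)   # a second block enters the cache
```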
Detecting Demotion
• Detect when a page is removed from the cache
• The VMM cannot observe a page free directly
• Instead, look for page reuse
• If a cache page is reused for other data, the page was logically freed in the interim
• Reuse inconsistent with ADL -> eviction
[Figure: buffer cache and disk with blocks A, B, C and their ADLs]
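The reuse check can be sketched as a single function over the page-to-ADL map. This is an illustrative sketch only; `observe_io` and the dictionary representation are assumptions, not Geiger's actual data structures.

```python
def observe_io(adl, page, disk_block):
    """Update the page -> ADL map for one disk read or write.

    Returns the disk block inferred to have been evicted, or None.
    """
    old = adl.get(page)
    adl[page] = disk_block
    if old is not None and old != disk_block:
        # The page is reused for a different disk location, so the block it
        # used to cache must have been evicted in the interim.
        return old
    return None


adl = {}
r1 = observe_io(adl, page=7, disk_block=100)  # first use: no eviction
r2 = observe_io(adl, page=7, disk_block=100)  # same location: no eviction
r3 = observe_io(adl, page=7, disk_block=200)  # reuse: block 100 was evicted
```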
Read / Write Evictions
• Read eviction
  – A non-free page is reused for reading from a different disk location
  – E.g. reading a large file or memory region
• Write eviction
  – A non-free page is reused for writing; the reuse (eviction) is detected only when the page is written back
  – Detection therefore lags the actual eviction
Existing Techniques
• Promotion via reads and writes
• Demotion via reads and writes
• Chen et al. -- USENIX 2003
  – Within OS (pseudo device driver)
• Initial basis for Geiger
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
New Geiger Techniques
• Other ways buffer cache pages are evicted
• Unified buffer cache/virtual memory system
• Non-I/O allocations cause eviction
• Two new eviction detection heuristics
  – Copy-on-write
  – Anonymous allocation
When Does Eviction Happen?
• Explicit eviction
  – Read eviction
  – Write eviction
• Implicit eviction
  – A non-free page is reused without any disk read or write
    • Page allocation or copy-on-write
  – E.g. when a process requests a new page, a non-dirty page is allocated to it
Detecting Allocation Eviction
• Page not-present fault
• Page allocation (possible reuse)
• New writable mapping
• Detect eviction
• Invalidate ADL
[Figure: user process, buffer cache, and disk, with blocks A, B, C and a newly allocated page z]
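The steps above reduce to one check at the moment a page gains a new writable mapping. The following is a sketch under the assumption that ADLs are kept in a dictionary keyed by guest page; the function name is hypothetical.

```python
def on_new_writable_mapping(adl, page):
    """Called when a page gains a new writable mapping after a not-present fault.

    adl maps guest page -> disk block. Returns the block inferred to have
    been evicted, or None if the page had no ADL (it cached no disk data).
    """
    # pop() both reports the old block and invalidates the ADL.
    return adl.pop(page, None)


adl = {7: 100}
evicted = on_new_writable_mapping(adl, 7)  # page 7 cached block 100
again = on_new_writable_mapping(adl, 7)    # ADL already invalidated
```

Invalidating the ADL matters: without it, a later disk read into the same page would be counted as a second eviction of the same block.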
Filesystem Issues
• Filesystem features cause false positives
• Filesystem blocks can be deleted
  – Leads to dangling ADLs and spurious evictions
• Journaling causes aliasing
  – The same cache page is written to both the journal and filesystem locations
  – Interferes with the write-eviction heuristic
Geiger Is Filesystem Aware
• Uses static filesystem info
  – Journal location and size
  – Block allocation bitmaps
• Ignore writes to the journal
• Track allocation bitmap updates and invalidate ADLs when blocks are deallocated
• Significantly reduces Geiger false positives
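Both filesystem-aware rules can be sketched together. This is an illustration under stated assumptions: the journal extent constants are made up, allocation bitmaps are modeled as Python sets of allocated block numbers, and the function names are hypothetical.

```python
JOURNAL_START = 1000   # hypothetical first block of the journal
JOURNAL_BLOCKS = 256   # hypothetical journal size in blocks


def in_journal(block):
    return JOURNAL_START <= block < JOURNAL_START + JOURNAL_BLOCKS


def on_write(adl, page, block):
    """Apply the write-eviction heuristic, skipping journal writes."""
    if in_journal(block):
        return None  # the journal copy aliases the real location; ignore it
    old = adl.get(page)
    adl[page] = block
    return old if old not in (None, block) else None


def on_bitmap_update(adl, old_allocated, new_allocated):
    """Invalidate ADLs that point at blocks freed by a bitmap update."""
    freed = set(old_allocated) - set(new_allocated)
    for page in [p for p, b in adl.items() if b in freed]:
        del adl[page]
    return freed


adl = {}
on_write(adl, page=1, block=50)                   # page 1 now caches block 50
journal_result = on_write(adl, page=1, block=1005)  # journal write: ignored
freed = on_bitmap_update(adl, {50, 51}, {51})     # block 50 deallocated
```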
Block Liveness
• Reusing a free page is not an eviction
  – Geiger infers the liveness of a page from the liveness of its block
• A block dies when
  – A file is deleted or truncated
  – A process using virtual memory terminates
Block Liveness for Files
• Observing the writes to the superblock
  – (+) They are at a special disk location
  – (-) The OS caches them in memory and syncs to disk every 30 seconds or more
• Pages used to cache them are marked read-only
  – Write attempts cause page faults
  – Invalidate affected ADLs
Block Liveness for Swap Space
• No on-disk structure tracks block usage
  – When a disk block is written from a different memory page, the original block is considered “dead”
  – Maintain a reverse mapping between blocks and ADLs
  – Invalidate ADLs when blocks are overwritten
  – Without an overwrite, dead blocks cannot be detected
    • Leads to as much as 37% false positive evictions
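The reverse mapping for swap blocks might look like the sketch below. The class `SwapLiveness` and its two dictionaries are assumptions for illustration; the slides do not describe the actual data structure.

```python
class SwapLiveness:
    """Infers dead swap blocks from overwrites (hypothetical model)."""

    def __init__(self):
        self.adl = {}     # page -> swap block it is associated with
        self.writer = {}  # swap block -> page that last wrote it (reverse map)

    def on_swap_write(self, page, block):
        """Record a swap write; return the page whose ADL was invalidated, if any."""
        previous = self.writer.get(block)
        invalidated = None
        if (previous is not None and previous != page
                and self.adl.get(previous) == block):
            # The block was overwritten from a different page, so its old
            # contents are dead and the old page's ADL is stale.
            del self.adl[previous]
            invalidated = previous
        self.adl[page] = block
        self.writer[block] = page
        return invalidated


swap = SwapLiveness()
w1 = swap.on_swap_write(page=1, block=50)  # first write to block 50
w2 = swap.on_swap_write(page=2, block=50)  # overwrite from another page
```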
Evaluation Goals
• Measure Geiger accuracy
  – Missed evictions (false negatives)
  – Spurious evictions (false positives)
• Measure Geiger timeliness
  – Lag between actual event and detection
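One way to compute the two accuracy figures from a trace of actual evictions and a trace of detected ones is sketched below. Normalizing both rates by the number of actual evictions is an assumption made here; the slides do not state the denominator. The function name is hypothetical.

```python
def eviction_accuracy(actual, detected):
    """Return (false negative %, false positive %) for eviction detection.

    Assumption: both rates are expressed relative to the number of actual
    evictions.
    """
    actual, detected = set(actual), set(detected)
    if not actual:
        raise ValueError("need at least one actual eviction")
    missed = actual - detected      # evictions that went undetected
    spurious = detected - actual    # detections with no real eviction
    return (100.0 * len(missed) / len(actual),
            100.0 * len(spurious) / len(actual))


false_neg, false_pos = eviction_accuracy(actual=[1, 2, 3, 4],
                                         detected=[2, 3, 4, 5])
```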
Experimental Environment
• Xen 2.0.7 VMM [Barham et al., SOSP '03]
  – Extensions to observe page faults, page table updates, and I/O requests/completions
• Linux 2.4 and 2.6 guests
• Microbenchmarks
  – Isolate specific eviction types
  – Read, write, COW, allocation
• Application benchmarks
  – Dbench, Mogrify, TPC-W, SPC disk trace
Eviction Detection Accuracy
Workload      False Neg %   False Pos %
Read Evict    0.96%         0.58%
Write Evict   1.68%         0.03%
COW Evict     2.47%         1.45%
Alloc Evict   0.17%         0.17%
Application Accuracy
Workload   Geiger Opt           False Neg %   False Pos %
Dbench     w/o block liveness   1.10%         30.23%
Dbench     w/ block liveness    2.30%         5.72%
Mogrify    w/o block liveness   0.05%         22.99%
Mogrify    w/ block liveness    0.65%         2.46%
TPC-W                           0.14%         3.12%
SPC Web2                        2.24%         0.32%
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
  – Eviction-based cache placement
Application: Eviction-based Cache Placement
• Disk cache utilization is critical to performance
• Storage servers have large caches
• Demand-based placement => poor utilization
• Increase cache utilization via exclusivity
• Use client cache eviction as placement hint [Chen et al., USENIX '03; Wong and Wilkes, USENIX '02]
• Use VMM-based, implicit eviction information to inform a remote storage cache
• No client or OS storage interface changes
Cache Placement Results
• Geiger outperforms demand placement
• Mogrify: buffer misses too many evictions
• Mogrify: false positives are fortuitous
• Dbench: lag causes OS to outperform Geiger
[Figure: cache placement results chart; surviving labels: 13%, 51%]
Outline
• Geiger approach
• New Geiger techniques
• Evaluation
• Application
  – Eviction-based cache placement
  – Working set size estimator
LRU Miss Ratio Curve
[Figure: LRU queue with pages a through n in LRU order, plus a hit histogram and a fault curve associated with each LRU position; axes: pages, faults]
Application: Working Set Size Estimator
• MemRx:
  – Observe evictions/reloads
  – Compute miss ratio curve
WSS = current memory allocation + LRU estimation
Only works when WSS > current memory size
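A rough sketch of how an eviction/reload trace could yield a fault curve and a working set estimate. The event format, function names, and the "tolerated faults" threshold are assumptions for illustration and are not taken from MemRx. Evicted blocks sit in an LRU queue; a reload found at position i would have been a hit had the VM owned i more pages.

```python
def hit_histogram(events):
    """Count reloads by their LRU position among evicted blocks (1 = newest)."""
    queue = []  # evicted blocks, most recently evicted first
    hist = {}   # LRU position -> number of reloads found there
    for kind, block in events:
        if kind == "evict":
            if block in queue:
                queue.remove(block)
            queue.insert(0, block)
        elif kind == "reload" and block in queue:
            pos = queue.index(block) + 1
            hist[pos] = hist.get(pos, 0) + 1
            queue.remove(block)
    return hist


def fault_curve(hist, max_extra):
    """curve[k] = reloads that would still fault with k extra pages."""
    total, hits, curve = sum(hist.values()), 0, []
    for extra in range(max_extra + 1):
        hits += hist.get(extra, 0)
        curve.append(total - hits)
    return curve


def estimate_wss(current_pages, curve, tolerated_faults=0):
    """Current allocation plus the fewest extra pages meeting the fault target."""
    for extra, faults in enumerate(curve):
        if faults <= tolerated_faults:
            return current_pages + extra
    return None  # the curve does not extend far enough to tell


events = [("evict", "a"), ("evict", "b"), ("reload", "a"),
          ("evict", "c"), ("reload", "c"), ("reload", "b")]
hist = hit_histogram(events)
curve = fault_curve(hist, max_extra=3)
wss = estimate_wss(current_pages=100, curve=curve)
```

The estimate is only meaningful when the guest is actually evicting and reloading, which matches the note that the method works only when the working set exceeds the current memory size.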
Estimation Results: Microbenchmarks
Virtual machine is configured with 128 MB of memory
Each benchmark accesses a 256 MB file/memory region
FS: file access; VM: memory access
Summary
• System services in a VMM
• Need information about the guest OS
• Implicit information about the buffer cache
  – No guest OS modification
  – Accurate
  – Low overhead
• Build services and optimizations in a VMM
  – Eviction-based cache placement
  – Working set size estimation