Effectively Prefetching Remote Memory with Leap
Hasan Al Maruf and Mosharaf Chowdhury
Memory-Intensive Applications
Perform Great!
[Figure: TPC-C on VoltDB. Throughput (thousand TPS) vs. in-memory working set.]
Perform Great Until Memory Runs Out
[Figure: TPC-C on VoltDB. Throughput (thousand TPS) vs. in-memory working set: 38.61 at 100%, 6.61 at 75%, 1.01 at 50%.]
[Figure: PageRank on PowerGraph. Completion time (s) vs. in-memory working set: 116.19 at 100%, 124.96 at 75%, 424.47 at 50%.]
50% Less Memory Causes Slowdown of …
[Figure: the same two charts. TPC-C on VoltDB drops from 38.61 to 1.01 thousand TPS; PageRank on PowerGraph completion time rises from 116.19 s to 424.47 s.]
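The slowdown factors can be worked out from the chart values. The sketch below is only that arithmetic; the helper name `slowdown` is hypothetical, and the resulting factors are derived here from the plotted numbers rather than quoted from the slide.

```python
def slowdown(baseline: float, degraded: float, higher_is_better: bool) -> float:
    """Factor by which performance degrades between two measurements."""
    if higher_is_better:
        # Throughput-style metric: slowdown is baseline over degraded.
        return baseline / degraded
    # Time-style metric: slowdown is degraded over baseline.
    return degraded / baseline


# Values at 100% and 50% in-memory working set, read from the charts.
tpcc = slowdown(38.61, 1.01, higher_is_better=True)          # thousand TPS
pagerank = slowdown(116.19, 424.47, higher_is_better=False)  # seconds

print(f"TPC-C on VoltDB: {tpcc:.1f}x lower throughput")
print(f"PageRank on PowerGraph: {pagerank:.1f}x longer completion time")
```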
Between a Rock and a Hard Place
Overallocation: leads to underutilization
• 30-40% in Google, Alibaba, and Facebook
VS.
Underallocation: leads to severe performance loss
Memory Disaggregation
[Figure: Machine 1, Machine 2, Machine 3, …, Machine N, each split into used memory and free memory. Labels: Disaggregated Memory, Remote Memory.]
Remote Memory Access
User-space Applications
Memory Disaggregation Frameworks
• Infiniswap (NSDI’17): remote memory paging
• Remote Regions (ATC’18): remote file abstraction
• LegoOS (OSDI’18): disaggregated OS
Remote Memory
4KB page access latency, local vs. remote: 100 ns vs. 4 µs
Latency requirement for preferable performance [1]: 3 µs
Existing frameworks can’t achieve this!
• data path overhead
• variation in network latency
[1] P. X. Gao et al. “Network requirements for resource disaggregation” OSDI’16.
Life of a Page
User Space: Process 1, Process 2, …, Process N
Kernel Space: Memory Management Unit (MMU), Page Fault, Page Cache
Block-layer components on the cache-miss path: Device Mapping Layer, Generic Block Layer (bio), I/O Scheduler Request Queue, Dispatch Queue, Block Device Driver
Request queue processing: Insertion, Merging, Sorting, Staging and Dispatch
[Figure: labeled latencies are 0.27 µs for a cache hit, 2.1 µs on a cache miss, 10.04 µs and 21.88 µs in the block layer, and 4.3 µs for RDMA to remote memory.]
Where Does the Time Go?
Fast Path: Page Request → In Page Cache? Yes (0.12 µs) → Read Request? Yes → Update Page Table & End I/O (0.15 µs)
Slow Path: Page Request → In Page Cache? No → Allocate Cache for Page (2.1 µs) → Prepare for I/O (10.04 µs) → Queue and Batch Requests (21.88 µs) → Execute I/O (RDMA: 4.3 µs) → Update Page Table & End I/O
Design Goal
1. Increase cache hit
• faster path serves more page faults
2. Reduce the latency of the slow path
• remove unnecessary block-layer operations for RDMA
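Both goals can be read off a simple expected-latency model built from the numbers on the "Where Does the Time Go?" slide. This is a rough sketch under stated assumptions: stage latencies are treated as additive, the pairing of each number with a stage is one reading of the flattened diagram, and `expected_latency_us` and the "lean" stage set are hypothetical names introduced for illustration.

```python
# Fast path: page-cache lookup plus page-table update (0.12 + 0.15 = 0.27 us).
FAST_PATH_US = 0.12 + 0.15

# Slow-path stages on a cache miss (assumed stage-to-number pairing).
DEFAULT_MISS_STAGES_US = {
    "allocate cache for page": 2.1,
    "prepare for I/O": 10.04,
    "queue and batch requests": 21.88,
    "execute I/O (RDMA)": 4.3,
}

# Goal 2, hypothetically: keep only the stages that are not block-layer work.
LEAN_MISS_STAGES_US = {
    name: cost
    for name, cost in DEFAULT_MISS_STAGES_US.items()
    if name in ("allocate cache for page", "execute I/O (RDMA)")
}


def expected_latency_us(hit_ratio, miss_stages_us=DEFAULT_MISS_STAGES_US):
    """Average page-fault latency for a given page-cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_us = sum(miss_stages_us.values())
    return hit_ratio * FAST_PATH_US + (1.0 - hit_ratio) * miss_us


for ratio in (0.5, 0.9, 0.99):
    default = expected_latency_us(ratio)
    lean = expected_latency_us(ratio, LEAN_MISS_STAGES_US)
    print(f"hit ratio {ratio:.2f}: default {default:.2f} us, lean path {lean:.2f} us")
```

Raising the hit ratio (goal 1) and shrinking the miss cost (goal 2) both lower the average, and the model makes it easy to see which one dominates at a given hit ratio.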
Leap
Online remote memory prefetcher
Identifies memory access patterns to prefetch pages in a
• fast,
• cache-efficient, and
• resilient manner
without modifying any
• applications, or
• hardware
Life of a Page
[Figure: the default data path. A cache hit takes 0.27 µs; a cache miss (2.1 µs) passes through the Device Mapping Layer, Generic Block Layer (bio), I/O Scheduler Request Queue, Dispatch Queue, and Block Device Driver (labeled 10.04 µs and 21.88 µs) before RDMA to remote memory (4.3 µs).]
Life of a Page w/ Leap
User Space: Process 1, Process 2, …, Process N
Kernel Space: Memory Management Unit (MMU), Page Fault, Page Cache
[Figure: with Leap, the block-layer stages no longer appear between the page cache and remote memory. Labeled latencies: cache hit 0.27 µs, cache miss 2.1 µs, RDMA 4.3 µs, and an additional label of 0.34 µs.]
Leap components:
• Process Specific Page Access Tracker
• Trend Detection
• Prefetch Candidate Generation
• Prefetcher
• Eager Cache Eviction
Prefetching in Linux
Reads ahead pages sequentially
Based only on the last page access
Does not distinguish between processes
Cannot detect thread-level access irregularities
Too aggressive on sequential accesses: cache pollution
Too conservative off sequential accesses: brings nothing
Prefetching Techniques

| Approach | Low Computational Complexity | Low Memory Overhead | Unmodified Application | HW/SW Independence | Temporal Locality | Spatial Locality | Low Cache Pollution |
|---|---|---|---|---|---|---|---|
| Next N-Line | Yes | Yes | Yes | Yes | No | Yes | No |
| Stride | Yes | Yes | Yes | Yes | No | Yes | No |
| Instruction Prefetch | No | No | No | No | Yes | Yes | No |
| Linux Read-Ahead | Yes | Yes | Yes | Yes | Yes | Yes | No |
| Leap | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Leap Prefetcher
Linear-time and constant memory space
Two main components:
• Trend detection
• Prefetch window size detection
Flow:
1. Get Prefetch Window Size
2. Window Size = 0? Yes: read only the requested page
3. Otherwise, Trend Found? Yes: prefetch with current trend. No: prefetch with previous trend
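The decision flow above can be sketched as a single function. This is an illustration under assumptions, not Leap's kernel code: `pages_to_read` is a hypothetical name, a trend is modeled as a signed page offset, and falling back to the requested page alone when no trend has ever been seen is a choice made here.

```python
from typing import List, Optional


def pages_to_read(
    requested: int,
    window_size: int,
    current_trend: Optional[int],
    previous_trend: Optional[int],
) -> List[int]:
    """Return the page numbers to fetch for one page fault."""
    # Window size 0: read only the requested page.
    if window_size == 0:
        return [requested]

    # Prefer the trend found now; otherwise reuse the previous one.
    trend = current_trend if current_trend is not None else previous_trend
    if trend is None:
        # Assumption: with no trend at all, do not prefetch.
        return [requested]

    # Walk along the trend for `window_size` steps, skipping invalid pages.
    candidates = [requested + trend * i for i in range(1, window_size + 1)]
    return [requested] + [page for page in candidates if page >= 0]


print(pages_to_read(0x10, 0, 2, None))   # window size 0
print(pages_to_read(0x10, 3, 2, -3))     # current trend wins
print(pages_to_read(0x10, 2, None, -3))  # falls back to previous trend
```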
Trend Detection
Identifies the majority element in access history
Regular trends can be found within recent accesses
Flexible to short term irregularity
Flow:
1. Start with a smaller window of Access History
2. Run Boyer-Moore on the window
3. Majority found? Yes: return majority ∆maj
4. No: Max. window size? Yes: no trend found. No: double the window size and go back to step 2
Trend Detection Example
Access history of 8 entries; each entry shows the accessed address and, in parentheses, the delta from the previous access.
(a) at time t3: t0 0x48 (+72), t1 0x45 (-3), t2 0x42 (-3), t3 0x3F (-3)
trend of -3
(b) at time t7: t0 0x48 (+72), t1 0x45 (-3), t2 0x42 (-3), t3 0x3F (-3), t4 0x3C (-3), t5 0x02 (-58), t6 0x04 (+2), t7 0x06 (+2)
trend of -3 disappears, no major new trend
(c) at time t8: t8 0x08 (+2), t1 0x45 (-3), t2 0x42 (-3), t3 0x3F (-3), t4 0x3C (-3), t5 0x02 (-58), t6 0x04 (+2), t7 0x06 (+2)
trend of +2 detected
(d) at time t15: t8 0x08 (+2), t9 0x0A (+2), t10 0x0C (+2), t11 0x10 (+4), t12 0x39 (+41), t13 0x12 (-39), t14 0x14 (+2), t15 0x16 (+2)
trend of +2 detected among irregularities
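The trend-detection flow and the example above can be replayed in a few lines. This is a minimal sketch, assuming a history of 8 deltas, an initial window of half the history (`n_split=2`), and an address of 0 before t0; the names `boyer_moore_majority`, `find_trend`, and `replay` are hypothetical and the code is not Leap's implementation.

```python
from collections import deque
from typing import List, Optional


def boyer_moore_majority(deltas: List[int]) -> Optional[int]:
    """Return the value occurring in more than half of `deltas`, else None."""
    candidate, count = None, 0
    for delta in deltas:  # voting pass
        if count == 0:
            candidate, count = delta, 1
        elif delta == candidate:
            count += 1
        else:
            count -= 1
    # Verification pass: the vote alone can nominate a non-majority value.
    if candidate is not None and deltas.count(candidate) * 2 > len(deltas):
        return candidate
    return None


def find_trend(history: List[int], n_split: int = 2) -> Optional[int]:
    """Search the most recent deltas first, doubling the window on failure."""
    window = max(1, len(history) // n_split)
    while True:
        trend = boyer_moore_majority(history[-window:])
        if trend is not None:
            return trend
        if window >= len(history):
            return None  # maximum window reached: no trend found
        window = min(window * 2, len(history))


def replay(addresses: List[int], h_size: int = 8) -> List[Optional[int]]:
    """Feed accesses one by one and record the trend seen after each."""
    history = deque(maxlen=h_size)  # oldest delta is overwritten when full
    previous, trends = 0, []
    for address in addresses:
        history.append(address - previous)
        previous = address
        trends.append(find_trend(list(history)))
    return trends


ADDRESSES = [0x48, 0x45, 0x42, 0x3F, 0x3C, 0x02, 0x04, 0x06,
             0x08, 0x0A, 0x0C, 0x10, 0x39, 0x12, 0x14, 0x16]
trends = replay(ADDRESSES)
for t in (3, 7, 8, 15):
    print(f"t{t}: trend = {trends[t]}")
```

At t15 the small window of four deltas has no majority, so the search only succeeds after doubling to the full history, which is how the +2 trend survives the two irregular accesses.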
Prefetch Window Size Detection
Cache hit indicates prefetch utilization
• High cache hit: increase prefetch window aggressively
• No cache hit
  • trend availability: increase prefetch window gradually
  • no trend: decrease prefetch window gradually
Gradual slow down helps during sudden changes
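The bullets leave the exact update rule open. Below is a sketch of one rule that is consistent with them, under assumptions of my own: the function name `next_window_size`, the cap `max_size=8`, rounding up to a power of two on cache hits, and never shrinking by more than half per step are all illustrative choices.

```python
def next_window_size(prev_size: int, cache_hits: int, follows_trend: bool,
                     max_size: int = 8) -> int:
    """Pick the next prefetch window size from recent prefetch utilization."""
    if cache_hits > 0:
        # Prefetched pages were used: grow aggressively to the smallest
        # power of two that is at least cache_hits + 1.
        size = 1
        while size < cache_hits + 1:
            size *= 2
    else:
        # No hits: keep a minimal window only if the access follows the trend.
        size = 1 if follows_trend else 0
    size = min(size, max_size)
    # Gradual slow down: never drop below half of the previous window.
    return max(size, prev_size // 2)


# A window of 8 decays step by step once hits and the trend both disappear.
size = 8
decay = []
for _ in range(4):
    size = next_window_size(size, cache_hits=0, follows_trend=False)
    decay.append(size)
print(decay)
```

Halving instead of dropping straight to zero means a short burst of irregular accesses does not throw away a window that was working well a moment ago.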
Evaluation
Memory Disaggregation Frameworks
• Disaggregated VMM: Infiniswap
• Disaggregated VFS: Remote Regions
Deploy and evaluate over 56 Gbps InfiniBand network
Lowers Remote Page Access Latency by…
[Figure: two panels, Sequential Access and Stride Access. Each plots the CDF of latency (µs, axis from 0.01 to 10000) for Infiniswap and Infiniswap+Leap.]
Efficient Pattern Detection
Detects 29.70% more sequential accesses
Detects most of the irregularity
During irregularities, doing nothing helps the most
Perform Great Even After Memory Runs Out
TPC-C on VoltDB, TPS (thousands) by in-memory working set:

| Setup | 100% | 75% | 50% | 25% |
|---|---|---|---|---|
| Disk | 38.61 | 6.62 | 1.01 | Fails |
| Infiniswap | 37.00 | 27.74 | 19.33 | 1.50 |
| Infiniswap + Leap | 37 | 36.3 | 35.6 | 15.6 |
Benefit Breakdown of Leap’s Components
Data path optimizations: single-µs latency up to the 95th percentile
Prefetcher: sub-µs latency up to the 85th percentile
Eager cache eviction: improves the 99th percentile latency by 22%
Future Work
1. Thread-specific prefetching for multiple concurrent streams
• memory is managed at the process level
• this requires significant changes in virtual memory subsystem
2. Optimized remote I/O interface
• load balancing,
• fault-tolerance,
• data locality, and
• application-specific isolation in remote memory
Leap
Lightweight and efficient data path for remote memory
source code available at https://github.com/SymbioticLab/leap
Online prefetcher with a leaner data path and eager cache eviction policy to improve
• cache hit,
• remote I/O latency, and
• application-level performance
without modifying any
• application, or
• hardware
Thank You!
source code available at https://github.com/SymbioticLab/leap