K2: Work-Constraining Scheduling of NVMe-Attached Storage
Till Miemietz, Hannes Weisbach, Michael Roitzsch and Hermann Härtig
Presentation at the 40th IEEE Real-Time Systems Symposium
Hong Kong, 4th of December 2019
What are the implications of fast storage devices for real-time systems?
What May a Modern Storage Stack Look Like?
[Figure: the storage stack from top to bottom. Apps (e.g., file systems) on CPU 0, CPU 1, and CPU 2 submit bios to the Block Layer, which contains an I/O scheduler-specific staging queue scheme followed by dispatching queues (FIFO only). The Block Layer passes requests to the Driver, which issues NVMe commands through NVMe queue pairs to the SSD.]
What May a Modern Storage Stack Look Like?
[Figure: the same stack (Apps, bios, Block Layer, requests, Driver) with the SSD expanded. NVMe commands arrive through submission/completion queue pairs (SQ, CQ) at a controller with a Flash Translation Layer (FTL), which manages flash packages organized into blocks and pages (P).]
Latency Characteristics of SSDs – Motivation
● Gap between CPU and storage devices is shrinking
→ Speedup of 1000x compared to HDDs
→ Up to 40% of storage latency caused by software
● High degree of abstraction (FTL) is source of non-determinism
→ Garbage Collection
→ Caching
→ Scheduling of multiple NVMe queue pairs
● Can host-sided I/O schedulers still be used to enforce latency goals?
Dissecting Linux' Block I/O Schedulers
● Performing micro benchmarks in a simulated real-time scenario
→ Samsung 970 EVO (250 GB, NVMe 1.3) on a desktop system
● Multiple background processes used to create load on the drive
→ Goal is bandwidth maximization
● Single foreground high-priority process that periodically issues requests
→ Goal is fast access to the drive
● Analyse latency characteristics of RT process using different I/O schedulers
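The foreground side of such a setup can be sketched in a few lines. This is a minimal illustration, not the authors' benchmark: the function name `measure_periodic_reads`, the period, and the request count are assumptions, and the sketch reads through the page cache, so a real measurement would additionally need direct I/O against the device and a raised I/O priority.

```python
import os
import random
import time

BLOCK_SIZE = 4096  # 4K requests, matching the small random reads on the slides


def measure_periodic_reads(path, count=100, period_s=0.001, seed=0):
    """Issue `count` random 4K reads, one per period, and return latencies in seconds.

    Hypothetical helper for illustration; it does not bypass the page cache.
    """
    rng = random.Random(seed)
    blocks = max(os.path.getsize(path) // BLOCK_SIZE, 1)
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        next_release = time.perf_counter()
        for _ in range(count):
            # Sleep until the next periodic release point.
            delay = next_release - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            offset = rng.randrange(blocks) * BLOCK_SIZE
            start = time.perf_counter()
            os.pread(fd, BLOCK_SIZE, offset)
            latencies.append(time.perf_counter() - start)
            next_release += period_s
    finally:
        os.close(fd)
    return latencies
```

Running several bandwidth-maximizing readers or writers in parallel and comparing the latency lists across schedulers would mirror the experiment described above.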
Block I/O Schedulers in Linux 4.15
● None (default)
● mq-deadline
→ Orders requests by target block address
● Kyber
→ Core-local, balances latencies of coarse-grained request classes
● Budget Fair Queueing (BFQ)
→ Bandwidth control per process
→ Aware of I/O priorities
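On Linux, the scheduler in use for a block device is exposed through `/sys/block/<device>/queue/scheduler`, where the active entry is shown in square brackets. The sketch below parses that line; the function names and the device name `nvme0n1` are illustrative assumptions, and which schedulers appear depends on the kernel build.

```python
def parse_scheduler_line(line):
    """Return (active, available) from a line such as '[none] mq-deadline kyber bfq'."""
    active = None
    available = []
    for token in line.split():
        # The currently selected scheduler is wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_schedulers(device):
    # `device` is a block device name such as "nvme0n1" (example name only).
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```

Writing one of the listed names back to the same file (as root) switches the scheduler for that device.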
Latency Characteristics for Random Reads
● Both real-time and background processes issuing small random reads (4K)
→ X-axis shows achieved bandwidth, color of plotting symbols depicts targeted throughput
● Plots look similar for larger block sizes
Latency Characteristics for Write-Only Workloads
● Writing is very fast as long as drive-internal SLC cache is accessed
● Garbage collection drastically increases storage latency of RT process
Dissecting Linux’ I/O Schedulers – Lessons learned
● Reading is often slower than writing
→ Caching can avoid synchronous access of second-level flash cells when writing
● No performance isolation
→ Real-time process faces high latency when SSD is fully loaded
→ BFQ is unable to enforce priorities correctly
● From a latency viewpoint, I/O schedulers show little difference
→ However, complex implementations have latency penalties of up to 10%
K2: Work-Constraining I/O Scheduling
● Work-conserving behavior of current schedulers is not optimal w.r.t. latencies
→ Stalls high-priority read requests
→ Amplifies effects of garbage collection
● Concept: Limit the number of requests that are served in parallel
→ Device-wide limit of inflight requests
→ Requests are stored in per-priority FIFO queues
→ Submit new requests on completion of previous ones
● Queue length as a tunable parameter
→ Trade global throughput for softly bounded I/O latency of high-priority processes
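The concept can be illustrated with a toy model. This is a sketch under stated assumptions, not the K2 kernel implementation: the class name, the strict highest-priority-first selection between queues, and the `dispatched` list standing in for the device are all hypothetical.

```python
from collections import deque


class WorkConstrainingScheduler:
    """Toy model: at most `max_inflight` requests are on the device at once."""

    def __init__(self, max_inflight, num_priorities=2):
        self.max_inflight = max_inflight
        # One FIFO queue per priority level; index 0 is the highest priority.
        self.queues = [deque() for _ in range(num_priorities)]
        self.inflight = 0
        self.dispatched = []  # stands in for the device: order of submission

    def submit(self, request, priority):
        self.queues[priority].append(request)
        self._dispatch()

    def complete(self):
        # Called when the device reports a completion; frees one slot.
        if self.inflight == 0:
            raise RuntimeError("no request in flight")
        self.inflight -= 1
        self._dispatch()

    def _dispatch(self):
        # Fill free slots, always taking from the highest non-empty priority.
        while self.inflight < self.max_inflight:
            queue = next((q for q in self.queues if q), None)
            if queue is None:
                return
            self.dispatched.append(queue.popleft())
            self.inflight += 1
```

With `max_inflight=2`, submitting three background requests and then one real-time request dispatches the first two background requests immediately; the real-time request takes the next free slot ahead of the third background request. A work-conserving scheduler would have handed all three background requests to the device before the real-time request arrived.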
Evaluation of K2 – Random Reads (64K)
● Limiting the length of the device queues enforces correct service order
→ Tested K2 with a queue length of 8, 16, and 32
→ Note the different scales of the y-axis!
● Host gains flexibility to enforce quick submission of real-time requests
Evaluation of K2 – Sequential Write Operations (64K)
● Performance for very fast operations similar to mq-deadline and Kyber
● K2 cannot avoid garbage collection but mitigates its impact on RT applications
Evaluation of K2 – Application Benchmark
● Tested read-only OLTP benchmarks of sysbench with read/write background load
Summary
● Fast SSDs pose new challenges for the OS in enforcing timely access to storage
→ Complex abstractions cause non-determinism of performance parameters
● Current I/O schedulers are not suitable for real-time demands
→ No performance isolation
→ Overridden by drive-internal scheduler
● K2: work-constraining I/O scheduling to limit storage access latency
→ Trade throughput for lower tail latencies
→ Improve worst-case latency by up to 10x for reading, 6.8x for writing
→ Works with off-the-shelf components
Additional Slides
Latency Characteristics for Random Reads (All Percentiles)
● Both real-time and background processes issuing small random reads (4K)
Latency Characteristics for Write-Only Workloads
● Garbage collection also affects lower percentiles
Latency Characteristics for Mixed Workloads (64K)
● Access times of real-time application are similar to reading
Impact of Scheduler Complexity
● For small random writes, complex policies have a notable impact on overall latency
Evaluation of K2 – Random Reads (64K, All Percentiles)
● Limiting the length of the device queues enforces correct service order
Evaluation of K2 – Mixed Workloads (64K)
● Reduction of tail latencies is also present for mixed read / write requests
● Bandwidth penalty reduced by fast write operations
Evaluation – Comparison of I/O Schedulers
● Table shows 99.9th percentile of latency at maximum throughput
→ Throughput loss of K2 is mitigated by large block sizes
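For readers who want to reproduce such a figure from raw latency samples, a tail percentile can be computed as sketched below. The nearest-rank definition used here is an assumption; the slides do not state which percentile definition the authors applied, and other tools interpolate between samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the sample at rank ceil(p * n / 100) in sorted order."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

With only a handful of samples the 99.9th percentile degenerates to the maximum, so a meaningful tail estimate needs thousands of measurements.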
Evaluation – Comparison of I/O Schedulers
● Table shows throughput at maximum 99.9th percentile of latency