Analyzable and Practical Real-Time Gang Scheduling on Multicore Using RT-Gang
Waqar Ali, Michael Bechtel, Heechul Yun
University of Kansas
Outline
• RT-Gang
• Tutorial
• DeepPicar Case Study
Multicore Processors
• Provide high computing performance
• Needed for intelligent safety-critical real-time systems
Parallel Real-Time Tasks
• Many emerging workloads in AI, vision, robotics are parallel real-time tasks
[Figure: Effect of parallelization on DNN control task (annotations: 33%, 50%)]
[Figure: DNN based real-time control *]
* M. Bojarski et al., "End to End Learning for Self-Driving Cars." arXiv:1604.07316, 2016
Effect of Co-Scheduling
• DNN control task suffers >10X slowdown
– Due to interference in shared memory hierarchy
[Figure: Normalized execution time of DNN (Core 0,1) and BwWrite (Core 2,3), Solo vs. Corun, y-axis 0 to 12; annotations: 5%, 10X. Inset: Core1 to Core4 share the LLC and DRAM, where DNN and BwWrite interfere]
It can be worse! (> 300X slowdown)*
* Michael G. Bechtel and Heechul Yun. “Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention.” In RTAS, 2019
Observations
• Interference in shared memory hierarchy
– Can be very high and unpredictable
– Depends on the hardware (black box)
• Constructive sharing (Good)
– Between threads of a single parallel task
• Destructive sharing (Bad)
– Between threads of different tasks
• Goal: analyzable and efficient parallel real-time task scheduling framework for multicore
– By avoiding destructive sharing
RT-Gang
• One (parallel) real-time task, a gang, at a time
– Eliminate inter-task interference by construction
• Schedule best-effort tasks during slack, with throttling
– Improve utilization with bounded impact on the RT tasks
* Waqar Ali and Heechul Yun. RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems. In RTAS, 2019.
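The one-gang-at-a-time rule can be illustrated with a small user-space model. This is a minimal sketch, not the kernel implementation: `Gang`, `pick_gang`, and `assign_cores` are hypothetical names, and the sketch assumes fixed priorities where a larger number means higher priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Gang:
    """A parallel real-time task (hypothetical model, not a kernel structure)."""
    name: str
    priority: int  # assumption: larger value = higher priority
    threads: int   # number of cores the gang wants


def pick_gang(ready: List[Gang]) -> Optional[Gang]:
    """Return the single gang allowed to run: the highest-priority ready one."""
    return max(ready, key=lambda g: g.priority) if ready else None


def assign_cores(ready: List[Gang], num_cores: int) -> Dict[int, str]:
    """Map each core to the running gang's name or to 'best-effort'."""
    gang = pick_gang(ready)
    if gang is None:
        return {core: "best-effort" for core in range(num_cores)}
    used = min(gang.threads, num_cores)
    # Cores the gang does not use are slack: they go to best-effort tasks,
    # never to a second real-time gang.
    return {
        core: gang.name if core < used else "best-effort"
        for core in range(num_cores)
    }


ready = [Gang("dnn", priority=2, threads=2), Gang("bwwrite", priority=1, threads=2)]
print(assign_cores(ready, num_cores=4))
# prints {0: 'dnn', 1: 'dnn', 2: 'best-effort', 3: 'best-effort'}
```

In this model the lower-priority gang stays blocked even though two cores are free of real-time work, which is the behavior that removes inter-task interference by construction.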
Safe Best-Effort Task Throttling
• Throttle any best-effort core that exceeds the bandwidth budget set by the RT task
[Figure: Basic throttling mechanism *. Time axis marked 0, 1ms, 2ms; rows show the budget and the core activity (computation vs. memory fetch)]
* Yun et al., “MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms.” In RTAS, 2013
* W. Ali and H. Yun, “Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms.” In ECRTS, 2018
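The budget-based throttling idea can be sketched as a per-period simulation for one best-effort core. This is a simplified model under stated assumptions, not the kernel module's code: the function name `regulate`, the unit "memory accesses per period", and the carry-over of unserved accesses as a backlog are all illustrative choices. The budget is applied only in periods where a real-time task is running.

```python
from typing import List, Tuple


def regulate(
    demand: List[int], rt_active: List[bool], budget: int
) -> List[Tuple[int, int]]:
    """Simulate bandwidth regulation of one best-effort core.

    demand[i] is the number of memory accesses the core wants to issue in
    regulation period i; rt_active[i] says whether a real-time task runs in
    that period. Returns one (served, backlog) pair per period, where backlog
    is the work delayed because the core was throttled.
    """
    results = []
    backlog = 0
    for want, active in zip(demand, rt_active):
        wanted = backlog + want
        # The budget is replenished every period and enforced only while a
        # real-time task is running; otherwise the core runs unthrottled.
        served = min(wanted, budget) if active else wanted
        backlog = wanted - served
        results.append((served, backlog))
    return results


print(regulate([20, 20, 20], [False, True, True], budget=10))
# prints [(20, 0), (10, 10), (10, 20)]
```

In the first period no real-time task is running, so all 20 accesses are served. In the next two periods the core is capped at 10 accesses per period and the remainder accumulates as delayed work.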
Implementation
• Modified Linux’s RT scheduler
– Implemented as a “feature” of SCHED_FIFO (sched/rt.c)
• Best-effort task throttling
– A separate kernel module based on BWLOCK++ *
* W. Ali and H. Yun, “Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms.” In ECRTS, 2018
Source Code Repository
• git clone https://github.com/CSL-KU/RT-Gang
Installation
• From the Linux kernel directory:
– patch -p1 < ../RT-Gang/rtgang-v4.19.patch
– Compile and install the kernel, then reboot
• To check if installed correctly:
– sudo cat /sys/kernel/debug/sched_features | grep RT_GANG_LOCK
Enable/Disable RT-Gang
• RT-Gang is enabled/disabled through the kernel's scheduler features interface (the RT_GANG_LOCK feature in /sys/kernel/debug/sched_features)
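A small helper can wrap the toggle and the status check. This is a sketch under assumptions: it relies on the mainline Linux convention that writing a feature name to `sched_features` enables it and writing the name with a `NO_` prefix disables it, and it assumes the RT-Gang patch follows that convention for `RT_GANG_LOCK`. The function names are hypothetical.

```python
from pathlib import Path

SCHED_FEATURES = Path("/sys/kernel/debug/sched_features")
FEATURE = "RT_GANG_LOCK"


def feature_token(enable: bool, feature: str = FEATURE) -> str:
    """Token to write: the bare name enables, a NO_ prefix disables."""
    return feature if enable else f"NO_{feature}"


def is_enabled(features_text: str, feature: str = FEATURE) -> bool:
    """Parse the whitespace-separated feature list read from sched_features."""
    tokens = features_text.split()
    if feature in tokens:
        return True
    if f"NO_{feature}" in tokens:
        return False
    raise LookupError(f"{feature} not listed; is the patched kernel running?")


def set_rt_gang(enable: bool, path: Path = SCHED_FEATURES) -> None:
    """Write the token to sched_features (needs root and a mounted debugfs)."""
    path.write_text(feature_token(enable))
```

On a patched kernel, running `set_rt_gang(True)` as root and then calling `is_enabled(SCHED_FEATURES.read_text())` would be expected to return `True`. The parsing function takes plain text so that it can be exercised without root access.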
Best-Effort Task Throttling
• Throttling is enabled through a kernel module
– cd RT-Gang/throttling/kernel_module
– make
– sudo insmod exe/bwlockmod.ko
Best-Effort Task Throttling
• Only occurs when a real-time task is running
– Without a real-time task
– With a real-time task
DeepPicar
• A low-cost, small-scale replication of NVIDIA’s DAVE-2
• Uses the exact same DNN
• Runs on a Raspberry Pi 3 in real-time
* Bechtel et al. DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car. In RTCSA, 2018
https://github.com/mbechtel2/DeepPicar-v2
DNN based Real-Time Control
• DNN inferencing is the most compute-intensive part.
• Parallelized by TensorFlow to utilize multiple cores.
Experiment Setup
• DNN control task of DeepPicar (real-world RT)
• IsolBench BwWrite benchmark (synthetic RT)
• Parboil benchmarks (real-world BE)
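| Task | Type | WCET (C ms) | Period (P ms) | # Threads |
|---|---|---|---|---|
| DNN | RT | 34 | 100 | 2 |
| BwWrite | RT | 220 | 340 | 2 |
| Parboil cutcp | BE | ∞ | N/A | 4 |
| Parboil lbm | BE | ∞ | N/A | 4 |

[Figure: DNN and BwWrite (RT) and Parboil cutcp & lbm (BE) on Core1 to Core4, which share the LLC and DRAM]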
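As a quick arithmetic check on the task parameters, the long-run time share of each real-time task is its WCET divided by its period. The sketch below assumes the 34 ms / 100 ms row belongs to the DNN task and the 220 ms / 340 ms row to BwWrite, following the order in which the tasks are listed. Under the one-gang-at-a-time rule the two tasks never overlap, so their shares add. This is not a schedulability test.

```python
from fractions import Fraction

# (WCET ms, period ms); the task-to-row assignment is an assumption
rt_tasks = {"DNN": (34, 100), "BwWrite": (220, 340)}


def utilization(wcet_ms: int, period_ms: int) -> Fraction:
    """Fraction of time a task occupies the platform in the long run."""
    return Fraction(wcet_ms, period_ms)


per_task = {name: utilization(c, p) for name, (c, p) in rt_tasks.items()}
total = sum(per_task.values(), Fraction(0))

for name, u in per_task.items():
    print(f"{name}: {float(u):.3f}")
print(f"total: {float(total):.3f}")
# prints:
# DNN: 0.340
# BwWrite: 0.647
# total: 0.987
```

With these numbers a real-time gang is running almost all of the time, so throttling of the best-effort tasks would be active for most of the experiment.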
Execution Time Distribution
• RT-Gang achieves deterministic timing
What does this look like in the real world?
Conclusion
• Parallel real-time task scheduling
– Hard to analyze on COTS multicore
– Due to interference in shared memory hierarchy
• RT-Gang
– Analyzable and efficient parallel real-time gang scheduling framework, implemented in Linux
– Avoid interference by construction
• Can protect critical real-time tasks
https://github.com/CSL-KU/rt-gang
Thank You!
Disclaimer:
This research is supported by NSF CNS 1718880, CNS 1815959, and NSA Science of Security initiative contract #H98230-18-D-0009.