Making OpenVX Really “Real Time”
![Page 1: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/1.jpg)
Ming Yang¹, Tanya Amert¹, Kecheng Yang¹,², Nathan Otterness¹, James H. Anderson¹, F. Donelson Smith¹, and Shige Wang³
¹ The University of North Carolina at Chapel Hill
² Texas State University
³ General Motors Research
![Page 2: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/2.jpg)
700 ms
![Page 3: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/3.jpg)
![Page 4: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/4.jpg)
A new approach for graph scheduling
![Page 5: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/5.jpg)
Shorter response time + less capacity loss
![Page 6: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/6.jpg)
1. State of the art
2. Our approach
3. Future work
![Page 7: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/7.jpg)
• Graph-based architecture
[Figure: example OpenVX graph. Native camera control feeds a graph of four OpenVX nodes, whose results go to downstream application processing.]
• Portability to diverse hardware
[Figure: applications layered above GPU, FPGA, and DSP hardware.]
Source: https://www.khronos.org/openvx/
Does OpenVX really target “real-time” processing?
![Page 8: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/8.jpg)
1. It lacks real-time concepts.
2. Entire graphs = monolithic schedulable entities.
![Page 10: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/10.jpg)
Monolithic scheduling
[Figure: timeline of monolithic scheduling for a graph with nodes A, B, C, and D; the whole graph is scheduled as a single entity.]
![Page 11: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/11.jpg)
Prior Work
• OpenVX nodes = schedulable entities [23, 51]
Coarse-grained scheduling
[Figure: timeline of coarse-grained scheduling. Each node of the graph (A, B, C, D) is scheduled as its own task (Task A, Task B, Task C, Task D).]
![Page 12: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/12.jpg)
Remaining problems with coarse-grained scheduling:
1. More parallelism remains to be explored.
2. Suspension-oblivious analysis was applied, which causes capacity loss.
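To make the capacity-loss point concrete, here is a minimal Python sketch. It compares the utilization a task is charged under suspension-oblivious analysis, where time spent suspended waiting on the GPU is counted as if it were CPU execution, with the task's actual CPU utilization. The function names and the numbers (10 ms of CPU work, 15 ms of suspension, a 33 ms period) are illustrative assumptions, not values from the talk.

```python
def cpu_utilization(cpu_ms: float, period_ms: float) -> float:
    """Fraction of one CPU the task actually uses."""
    return cpu_ms / period_ms


def oblivious_utilization(cpu_ms: float, suspend_ms: float, period_ms: float) -> float:
    """Utilization charged when suspension time is treated as execution time."""
    return (cpu_ms + suspend_ms) / period_ms


# Hypothetical task: 10 ms on the CPU, 15 ms suspended for GPU work, every 33 ms.
cpu_ms, suspend_ms, period_ms = 10.0, 15.0, 33.0

actual = cpu_utilization(cpu_ms, period_ms)
charged = oblivious_utilization(cpu_ms, suspend_ms, period_ms)
capacity_loss = charged - actual  # CPU capacity reserved but never used

print(f"actual={actual:.3f} charged={charged:.3f} loss={capacity_loss:.3f}")
```

Under these assumed numbers the analysis reserves more than twice the CPU share the task really needs, which is the kind of loss the slide refers to.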
![Page 13: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/13.jpg)
This Work: Fine-Grained Scheduling
![Page 14: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/14.jpg)
1. Coarse-grained vs. fine-grained
2. Response-time bounds analysis
3. Case study
![Page 15: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/15.jpg)
1. Coarse-grained vs. fine-grained
![Page 16: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/16.jpg)
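Coarse-Grained Scheduling
[Figure: timeline of Tasks A, B, C, and D for the graph with nodes A, B, C, D, including a suspension for GPU execution.]
Fine-Grained Scheduling
[Figure: timeline of Tasks A, E, F, G, C, and D. In the graph, node B is replaced by nodes E, F, and G, and GPU execution is shown explicitly.]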
![Page 17: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/17.jpg)
2. Response-time bounds analysis
![Page 18: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/18.jpg)
Deriving Response-Time Bounds for a DAG*
Step 1: Schedule the nodes as sporadic tasks
Step 2: Compute bounds for every node
Step 3: Sum the bounds of nodes on the critical path
* C. Liu and J. Anderson, “Supporting Soft Real-Time DAG-based Systems on Multiprocessors with No Utilization Loss,” in RTSS, 2013.
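Step 3 can be sketched in a few lines of Python. The sketch reads "critical path" as the source-to-sink path whose per-node bounds sum to the largest value. It does not implement Steps 1 and 2: the per-node bounds are taken as given inputs, and both the bound values and the edges of the example graph are hypothetical.

```python
from functools import lru_cache


def end_to_end_bound(node_bounds: dict, edges: list) -> float:
    """Largest sum of per-node response-time bounds over any path in the DAG."""
    successors = {node: [] for node in node_bounds}
    for src, dst in edges:
        successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node: str) -> float:
        # Bound of this node plus the worst continuation through its successors.
        tail = max((longest_from(nxt) for nxt in successors[node]), default=0.0)
        return node_bounds[node] + tail

    return max(longest_from(node) for node in node_bounds)


# Hypothetical per-node bounds (ms) and a hypothetical diamond-shaped graph.
bounds = {"A": 5.0, "B": 12.0, "C": 4.0, "D": 6.0}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

dag_bound = end_to_end_bound(bounds, edges)
print(dag_bound)  # path A -> B -> D gives 5 + 12 + 6
```

With these inputs the path through B dominates the path through C, so the end-to-end bound is the sum along A, B, D.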
![Page 19: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/19.jpg)
Deriving Response-Time Bounds for a DAG
[Figure: example DAG with nodes A, B, C, D, E, and F.]
![Page 21: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/21.jpg)
[Figure: the nodes of the example DAG (A through F) placed on separate CPU and GPU timelines.]
Need a response-time bound analysis for GPU tasks
![Page 22: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/22.jpg)
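A system model of GPU tasks
$\tau_i = (C_i, T_i, B_i, H_i)$
• $C_i$: per-block worst-case workload
• $T_i$: period
• $B_i$: number of blocks
• $H_i$: number of threads per block (or block size)
Example: $\tau_1 = (3076, 6, 2, 1024)$, so $H_1 = 1024$
[Figure: a job of $\tau_1$ on two SMs (SM0 and SM1), each labeled 2048, over a time axis marked 0, 3, and 6, with $C_1$, $B_1$, $H_1$, and $T_1$ annotated.]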
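As a rough illustration, the task tuple can be written as a small Python record. The class name, field names, and the two derived quantities are assumptions made for this sketch. In particular, treating the per-job workload as the number of blocks times the per-block workload, and reading the "2048" on the slide as a per-SM thread capacity, are interpretations rather than definitions taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuTask:
    c: float  # C_i: per-block worst-case workload
    t: float  # T_i: period
    b: int    # B_i: number of blocks
    h: int    # H_i: number of threads per block (block size)

    def threads_per_job(self) -> int:
        """Threads requested by one job: blocks times threads per block."""
        return self.b * self.h

    def workload_per_job(self) -> float:
        """Assumed upper bound on one job's workload: blocks times per-block workload."""
        return self.b * self.c


# The example task from the slide.
tau1 = GpuTask(c=3076, t=6, b=2, h=1024)

# Assumption: each SM can host 2048 threads, as the "2048" labels suggest.
THREADS_PER_SM = 2048
blocks_per_sm = THREADS_PER_SM // tau1.h

print(tau1.threads_per_job(), tau1.workload_per_job(), blocks_per_sm)
```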
![Page 23: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/23.jpg)
Response-Time Bounds Proof Sketch
1. We first show the necessity of a total utilization bound and intra-task parallelism via counterexamples.
![Page 24: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/24.jpg)
[Figure: counterexample timeline with five job releases (1 through 5), showing the schedule without intra-task parallelism and the schedule with intra-task parallelism.]
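The effect of intra-task parallelism can be reproduced with a toy Python simulation. This is not the counterexample from the talk. It assumes a single task whose jobs each need 3 time units but are released every 2 time units, and it assumes there is always enough spare capacity for overlapping jobs to run side by side. Without intra-task parallelism a job must wait for its predecessor to finish, so response times keep growing; with it, they stay constant.

```python
def response_times(period: float, exec_time: float, num_jobs: int,
                   intra_task_parallelism: bool) -> list:
    """Response time of each job of one periodic task."""
    responses = []
    prev_finish = 0.0
    for j in range(num_jobs):
        release = j * period
        if intra_task_parallelism:
            start = release                    # may overlap with earlier jobs
        else:
            start = max(release, prev_finish)  # must wait for the previous job
        finish = start + exec_time
        responses.append(finish - release)
        prev_finish = finish
    return responses


# Hypothetical parameters: execution time exceeds the period.
sequential = response_times(2.0, 3.0, 5, intra_task_parallelism=False)
parallel = response_times(2.0, 3.0, 5, intra_task_parallelism=True)

print(sequential)  # grows by one time unit per job
print(parallel)    # stays at the execution time
```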
![Page 25: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/25.jpg)
2. We then bound the unfinished workload from jobs released at or before $r_{k,j}$.
3. We prove that the job $\tau_{k,j}$ finishes before $r_{k,j} + R_k$.
[Figure: timeline on SM0 and SM1 showing job $\tau_{k,j}$ of task $\tau_k$, released at $r_{k,j}$, with response-time bound $R_k$.]
![Page 26: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/26.jpg)
3. Case study
![Page 27: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/27.jpg)
Case Study: Comparing Fine-Grained/Coarse-Grained/Monolithic Scheduling
• Application: Histogram of Oriented Gradients (HOG)
[Figure: HOG pipeline with the stages Resize Image, Compute Gradients, Compute Orientation Histograms, and Normalize Orientation Histograms, grouped into vxHOGCellsNode and vxHOGFeaturesNode. The pipeline is shown for CPU+GPU execution (coarse-grained) and for GPU execution (fine-grained).]
![Page 28: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/28.jpg)
• Application: Histogram of Oriented Gradients (HOG)
• 6 instances
• 33 ms period
• 30,000 samples
• Platform: NVIDIA Titan V GPU + two eight-core Intel CPUs
• Schedulers: G-EDF, G-FL (fair-lateness)
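For readers unfamiliar with the two schedulers, the following Python sketch shows how they differ as G-FL is usually described in the fair-lateness literature: both run the jobs with the earliest priority points first, but G-EDF uses the deadline as the priority point while G-FL moves it earlier by (m − 1)/m times the task's execution cost. The function names, the 8 ms execution cost, and the choice of m = 16 (all cores of the two eight-core CPUs) are assumptions for illustration; only the 33 ms period comes from the case study.

```python
def gedf_priority_point(release: float, period: float) -> float:
    """G-EDF: priority point is the (implicit) deadline."""
    return release + period


def gfl_priority_point(release: float, period: float, cost: float, m: int) -> float:
    """G-FL: deadline shifted earlier by (m - 1) / m of the execution cost."""
    return release + period - (m - 1) / m * cost


m = 16          # assumption: all 16 cores are used for scheduling
period = 33.0   # ms, from the case study
cost = 8.0      # ms, hypothetical worst-case execution cost

gedf = gedf_priority_point(0.0, period)
gfl = gfl_priority_point(0.0, period, cost, m)

print(gedf, gfl)  # the G-FL point is earlier for tasks with larger costs
```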
![Page 29: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/29.jpg)
[Figure: distribution of response times; x-axis: time, y-axis: % of samples. Left is better. Annotation: 50% of samples have a response time less than 60 ms.]
![Page 30: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/30.jpg)
| | [1] Fine-grained (G-FL) | [2] Coarse-grained (G-EDF) | [3] Monolithic (G-EDF) |
|---|---|---|---|
| Average Response Time (ms) | 65.99 | 136.57 | 84669.47 |
| Maximum Response Time (ms) | 125.66 | 427.07 | 170091.06 |

FL: fair-lateness
![Page 32: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/32.jpg)
• Fine-grained [1] has half the average response time of coarse-grained [2].
• Fine-grained [1] has one-third the maximum response time of coarse-grained [2].
![Page 39: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/39.jpg)
Analytical Bound (ms): [1] Fine-grained (G-FL): 542.39; [2] Coarse-grained (G-EDF): N/A; [3] Monolithic (G-EDF): N/A
An alert driver takes 700 ms to react.
• The fair-lateness-based scheduler is beneficial: it reduced node response times by up to 9.9%.
• The overhead of supporting fine-grained scheduling was 14.15%.
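The headline comparisons can be checked directly from the reported numbers. The short Python sketch below recomputes the fine-grained to coarse-grained ratios and compares the analytical bound with the 700 ms driver reaction time. All values are copied from the case-study results; the variable names are arbitrary.

```python
# Reported response times in milliseconds.
avg_ms = {"fine": 65.99, "coarse": 136.57, "monolithic": 84669.47}
max_ms = {"fine": 125.66, "coarse": 427.07, "monolithic": 170091.06}

analytical_bound_ms = 542.39   # fine-grained (G-FL) only
driver_reaction_ms = 700.0

avg_ratio = avg_ms["fine"] / avg_ms["coarse"]   # roughly one half
max_ratio = max_ms["fine"] / max_ms["coarse"]   # roughly one third
headroom_ms = driver_reaction_ms - analytical_bound_ms

print(f"average ratio: {avg_ratio:.2f}")
print(f"maximum ratio: {max_ratio:.2f}")
print(f"bound is {headroom_ms:.2f} ms below the driver reaction time")
```

The observed maximum for fine-grained scheduling (125.66 ms) also sits well below the analytical bound, as expected of a worst-case bound.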
![Page 40: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/40.jpg)
Conclusions
1. Fine-grained scheduling
2. Response-time bounds analysis for GPU tasks
3. Case study
![Page 41: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/41.jpg)
Future Work
1. Cycles in the graph
2. Other resource constraints
3. Schedulability studies
![Page 42: Making OpenVX Really “Real Time”](https://reader034.fdocuments.us/reader034/viewer/2022042313/625c48c74d2f020e2569b347/html5/thumbnails/42.jpg)
Thanks!