Evaluating Orthogonality between Application Auto‐tuning and Run‐Time Resource Management for Adaptive OpenCL Applications
Edoardo Paone, Davide Gadioli, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano
Politecnico di Milano
Computer Architecture Evolution
“The number of transistors incorporated in a chip will approximately double every two years” – Gordon Moore, Intel co-founder
[Figure: processor generations and process nodes along a time axis]
- 8086: 3 µm
- 386: 1.5 µm
- Pentium 4: 0.18 µm
- Core2 Duo: 65 nm
- Nehalem: 45 nm
“Moore’s Law” on Performance
[Figure: performance over time (axis marks: 1987, 2003, 2011, 2020) across the 8086, 386, Pentium 4, Core2 Duo, and Nehalem generations; the manycore-era build adds a “?” to the plot]
The Golden Era:
- Single-processor
- 1st Power Wall
The Multicore Era:
- 2 to 16 cores
- On-chip shared LL$
- Programmability challenge
The Manycore Era:
- Larger # of cores
- Networks-on-Chip
- Programmability challenge + Dynamic Resource Management
Main Idea
In the context of resource consolidation, analyze the orthogonal effects of:
• Resource Management
• Application Auto‐Tuning
• Approximate computing
• …
Target Platforms: Multicore Platform
Run‐Time Resource Management
[Figure: block diagram with App1, App2, and App3, the RTRM, and the Target Platform]
Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many‐core systems: survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC), 2013.
RTRM ‐ Overview
[Figure: block diagram with App1, App2, and App3, the RTRM with its Accounting and Mapping phases, and the Target Platform]
• The resource accounting phase grants resources to critical workloads while optimizing resource usage by best‐effort workloads.
• The mapping phase maps virtual resources onto physical resources to achieve optimal platform usage and to handle run‐time variations.
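The two RTRM phases above can be illustrated with a minimal sketch. It is not the resource manager used in this work. The `Workload` type, the "critical first, then fair share" policy, and the contiguous core assignment are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    requested_cores: int
    critical: bool


def account(workloads, total_cores):
    """Accounting phase: decide how many cores each workload is granted."""
    grants = {}
    free = total_cores
    # Assumed policy: critical workloads are served first and receive
    # what they ask for, as long as cores remain.
    for w in workloads:
        if w.critical:
            grants[w.name] = min(w.requested_cores, free)
            free -= grants[w.name]
    # Best-effort workloads share the leftover cores as evenly as possible,
    # never exceeding their own request.
    best_effort = [w for w in workloads if not w.critical]
    for i, w in enumerate(best_effort):
        fair_share = free // (len(best_effort) - i)
        grants[w.name] = min(w.requested_cores, fair_share)
        free -= grants[w.name]
    return grants


def map_to_cores(grants, total_cores):
    """Mapping phase: bind each granted (virtual) core to a physical core id."""
    mapping = {}
    next_core = 0
    for name, count in grants.items():
        mapping[name] = list(range(next_core, next_core + count))
        next_core += count
    if next_core > total_cores:
        raise ValueError("grants exceed the physical core count")
    return mapping


# Hypothetical workload mix on a 16-core platform.
workloads = [
    Workload("App1", 4, critical=True),
    Workload("App2", 8, critical=False),
    Workload("App3", 8, critical=False),
]
grants = account(workloads, total_cores=16)
placement = map_to_cores(grants, total_cores=16)
```

A real manager would also re-run both phases when workloads start or stop, which is how the mapping phase reacts to run‐time variations.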
Application Auto‐Tuning
The key idea is that most applications are configurable through a set of parameters.
Parameters:
• Color
• Shape
• Size
Such parameters are the application's Run-Time Knobs.
Why Run‐Time Knobs & Auto‐Tuning?
Some applications expose internal knobs that can be used to trade off application quality of results (QoR) against performance.
[Figure: QoR versus Performance]
Example: Autonomous Video-surveillance System
• Video Frame Rate
• Video Resolution
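As a rough sketch of the QoR/performance trade-off, the code below picks the best-quality knob setting that still fits the available compute capacity. The `OperatingPoint` type, the `qor` and `load` values, and the selection rule are hypothetical and invented for illustration. Only the two knobs (frame rate and resolution) come from the video-surveillance example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    frame_rate: int      # knob: frames per second
    resolution: tuple    # knob: (width, height) in pixels
    qor: float           # profiled quality of results, higher is better
    load: float          # profiled compute demand, arbitrary units


# Hypothetical profiling data, not measurements from the presentation.
POINTS = [
    OperatingPoint(30, (1280, 720), qor=1.00, load=10.0),
    OperatingPoint(30, (640, 360), qor=0.80, load=4.0),
    OperatingPoint(15, (1280, 720), qor=0.70, load=5.0),
    OperatingPoint(15, (640, 360), qor=0.50, load=2.0),
]


def best_point(points, available_capacity):
    """Return the highest-QoR point whose demand fits the capacity."""
    feasible = [p for p in points if p.load <= available_capacity]
    if not feasible:
        # Nothing fits: fall back to the cheapest configuration.
        return min(points, key=lambda p: p.load)
    return max(feasible, key=lambda p: p.qor)
```

With plenty of capacity the full-quality point is chosen. As capacity shrinks, the selection moves to lower resolution or frame rate, which is the trade-off the knobs expose.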
Application Auto‐Tuning Framework
[Figure: framework diagram; successive builds highlight the stages below]
• Execution Loop
• Monitoring
• Re-Configure
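The three stages named on this slide (execution loop, monitoring, re-configuration) can be sketched as a simple feedback loop. This is an illustrative assumption, not the framework's actual interface. The `Monitor` class, the one-step `reconfigure` rule, the 20% slack threshold, and the simulated frame times are all hypothetical.

```python
from collections import deque


class Monitor:
    """Keeps a sliding window of observed frame times (seconds)."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def push(self, frame_time):
        self.samples.append(frame_time)

    def average_fps(self):
        if not self.samples:
            return None
        return len(self.samples) / sum(self.samples)


def reconfigure(level, fps, goal_fps, max_level):
    """Move one step along an ordered list of knob settings.

    Level 0 is the highest quality (slowest); higher levels trade
    quality for speed.
    """
    if fps is None:
        return level
    if fps < goal_fps and level < max_level:
        return level + 1      # too slow: give up some quality
    if fps > 1.2 * goal_fps and level > 0:
        return level - 1      # comfortable slack: recover quality
    return level


def run(frame_times_per_level, goal_fps, frames):
    """Execution loop over a simulated workload.

    frame_times_per_level[i] is the assumed time one frame takes at
    knob level i.
    """
    monitor, level, history = Monitor(window=3), 0, []
    max_level = len(frame_times_per_level) - 1
    for _ in range(frames):
        frame_time = frame_times_per_level[level]    # execute one frame
        monitor.push(frame_time)                     # monitor
        level = reconfigure(level, monitor.average_fps(),
                            goal_fps, max_level)     # re-configure
        history.append(level)
    return history
```

A production loop would typically add hysteresis so that the knob level does not oscillate around the goal.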
Orthogonality Concept
[Figure: layered diagram with application Requests and Resources, the Run-Time Resource Manager, the Platform OS, and the Target HW Platform]
Exploitation of OpenCL Device Fission to limit resource requests
The Multi‐View Case Study
2 eyes = 3 dimensions
Implementation¹
[Figure: stereo matching between CAM LEFT and CAM RIGHT; labels P, Q, PL, PR, QL, QR, DP, DQ]
¹ Ke Zhang, Jiangbo Lu, and Gauthier Lafruit, “Cross-Based Local Stereo Matching Using Orthogonal Integral Images”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 7, July 2009
Pixel disparity
[Figure: left camera and right camera images with the reference disparity]
QoR: Disparity Error
5 Application Knobs
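To make the disparity-error QoR concrete, the sketch below compares a computed disparity map against a reference one. The slides do not give the exact error formula, so the mean absolute difference used here is an assumption. The function name and the sample maps are hypothetical.

```python
def disparity_error(computed, reference):
    """Mean absolute difference between two disparity maps.

    Both maps are lists of rows holding per-pixel disparities (in pixels).
    """
    if len(computed) != len(reference):
        raise ValueError("maps must have the same number of rows")
    total, count = 0.0, 0
    for row_c, row_r in zip(computed, reference):
        if len(row_c) != len(row_r):
            raise ValueError("rows must have the same length")
        for d_c, d_r in zip(row_c, row_r):
            total += abs(d_c - d_r)
            count += 1
    return total / count if count else 0.0


# Invented 2x3 maps, for illustration only.
reference = [[20, 20, 8], [20, 8, 8]]
computed = [[20, 18, 8], [19, 8, 11]]
error = disparity_error(computed, reference)  # (0+2+0+1+0+3) / 6 = 1.0
```

A lower error means the knob configuration produced a depth estimate closer to the reference, which corresponds to a higher QoR.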
Experimental Setup
Target Platform:
• AMD NUMA Architecture: 4 nodes ‐ 4 cores
• OpenCL 1.2 run‐time provided by AMD
Workload Definition:
• Single application, multiple instances
• Dynamic workload in terms of start time, amount of data to process, and frame‐rate goal
Evaluation Metrics:
• Normalized Actual Penalty (Performance/Quality metric): user satisfaction in terms of Application Frame‐Rate
• Normalized Application Error (Quality metric): user satisfaction in terms of quality of the resulting image (1/QoR)
• Difference w.r.t. off‐line profiling (Predictability metric)
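The slides name these metrics without defining them. The sketch below shows one plausible way to normalize a frame-rate penalty and an application error into the range [0, 1]. Both formulas and the `worst_error` parameter are assumptions made for illustration, not the definitions used in the evaluation.

```python
def normalized_penalty(achieved_fps, goal_fps):
    """Assumed definition: relative shortfall w.r.t. the frame-rate goal."""
    if goal_fps <= 0:
        raise ValueError("goal_fps must be positive")
    shortfall = max(0.0, goal_fps - achieved_fps)
    return min(1.0, shortfall / goal_fps)


def normalized_error(error, worst_error):
    """Assumed definition: error scaled by a worst-case error (hypothetical)."""
    if worst_error <= 0:
        raise ValueError("worst_error must be positive")
    return min(1.0, max(0.0, error / worst_error))
```

Under these assumptions, an instance that reaches 12 frames per second against a goal of 15 has a penalty of 0.2, and one that meets or exceeds its goal has a penalty of 0.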
Application Auto‐Tuning Effects
Comparative Analysis

| Run-Time Resource Management | Application Auto-tuning OFF | Application Auto-tuning ON |
|---|---|---|
| OFF | PLAIN-LINUX | ADAPTIVE-LINUX |
| ON | PLAIN-RTRM | ADAPTIVE-RTRM |

(No Device Fission)
Run‐Time Results
[Figure: three panels plotted against the number of Multi-View apps (1 to 6): Missed Deadlines; Design-Time vs Run-Time Profiling; QoR (Disparity Error). Later builds annotate the panels with rows of marks: “- - + +”, “- + - +”, “- - + +”]
Run‐Time Results
[Figure: plot labeled APPS]
Resource‐Aware AS‐RTM
[Figure: diagram with Requests, Resources, and Resource Availability, the Platform OS, and the Target HW Platform]
Conclusions
We considered the problem of managing multiple OpenCL applications for server consolidation on multi‐core platforms.
We implemented an approach exploiting run‐time management frameworks operating at both the application level and the OS/resource level.
Analysis of results:
• Auto‐tuning is necessary to modulate performance and QoR.
• Resource‐awareness is needed for predictability, by means of resource isolation (RTRM) or a simple monitor (RA‐AS‐RTM).