Energy Models in Data Parallel CPU/GPU Computations
Energy models in data parallel CPU/GPU computations
Master Degree in Computer Science and Networking
Università di Pisa and Scuola Superiore Sant’Anna
Academic Year 2014/2015
Supervisor: Prof. Marco Danelutto    Candidate: Alessandro Lenzi
Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 1 / 23
Outline
1 Introduction
  - Energy consumption issue
  - Heterogeneous, parallel architectures
  - Problem
  - Energy consumption in CPU/GPU computations
2 GPU Energy Model
3 CPU Energy Model
4 Aggregated model and usage
Section 1
Introduction
The problem of energy consumption
- Technical problem
  - A city's worth of power: power provisioning is an issue
  - Higher temperatures, more failures
  - Mobile devices with a limited energy budget
- Economic issue
  - Cost of energy > cost of hardware
- Environmental concern
  - 2% of greenhouse emissions caused by the US IT sector → 3% by 2020
Energy consumption is nowadays a growing concern in the IT industry.
Scenario
Parallel, heterogeneous architectures everywhere!
- Even low-end computers have multiple CPU cores and a GPU
- Different architectures have different energy footprints, depending on the computation
- Saving energy requires efficient resource usage
  - Structured parallel application development methodology
  - Divide work according to energy consumption, not just time!
We selected a known, widely used data-parallel pattern.
Map parallel pattern
- Map: a higher-order function taking a function f and a collection A
- f is applied in parallel on different partitions of A
- Suitable for both CPU and GPU architectures
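The map semantics described above can be illustrated on the CPU side with a short sketch: the collection is split into contiguous partitions, one per worker, and each worker applies f to its own partition. This is a minimal sketch, not the implementation used in the thesis; `partition` and `parallel_map` are hypothetical names, and Python threads are used only to show the structure (CPython's GIL limits real speedup for CPU-bound f).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(a: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Split `a` into `n_parts` contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(a), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        size = q + (1 if i < r else 0)
        parts.append(a[start:start + size])
        start += size
    return parts


def parallel_map(f: Callable[[T], R], a: Sequence[T], n_workers: int) -> List[R]:
    """Hypothetical helper: apply f to every element of a, one partition per worker."""
    parts = partition(a, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker maps f sequentially over its own partition.
        results = list(pool.map(lambda part: [f(x) for x in part], parts))
    # Concatenate per-partition results, preserving the original element order.
    return [y for chunk in results for y in chunk]


if __name__ == "__main__":
    print(parallel_map(lambda x: x * x, list(range(10)), n_workers=4))
```

The parallelism degree here is simply `n_workers`; the same partition-then-apply structure is what the GPU side repeats at the level of SMs and warps.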
Problem
Problem: find a model predicting the energy consumption of different map parallel applications executed on both CPU and GPU.
Why model energy?
- Divide tasks properly between CPU and GPU
- Select the parallelism degree on both devices
- Autonomic manager: allows the system to respond to non-functional concerns
- Model for the programmer
- Aim: save energy!
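As a rough illustration of how a predictive model could drive the first two decisions, the sketch below exhaustively evaluates a small space of (GPU fraction, CPU cores, GPU blocks) configurations and keeps the one with the lowest predicted energy. `best_config` and the `predict_energy` callable are hypothetical; the toy quadratic in the usage example is a placeholder with a known minimum, not an actual energy model.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Config = Tuple[float, int, int]  # (gpu_fraction, cpu_cores, gpu_blocks)


def best_config(
    predict_energy: Callable[[float, int, int], float],
    fractions: Iterable[float],
    cores: Iterable[int],
    blocks: Iterable[int],
) -> Tuple[Config, float]:
    """Hypothetical helper: brute-force search for the configuration with minimum predicted energy."""
    best: Optional[Tuple[Config, float]] = None
    for cfg in product(fractions, cores, blocks):
        e = predict_energy(*cfg)
        if best is None or e < best[1]:
            best = (cfg, e)
    if best is None:
        raise ValueError("empty configuration space")
    return best


if __name__ == "__main__":
    # Placeholder model with a known minimum at (0.5, 2, 4); NOT a real energy model.
    toy = lambda f, c, b: (f - 0.5) ** 2 + (c - 2) ** 2 + (b - 4) ** 2
    print(best_config(toy, [0.0, 0.25, 0.5, 0.75, 1.0], [1, 2, 4], [1, 2, 4, 8]))
```

Exhaustive search is only reasonable because the space is small; an autonomic manager would presumably call a cheap model like this at run time, which is why the cost of building the model matters.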
Dividing map tasks between CPU/GPU cores
1. The collection is partitioned between CPU and GPU; the GPU part is transferred over the PCIe bus.
2. Within the device, data is partitioned among Streaming Multiprocessors (SMs).
3. In a single SM, f is applied on the collection using threads, scheduled in batches of 32 on CUDA cores.
4. On the host side, the collection is partitioned between the cores, which apply f in parallel.
5. The result is transferred back from the GPU to host memory, obtaining the final result.
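The data flow above can be sketched as a bookkeeping step that decides how many elements each CPU core and each GPU block receive. This is a minimal sketch assuming one block per SM and one GPU thread per element; `plan_map`, `MapPlan` and `balanced_split` are hypothetical names, and the 13-block example is arbitrary.

```python
import math
from dataclasses import dataclass
from typing import List

WARP_SIZE = 32  # threads per warp, scheduled together on the CUDA cores


def balanced_split(n: int, k: int) -> List[int]:
    """Sizes of k contiguous partitions of n items (sizes differ by at most 1)."""
    q, r = divmod(n, k)
    return [q + (1 if i < r else 0) for i in range(k)]


@dataclass
class MapPlan:
    host_per_core: List[int]    # items mapped by each CPU core
    gpu_items: int              # items copied to the GPU over PCIe
    per_block: List[int]        # items handled by each block (one block per SM assumed)
    warps_per_block: List[int]  # warp-sized batches each block must schedule


def plan_map(n_items: int, gpu_fraction: float, n_cores: int, n_blocks: int) -> MapPlan:
    """Hypothetical helper: split n_items between host cores and GPU blocks."""
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be in [0, 1]")
    gpu_items = round(n_items * gpu_fraction)
    host_items = n_items - gpu_items
    per_block = balanced_split(gpu_items, n_blocks)
    return MapPlan(
        host_per_core=balanced_split(host_items, n_cores),
        gpu_items=gpu_items,
        per_block=per_block,
        # Assuming one thread per element, a block needs ceil(items / 32) warps.
        warps_per_block=[math.ceil(m / WARP_SIZE) for m in per_block],
    )


if __name__ == "__main__":
    print(plan_map(10_000, gpu_fraction=0.75, n_cores=4, n_blocks=13))
```

With these example values, 7500 elements go to the GPU (12 blocks of 577 and one of 576, i.e. 19 or 18 warps each) and each of the 4 host cores maps 625 elements.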
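Energy consumption of a map computation
$$E = \int_{0}^{t_{end}} P(t)\,dt$$
- P(t) depends on the computation and on the resources used (parallelism degree)
- Divide the energy by phase, so that P(t) is constant within each phase
- Phases: computing (host and device), communication, and waiting
For a single phase: $E = P \times T_{phase}$; the total energy is then the sum over phases, $E = \sum_i P_i \times T_{phase,i}$.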
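The integral and its per-phase approximation can be checked numerically. The sketch below approximates $E = \int_0^{t_{end}} P(t)\,dt$ from sampled power readings with the trapezoidal rule, and also computes the piecewise-constant sum over phases. The function names and the phase values in the example are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from typing import Iterable, Sequence, Tuple


def energy_from_trace(times: Sequence[float], watts: Sequence[float]) -> float:
    """Approximate E = integral of P(t) dt (joules) from sampled power, trapezoidal rule."""
    if len(times) != len(watts) or len(times) < 2:
        raise ValueError("need at least two (time, power) samples of equal length")
    e = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        e += 0.5 * (watts[i] + watts[i - 1]) * dt
    return e


def energy_from_phases(phases: Iterable[Tuple[str, float, float]]) -> float:
    """Sum P_i * T_i over (name, watts, seconds) phases with constant power."""
    return sum(power * duration for _name, power, duration in phases)


if __name__ == "__main__":
    # Hypothetical phase values, for illustration only.
    phases = [("compute", 120.0, 2.0), ("PCIe transfer", 60.0, 0.5), ("waiting", 40.0, 0.25)]
    print(energy_from_phases(phases))  # 240 + 30 + 10 J
```

When P(t) really is constant within each phase, both functions agree; the trace-based version is what one would use to validate the phase model against measured power.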
Section 2
GPU Energy Model
Predicting power with regression
Explanatory variables: b, the number of blocks (SMs) used; w, the number of warps (32 threads each) for each block.
For a fixed b, power varies according to √w:
$$P(b,w) = \beta_0(b) + \beta_1(b)\sqrt{w}$$
- Precise, but costly to calculate
- Estimate $\beta_0(b)$ and $\beta_1(b)$ as functions of b
  - trade samples (each costing energy) against precision
  - 6 samples → error below 5.21%
Cost of modelling: feasible only for repeated computations, otherwise too high.
Time: at least 6 seconds. Energy: about 1300 J.
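For a fixed b, the model is linear in $x = \sqrt{w}$, so $\beta_0(b)$ and $\beta_1(b)$ can be obtained with ordinary least squares on $(\sqrt{w}, P)$ samples. This is a minimal sketch of that fit only; `fit_power_model` and `predict_power` are hypothetical names, and the sample data in the example is synthetic, generated from $\beta_0 = 50$, $\beta_1 = 10$.

```python
import math
from typing import Sequence, Tuple


def fit_power_model(warps: Sequence[int], watts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of P(w) = beta0 + beta1 * sqrt(w) for one fixed block count b."""
    if len(warps) != len(watts) or len(warps) < 2:
        raise ValueError("need at least two samples of equal length")
    xs = [math.sqrt(w) for w in warps]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(watts) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must use at least two distinct warp counts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, watts))
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta0, beta1


def predict_power(beta0: float, beta1: float, w: int) -> float:
    """Predicted power (W) for w warps per block, at the b used for the fit."""
    return beta0 + beta1 * math.sqrt(w)


if __name__ == "__main__":
    # Synthetic samples, exactly on the curve 50 + 10 * sqrt(w).
    b0, b1 = fit_power_model([1, 4, 9, 16], [60.0, 70.0, 80.0, 90.0])
    print(b0, b1, predict_power(b0, b1, 25))
```

Expressing $\beta_0$ and $\beta_1$ as functions of b would mean repeating this fit for a few sampled block counts and fitting the coefficients against b; the functional form is not specified here, so that step is left out.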
Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 10 / 23
Heuristic model for power estimation
Idea
- Map computations show similar power behaviour on the GPU
- Use one computation as a meter to measure the others
- Study the meter M, then use it to predict the others
[Figure: average power (W) vs. number of warps (1 to 32) for Vector Add, Matrix Add and Matrix Multiplication, using four blocks on Nvidia K20c]
[Figure: the same measurement using eight blocks on Nvidia K20c]
We define $\alpha_C$ as the increase in power of computation $C$ divided by the increase in power of the meter $M$:
$$\alpha_C(b,w) = \frac{P_C(b,w) - (P_{base} + P_{block}\,b)}{P_M(b,w) - (P_{base} + P_{block}\,b)}$$
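Computing $\alpha_C$ from its definition needs one power sample of $C$ and one of the meter $M$ at the same $(b, w)$, plus the idle and per-block terms $P_{base}$ and $P_{block}$. A minimal sketch, assuming those two constants have already been calibrated on the meter; the function name and all numbers are hypothetical.

```python
def alpha_c(p_c, p_m, b, p_base, p_block):
    """Ratio of the power increments of computation C and meter M at the same (b, w).

    p_c, p_m: measured average power (W) of C and of the meter M.
    b: number of blocks; p_base, p_block: assumed pre-calibrated constants (W).
    """
    offset = p_base + p_block * b  # power not attributed to the warps
    denom = p_m - offset
    if denom <= 0:
        raise ValueError("meter power must exceed P_base + P_block * b")
    return (p_c - offset) / denom


# Hypothetical numbers: P_base = 50 W, P_block = 2 W, b = 4, so the offset is 58 W.
print(alpha_c(p_c=76.0, p_m=70.0, b=4, p_base=50.0, p_block=2.0))  # 1.5
```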
$\alpha_C$ analysis and power estimator
- Values of $\alpha_C$ are concentrated near their average $\bar{\alpha}_C$
- Over more than 2000 samples, 87% of the values of $\alpha_C$ lie within ±1 of the average
[Figure: empirical distribution (frequency) of $\alpha$ values for vector addition; values range from about 0.43 to 5.38]
Power predictor using $\alpha_C$:
$$P_C(b,w) = P_{base} + \alpha_C\,(P_{warp} + P'_{warp}\,b)\,\sqrt{w}$$
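Once $\alpha_C$ is known, the predictor needs only the meter's calibrated coefficients $P_{base}$, $P_{warp}$ and $P'_{warp}$. The sketch below is a direct transcription of the formula as written on the slide; the parameter names are hypothetical and the values are illustrative only.

```python
import math


def predict_power_alpha(alpha, b, w, p_base, p_warp, p_warp_prime):
    """P_C(b, w) = P_base + alpha * (P_warp + P'_warp * b) * sqrt(w)."""
    return p_base + alpha * (p_warp + p_warp_prime * b) * math.sqrt(w)


# Illustrative values: alpha = 1.5, P_base = 50 W, P_warp = 1 W, P'_warp = 0.5 W.
print(predict_power_alpha(1.5, b=4, w=16, p_base=50.0, p_warp=1.0, p_warp_prime=0.5))  # 68.0
```

A single sample of $C$ fixes `alpha`; every other $(b, w)$ is then predicted without further measurements, which is what keeps the energy footprint of the model low.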
Validation of the heuristic
The heuristic is convenient if a single sample yields an accurate estimate: power can then be modelled with a low energy footprint.
Validation
- Sample power for 500 random $(b, w)$ pairs and compute $\alpha_C(b,w)$ for each
- Predict power for a random $(b', w')$ using the $\alpha_C(b,w)$ obtained
[Figure: error distribution of the power estimate using the heuristic over 500 samples, Nvidia K40m; x-axis: relative error (about -16% to +37%), y-axis: error probability]
The error is below 10% with high probability (over 85% of the cases) and below 15% with probability over 95%.
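Validation figures of this kind (share of predictions within 10% and 15% of the measured value) can be computed from paired predictions and measurements. A minimal sketch; the helper names and the four data points are hypothetical and unrelated to the K40m measurements.

```python
def relative_errors(predicted, measured):
    """Signed relative error (pred - meas) / meas for each pair."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return [(p - m) / m for p, m in zip(predicted, measured)]


def fraction_within(errors, threshold):
    """Fraction of errors whose absolute value is below threshold (e.g. 0.10)."""
    return sum(1 for e in errors if abs(e) < threshold) / len(errors)


measured = [80.0, 100.0, 90.0, 70.0]
predicted = [84.0, 95.0, 102.0, 70.7]
errs = relative_errors(predicted, measured)
print(fraction_within(errs, 0.10), fraction_within(errs, 0.15))  # 0.75 1.0
```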
Estimating GPU energy consumption
$$\text{Energy} = P_C(b,w) \times T(b,w)$$
$$T(b,w) = \left\lceil \frac{N}{b \times w \times 32} \right\rceil \times T^{GPU}_f \simeq \frac{T(1,1)}{b \times w}$$
[Figure: distribution of the error in energy estimation using the heuristic over 500 samples, Nvidia K40m; x-axis: relative error (about -15% to +35%), y-axis: error probability]
- Error below 15% with over 95% probability, using a single sample
- The same holds for other computations and devices, as long as threads map to CUDA cores
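Combining the power predictor with the time model $T(b,w) = \lceil N/(b \times w \times 32) \rceil \times T^{GPU}_f$ gives the energy estimate. A sketch assuming one thread per element, 32 threads per warp, and a per-element time `t_f` measured beforehand; the function names and values are hypothetical.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def gpu_time(n_elements, b, w, t_f):
    """T(b, w) = ceil(N / (b * w * 32)) * T_f.

    t_f: assumed time (s) for one thread to process one element.
    """
    threads = b * w * WARP_SIZE
    return math.ceil(n_elements / threads) * t_f


def gpu_energy(power_watts, n_elements, b, w, t_f):
    """Energy (J) = P_C(b, w) * T(b, w), with power_watts from the predictor."""
    return power_watts * gpu_time(n_elements, b, w, t_f)


# Illustrative: N = 1,000,000, b = 4, w = 8 (1024 threads), T_f = 1 us, P = 80 W.
print(gpu_time(1_000_000, 4, 8, 1e-6))  # about 9.77e-4 s (977 rounds)
print(gpu_energy(80.0, 1_000_000, 4, 8, 1e-6))  # about 0.078 J
```

The sketch keeps the exact ceiling form; the $T(1,1)/(b \times w)$ approximation is close whenever $N$ is much larger than the number of threads.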
Section 3
CPU Energy Model
Predicting power of a map computation
Again: Energy = (Average) Power × Time
- The explanatory variable for power is $n$, the number of cores used
- Power grows linearly with the number of workers used in the computation
[Figure: average power (W) vs. number of processing units (workers, 1 to 15) allocated to different map computations: Vector Add, Map Atan and Matrix Add]
Energy consumption using regression
Power predictor (with regression):
$$P_c(n) = \beta_0 + \beta_1 \times n$$
[Figure: error distribution for estimating CPU power consumption with regression, empirical distribution vs. fitted normal $N(\mu, \sigma)$; x-axis: relative error (about -3.7% to +1.7%), y-axis: probability]
No need for a heuristic:
- More precise measurements, no data transfer
- Two executions with different parallelism degrees are enough to find the regression parameters
- Negligible footprint (e.g. ∼0.3 s, ∼30 J)
Energy predictor:
$$E_c(n) = P_c(n) \times \left\lceil \frac{N}{n} \right\rceil T_f \simeq P_c(n) \times \frac{T(1)}{n}$$
Relative energy error below 5% with probability over 85%.
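With a linear power model, two runs at different parallelism degrees determine $\beta_0$ and $\beta_1$ exactly, and the energy predictor follows directly. A minimal sketch; the function names, the two measured powers and `t_f` are hypothetical.

```python
import math


def fit_linear_power(n1, p1, n2, p2):
    """Solve P(n) = beta0 + beta1 * n through two (cores, watts) measurements."""
    if n1 == n2:
        raise ValueError("the two runs need different parallelism degrees")
    beta1 = (p2 - p1) / (n2 - n1)
    beta0 = p1 - beta1 * n1
    return beta0, beta1


def cpu_energy(beta0, beta1, n, n_elements, t_f):
    """E_c(n) = P_c(n) * ceil(N / n) * T_f, with t_f = time per element on one core."""
    power = beta0 + beta1 * n
    return power * math.ceil(n_elements / n) * t_f


# Hypothetical runs: 2 cores -> 50 W, 8 cores -> 74 W.
b0, b1 = fit_linear_power(2, 50.0, 8, 74.0)
print(b0, b1)  # 42.0 4.0
print(cpu_energy(b0, b1, 4, 1000, 0.001))  # about 14.5 J
```

Two points solve the line exactly; with noisy measurements, more runs and a least-squares fit would be the natural generalisation, at a slightly higher modelling cost.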
Section 4
Aggregated model and usage
![Page 90: Energy Models in Data Parallel CPU/GPU Computations](https://reader034.fdocuments.us/reader034/viewer/2022051504/58f024671a28abee618b45e3/html5/thumbnails/90.jpg)
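Aggregated Model
Consider n CPU cores, b used blocks (SMs) with w warps each, and a fraction g of the tasks transferred to the device. Then
$E(n,b,w,g) = P_{CPU}(n) \times T_{CPU}(n,1-g) + P_{GPU}(b,w) \times T_{GPU}(b,w,g) + 2 \times P_{send} \times T_{send}(g) + E_{static}$
$E_{static}$ is minimized by choosing g such that the CPU share finishes together with the GPU share and its two transfers:
$T_{CPU}(n,1-g) \simeq T_{GPU}(b,w,g) + 2 \times T_{send}(g)$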
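If one additionally assumes (this is not stated on the slide) that each time term scales linearly with the share of work it handles, the balancing condition can be solved for g in closed form. Writing $\hat T_{CPU} = T_{CPU}(n,1)$, $\hat T_{GPU} = T_{GPU}(b,w,1)$ and $\hat T_{send} = T_{send}(1)$ for the whole-workload times:

```latex
\begin{aligned}
T_{CPU}(n,1-g) &= (1-g)\,\hat T_{CPU}, \quad
T_{GPU}(b,w,g) = g\,\hat T_{GPU}, \quad
T_{send}(g) = g\,\hat T_{send} \\
(1-g)\,\hat T_{CPU} &= g\,\hat T_{GPU} + 2g\,\hat T_{send} \\
\Longrightarrow\quad g &= \frac{\hat T_{CPU}}{\hat T_{CPU} + \hat T_{GPU} + 2\,\hat T_{send}}
\end{aligned}
```

Under this assumption, the larger the CPU time is relative to the GPU and transfer times, the closer g gets to 1.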
Average energy cost of a task for (n, b, w):
$E(n,b,w) = (1-g)\,P_{CPU}(n)\,T^{CPU}_f + g\,P_{GPU}(b,w)\,T^{GPU}_f$
To save energy: select the (n, b, w) with minimum E(n, b, w) for execution!
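A minimal Python sketch of the two formulas above for a single configuration, assuming the linear time scaling needed for a closed-form g (our assumption): `balance_fraction` solves the balancing condition and `task_energy` evaluates the average per-task energy E(n, b, w). All names and numbers are hypothetical; per-task times can be used in place of whole-workload times because the task count cancels out of the balance.

```python
def balance_fraction(t_cpu, t_gpu, t_send):
    """Fraction g of tasks to offload so that CPU and GPU finish together.

    Solves (1 - g) * t_cpu == g * (t_gpu + 2 * t_send), assuming each time
    scales linearly with the share of tasks it handles. Inputs may be
    whole-workload or per-task times: the task count cancels out.
    """
    return t_cpu / (t_cpu + t_gpu + 2 * t_send)


def task_energy(g, p_cpu, t_f_cpu, p_gpu, t_f_gpu):
    """Average per-task energy E(n,b,w) = (1-g) P_CPU T_f^CPU + g P_GPU T_f^GPU."""
    return (1 - g) * p_cpu * t_f_cpu + g * p_gpu * t_f_gpu


if __name__ == "__main__":
    # Hypothetical per-task times (s) and powers (W) for one (n, b, w).
    g = balance_fraction(t_cpu=0.030, t_gpu=0.008, t_send=0.001)
    print(g, task_energy(g, p_cpu=40.0, t_f_cpu=0.030, p_gpu=120.0, t_f_gpu=0.008))
```

With these hypothetical inputs g is 0.75 and the per-task energy is about 1.02 J.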
![Page 96: Energy Models in Data Parallel CPU/GPU Computations](https://reader034.fdocuments.us/reader034/viewer/2022051504/58f024671a28abee618b45e3/html5/thumbnails/96.jpg)
Saving energy - Matrix Multiplication
- Matrix multiplication: partition in rows
- Compute 3 tasks on the CPU (with n = 1, 2) to estimate the regression parameters
- Compute 1 task on the GPU to estimate $\alpha_{MM}$
- Remove uninteresting configurations
- For each triple (n, b, w), compute the average energy per task (row)
- Select the triple with minimum per-task energy consumption as the execution configuration
[Figure: Power for vector addition, with different (b, w) parameters. 3-D plot of power (W, about 55 to 140) against blocks (1 to 13) and warps (0 to 32).]
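The configuration search in the matrix multiplication steps can be sketched as an exhaustive loop over candidate triples. The sketch assumes (not from the slides) that the calibrated models are available as plain functions, that g is computed in closed form under linear time scaling, and that "uninteresting" configurations are filtered by a caller-supplied predicate, since the actual pruning rule is not given here. Every name and model below is hypothetical.

```python
import itertools


def select_configuration(cores, blocks, warps,
                         cpu_power, cpu_task_time,
                         gpu_power, gpu_task_time, send_task_time,
                         is_interesting=lambda n, b, w: True):
    """Return (energy_per_task, (n, b, w), g) minimising E(n, b, w), or None.

    Caller-supplied (hypothetical) models:
      cpu_power(n), cpu_task_time(n)        -> W, s per task on the CPU side
      gpu_power(b, w), gpu_task_time(b, w)  -> W, s per task on the GPU side
      send_task_time                        -> s per task for one transfer
    """
    best = None
    for n, b, w in itertools.product(cores, blocks, warps):
        if not is_interesting(n, b, w):
            continue  # placeholder for the "remove uninteresting configurations" step
        t_c = cpu_task_time(n)
        t_g = gpu_task_time(b, w)
        # Balance CPU and GPU completion times (linear-scaling assumption).
        g = t_c / (t_c + t_g + 2 * send_task_time)
        energy = (1 - g) * cpu_power(n) * t_c + g * gpu_power(b, w) * t_g
        if best is None or energy < best[0]:
            best = (energy, (n, b, w), g)
    return best


if __name__ == "__main__":
    # Toy models, purely illustrative.
    print(select_configuration(
        cores=[1, 2], blocks=[1, 2], warps=[1, 2],
        cpu_power=lambda n: 20.0 + 10.0 * n,
        cpu_task_time=lambda n: 1.0 / n,
        gpu_power=lambda b, w: 50.0 + 5.0 * b * w,
        gpu_task_time=lambda b, w: 0.5 / (b * w),
        send_task_time=0.0,
    ))
```

With these toy models the search picks (n, b, w) = (2, 2, 2) with g = 0.8 and about 11 J per task. Exhaustive enumeration is a simple choice because the number of (n, b, w) triples is small; the predicate is where hardware limits, such as the number of SMs, could be encoded.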
![Page 102: Energy Models in Data Parallel CPU/GPU Computations](https://reader034.fdocuments.us/reader034/viewer/2022051504/58f024671a28abee618b45e3/html5/thumbnails/102.jpg)
Saving energy - Matrix Multiplication (results)
Execution parameters:
- n = 1
- b = 14, w = 32
- g = 0.9931
For 9000 × 9000 matrix multiplication:

Device   Time      Energy
CPU      25.79 s   1035.15 J
GPU      27.01 s   3064.32 J

Total: 4147.92 J
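A quick arithmetic check of the reported figures, written as a short Python script. The inputs are the values from the table; the derived quantities (rows per device, average power) are our own computations, and rounding g × 9000 to whole rows is an assumption about how the split was made.

```python
ROWS = 9000   # 9000 x 9000 matrix, partitioned by rows
G = 0.9931    # fraction of rows sent to the GPU

# (time in s, energy in J) as reported in the results table
RESULTS = {"CPU": (25.79, 1035.15), "GPU": (27.01, 3064.32)}
REPORTED_TOTAL_J = 4147.92

gpu_rows = round(G * ROWS)   # assumption: the split is rounded to whole rows
cpu_rows = ROWS - gpu_rows
avg_power = {dev: energy / time for dev, (time, energy) in RESULTS.items()}
device_sum = sum(energy for _, energy in RESULTS.values())
unattributed = REPORTED_TOTAL_J - device_sum

print(f"rows: GPU {gpu_rows}, CPU {cpu_rows}")
for dev, watts in avg_power.items():
    print(f"{dev} average power: {watts:.2f} W")
print(f"CPU + GPU energy: {device_sum:.2f} J; "
      f"not in the per-device rows: {unattributed:.2f} J")
```

This gives about 8938 rows on the GPU and 62 on the single CPU core, with average powers of roughly 40.14 W (CPU) and 113.45 W (GPU). The per-device energies sum to 4099.47 J, 48.45 J below the reported total; the slide does not say what the difference accounts for (the transfer and static terms of the aggregated model are a plausible candidate, but that is an assumption). The CPU and GPU times are within 5% of each other, as the balancing condition on g intends.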
![Page 108: Energy Models in Data Parallel CPU/GPU Computations](https://reader034.fdocuments.us/reader034/viewer/2022051504/58f024671a28abee618b45e3/html5/thumbnails/108.jpg)
Conclusions
- Structured parallel applications have similar energy footprints on CPU and GPU
- We used this similarity to develop accurate and efficient models for energy prediction
- Using the models, we can save 14% energy w.r.t. time-optimized execution and 26% w.r.t. GPU-only execution
- We also devised a methodology to develop energy consumption models
Future works:
- Enhance the (time) model: consider the case in which threads outnumber physical units
- Develop other models: study other data-parallel patterns (reduce, scan)
- Practical usage: development of an autonomic manager for minimizing energy consumption
![Page 117: Energy Models in Data Parallel CPU/GPU Computations](https://reader034.fdocuments.us/reader034/viewer/2022051504/58f024671a28abee618b45e3/html5/thumbnails/117.jpg)
Thank you! Questions?