Optimal Dynamic Frequency Scaling for Energy-Performance of Parallel MPI Programs

Jean-Claude Charr, Raphaël Couturier, Ahmed Fanfakh and Arnaud Giersch

FEMTO-ST - DISC Department - AND Team

June 3rd, 2014

Outline

1. Definitions and objectives

2. Energy and performance models

3. Performance and energy reduction trade-off

4. Experimental results and comparison

5. Conclusions and future works

Definitions

• Modern processors provide the Dynamic Voltage and Frequency Scaling (DVFS) technique.

• DVFS lowers the CPU frequency, and thus the energy the CPU consumes while computing.

• But scaling the frequency down to a lower level degrades the performance (increases the execution time) of a parallel program.

• The energy consumption of an individual processor depends on two power metrics: the static power $P_{static}$ and the dynamic power $P_{dyn}$.


Definitions

• $P_{dyn} = \alpha \cdot C_L \cdot V^2 \cdot F$.

• $P_{static} = V \cdot N_{trans} \cdot K_{design} \cdot I_{leak}$.

• Energy consumption by an individual processor of a synchronous parallel program: $E_{ind} = P_{dyn} \cdot T_{Comp} + P_{static} \cdot (T_{Comp} + T_{Comm})$.

• The frequency scaling factor is the ratio between the maximum and the new frequency: $S = \frac{F_{max}}{F_{new}}$.
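As a quick numeric illustration of the two definitions above, here is a minimal Python sketch. The function names and the input values (powers, times, frequencies) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def individual_energy(p_dyn, p_static, t_comp, t_comm):
    """Energy (joules) of one processor.

    Dynamic power is counted only during computation, static power
    during both computation and communication.
    """
    return p_dyn * t_comp + p_static * (t_comp + t_comm)


def scaling_factor(f_max, f_new):
    """Frequency scaling factor S = Fmax / Fnew."""
    return f_max / f_new


# Hypothetical values: 20 W dynamic, 4 W static, 10 s computing, 2 s communicating.
energy = individual_energy(p_dyn=20.0, p_static=4.0, t_comp=10.0, t_comm=2.0)

# Hypothetical frequencies: slowing down from 2.5 GHz to 2.0 GHz.
s = scaling_factor(f_max=2.5e9, f_new=2.0e9)

print(f"E_ind = {energy} J, S = {s}")
```

With these numbers the dynamic part contributes 200 J and the static part 48 J, and the scaling factor is 1.25.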


Objectives

• Study the effect of the scaling factor S on the energy consumption of parallel iterative applications such as the NAS Benchmarks.

• Study the effect of the scaling factor S on the performance of these benchmarks.

• Discover the energy-performance trade-off relation when changing the frequency.

• Propose an algorithm for selecting the scaling factor S that produces the optimal trade-off between energy and performance.

• Improve on Rauber and Rünger’s method¹, on which our method is based.

¹ Thomas Rauber and Gudula Rünger. Analytical modeling and simulation of the energy consumption of independent tasks. In Proceedings of the Winter Simulation Conference, 2012.


Energy model for homogeneous platform

The dynamic energy decreases with the square of the scaling factor S, while the static energy increases linearly with it.

Rauber and Rünger’s energy model

$$E = P_{dyn} \cdot S_1^{-2} \cdot \left( T_1 + \sum_{i=2}^{N} \frac{T_i^3}{T_1^2} \right) + P_{static} \cdot S_1 \cdot T_1 \cdot N$$

where $S_1$ is the max. scaling factor, $T_1$ is the time of the slowest task, $T_i$ is the time of each of the other tasks, and $N$ is the number of nodes.

Rauber and Rünger’s optimal scaling factor

$$S_{opt} = \sqrt[3]{\frac{2}{N} \cdot \frac{P_{dyn}}{P_{static}} \cdot \left( 1 + \sum_{i=2}^{N} \frac{T_i^3}{T_1^3} \right)}$$

They reduce the performance degradation by assigning the highest frequency to the slowest task.
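The energy model and the optimal scaling factor on this slide can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, `times[0]` is assumed to hold T1 (the slowest task), and the power and time values are made up.

```python
def rauber_energy(s, p_dyn, p_static, times):
    """Energy for scaling factor s; times[0] is T1, the slowest task."""
    t1 = times[0]
    n = len(times)
    dyn_term = t1 + sum(t ** 3 for t in times[1:]) / t1 ** 2
    return p_dyn * s ** -2 * dyn_term + p_static * s * t1 * n


def rauber_optimal_scaling(p_dyn, p_static, times):
    """Closed-form scaling factor that minimizes rauber_energy."""
    t1 = times[0]
    n = len(times)
    ratio_sum = 1.0 + sum((t / t1) ** 3 for t in times[1:])
    return (2.0 / n * p_dyn / p_static * ratio_sum) ** (1.0 / 3.0)


# Hypothetical input: four balanced tasks of 10 s, 20 W dynamic, 4 W static.
times = [10.0, 10.0, 10.0, 10.0]
s_opt = rauber_optimal_scaling(20.0, 4.0, times)
e_opt = rauber_energy(s_opt, 20.0, 4.0, times)

print(f"S_opt = {s_opt:.3f}, E(S_opt) = {e_opt:.1f} J")
```

Setting the derivative of E with respect to S to zero gives the same cube-root expression, which is why the energy at `s_opt` is no larger than at neighbouring scaling factors. Note that this optimum considers energy only, not performance.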


Slack times of the sync. parallel program

[Figure: timelines of Task 1 to Task N between two barriers, showing the idle time of each task. (a) Sync. imbalanced communications (communication time). (b) Sync. imbalanced computations (computation time).]

$$ProgramTime = \max_{i=1,2,\dots,N} T_i$$
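A small sketch of how the program time and the per-task slack follow from the task times. The helper names and the sample times are hypothetical; slack is taken here as the gap between each task and the slowest one, which is an assumption about how the idle time in the figure is measured.

```python
def program_time(task_times):
    """The program finishes when the slowest task reaches the barrier."""
    return max(task_times)


def slack_times(task_times):
    """Idle time of each task while waiting for the slowest one."""
    t_max = program_time(task_times)
    return [t_max - t for t in task_times]


# Hypothetical per-task times in seconds.
times = [8.0, 10.0, 7.5, 9.0]
total = program_time(times)
slacks = slack_times(times)

print(total, slacks)
```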


Performance evaluation of MPI programs

Execution time prediction model

$$T_{new} = T_{MaxCompOld} \cdot S + T_{MaxCommOld}$$

[Figure: normalized predicted time and normalized real time versus frequency scaling factor (1 to 3), for CG Class B and LU Class B.]

The maximum normalized error is 0.0073 for CG (the smallest) and 0.031 for LU (the worst).
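The prediction model scales only the computation part by S and leaves the communication part unchanged. A minimal sketch, with a hypothetical function name and made-up times:

```python
def predict_time(s, t_max_comp_old, t_max_comm_old):
    """Predicted execution time for scaling factor s.

    Only the computation time grows with s; communication time
    is assumed not to depend on the CPU frequency.
    """
    return t_max_comp_old * s + t_max_comm_old


# Hypothetical measurements at maximum frequency: 8 s computing, 2 s communicating.
t_old = predict_time(1.0, 8.0, 2.0)
t_new = predict_time(1.5, 8.0, 2.0)

print(f"T_old = {t_old} s, T_new = {t_new} s, normalized = {t_new / t_old}")
```

In this example a scaling factor of 1.5 stretches the run by 40% rather than 50%, because a fifth of the original time is communication.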


Performance and energy reduction trade-off

[Figure: (c) Real relation: normalized execution time and normalized energy versus frequency scaling factor (1 to 3). (d) Converted relation: normalized performance and normalized energy versus frequency scaling factor (1 to 3), with the optimal scaling factor marked.]

$$Performance = \frac{1}{\text{execution time}}$$

Our objective function

$$MaxDist = \max_{j=1,2,\dots,F} \left( \overbrace{P_{Norm}(S_j)}^{\text{Maximize}} - \overbrace{E_{Norm}(S_j)}^{\text{Minimize}} \right)$$


Scaling factor selection algorithm

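Enumerate the available scaling factors and find $S_{optimal}$ for which $P_{Norm} - E_{Norm}$ is maximal.

Where:

$$E_{Norm} = \frac{E_{Reduced}}{E_{Original}} = \frac{P_{dyn} \cdot S_1^{-2} \cdot \left( T_1 + \sum_{i=2}^{N} \frac{T_i^3}{T_1^2} \right) + P_{static} \cdot T_1 \cdot S_1 \cdot N}{P_{dyn} \cdot \left( T_1 + \sum_{i=2}^{N} \frac{T_i^3}{T_1^2} \right) + P_{static} \cdot T_1 \cdot N}$$

$$P_{Norm} = \frac{T_{old}}{T_{new}} = \frac{T_{MaxCompOld} + T_{MaxCommOld}}{T_{MaxCompOld} \cdot S + T_{MaxCommOld}}$$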
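The enumeration can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: the function names are hypothetical, the slowest task is taken as T1, and the list of scaling factors assumes 18 frequencies in 100 MHz steps from 2.5 GHz down to 800 MHz.

```python
def normalized_energy(s, p_dyn, p_static, task_times):
    """E_Norm = E_Reduced / E_Original for scaling factor s."""
    ordered = sorted(task_times, reverse=True)
    t1, others = ordered[0], ordered[1:]  # t1 is the slowest task
    n = len(ordered)
    dyn_term = t1 + sum(t ** 3 for t in others) / t1 ** 2
    reduced = p_dyn * s ** -2 * dyn_term + p_static * t1 * s * n
    original = p_dyn * dyn_term + p_static * t1 * n
    return reduced / original


def normalized_performance(s, t_max_comp_old, t_max_comm_old):
    """P_Norm = T_old / T_new for scaling factor s."""
    t_old = t_max_comp_old + t_max_comm_old
    t_new = t_max_comp_old * s + t_max_comm_old
    return t_old / t_new


def select_scaling_factor(factors, p_dyn, p_static, task_times, t_comp, t_comm):
    """Return (S, distance) maximizing P_Norm(S) - E_Norm(S)."""
    best_s, best_dist = None, float("-inf")
    for s in factors:
        dist = (normalized_performance(s, t_comp, t_comm)
                - normalized_energy(s, p_dyn, p_static, task_times))
        if dist > best_dist:
            best_s, best_dist = s, dist
    return best_s, best_dist


# Assumed frequency list in MHz (100 MHz steps are an assumption).
factors = [2500 / f for f in range(2500, 700, -100)]

# Hypothetical workload: 4 balanced tasks, 8 s computing, 2 s communicating.
best_s, best_dist = select_scaling_factor(
    factors, p_dyn=20.0, p_static=4.0,
    task_times=[8.0, 8.0, 8.0, 8.0], t_comp=8.0, t_comm=2.0)

print(f"S_optimal = {best_s:.3f}, distance = {best_dist:.4f}")
```

At S = 1 both normalized quantities equal 1, so the distance is 0; any factor with a positive distance saves proportionally more energy than it costs in performance.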


Scaling factor selection algorithm

Algorithm characteristics

• It works online.

• It predicts both the energy consumption and the performance.

• It simultaneously reduces the energy consumption and maintains the performance of iterative algorithms.

• It takes into account the communication time.

• It is well adapted to imbalanced tasks: $F_i = \frac{F_{max} \cdot T_i}{S_{optimal} \cdot T_{max}}$.

• It has a very small overhead: it takes 6.65 µs for 32 nodes.
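The per-task frequency rule for imbalanced tasks can be illustrated as follows. The function name and sample values are hypothetical; on real hardware the result would presumably have to be mapped to one of the available frequencies, a step the slides do not describe.

```python
def task_frequencies(f_max, s_optimal, task_times):
    """F_i = Fmax * T_i / (S_optimal * T_max) for each task."""
    t_max = max(task_times)
    return [f_max * t / (s_optimal * t_max) for t in task_times]


# Hypothetical values: Fmax = 2500 MHz, S_optimal = 1.25, three imbalanced tasks.
freqs = task_frequencies(f_max=2500.0, s_optimal=1.25, task_times=[10.0, 8.0, 5.0])

print(freqs)
```

The slowest task runs at Fmax / S_optimal, and faster tasks get proportionally lower frequencies so that they finish closer to the barrier instead of idling.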


Experimental results

• Our experiments are executed on the simulator SimGrid/SMPI v3.10.

• Our algorithm is applied to the NAS parallel benchmarks.

• Each node in the cluster has 18 frequency values, from 2.5 GHz down to 800 MHz.

• We run the classes A, B and C on 4, 8 or 9, and 16 nodes respectively.

• The dynamic power at the highest frequency is equal to 20 W and the static power is equal to 4 W.
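To make the frequency range concrete, the sketch below lists the scaling factors such a node would offer. The slides give only the count (18) and the endpoints; the even 100 MHz spacing used here is an assumption that happens to produce exactly 18 values.

```python
F_MAX_MHZ = 2500
F_MIN_MHZ = 800
STEP_MHZ = 100  # assumed spacing, not stated in the slides

# Frequencies from highest to lowest, endpoints included.
frequencies = list(range(F_MAX_MHZ, F_MIN_MHZ - 1, -STEP_MHZ))

# Scaling factor S = Fmax / Fnew for each available frequency.
scaling_factors = [F_MAX_MHZ / f for f in frequencies]

print(len(frequencies), scaling_factors[0], scaling_factors[-1])
```

Under this assumption the scaling factors run from 1 up to 3.125, which matches the 1 to 3 range on the horizontal axes of the plots.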


Experimental results

[Figure: normalized performance and normalized energy versus frequency scaling factor (1 to 3) for EP Class C (optimal scaling factor = 1.04), CG Class C and BT Class C.]

[Figure: energy saving % and performance degradation % for NAS Benchmarks Class C. Bar labels, read in the order of the benchmarks on the axis:]

Benchmark   Energy Saving %   Performance Degradation %
CG          39.23             14.88
MG          34.97             21.69
EP          22.14             20.73
LU          35.83             22.49
BT          29.6              21.28
SP          33.48             21.35
FT          34.72             19


Results comparison


[Figure: energy saving % and performance degradation % in two panels: "Our Method for NAS benchmarks Class C" and "Rauber and Rünger for NAS benchmarks Class C".]


Results comparison

[Figure: Comparing our method with Rauber and Rünger’s method for NAS benchmarks class C. Energy to performance distance for EP, CG, MG, BT, LU, SP and FT; series: Rauber E-P and EPSA.]


Conclusions

• We have presented a new online scaling factor selection method that simultaneously optimizes the energy and the performance.

• It predicts the energy consumption and the performance of parallel applications.

• Our algorithm saves more energy when the communication and the other slack times are large.

• Among the available scaling factors, it selects the one giving the best trade-off between energy reduction and performance.

• Our method outperforms Rauber and Rünger’s method in terms of energy-performance ratio.


Future works

• We will apply the proposed algorithm to a heterogeneous platform.

• The nodes of a heterogeneous platform differ in:

- Dynamic and static power.
- Individual energy consumption.
- The available frequencies.
- Performance capabilities.

• We will apply the proposed algorithm to a real cluster.

• We will apply the proposed algorithm to real applications.


Thanks for Listening

To appear

This work will appear in the ISPA conference proceedings, August 2014.

Questions?

