
The Impact of On-Chip Interconnection Noise on Dynamic Thermal Management Efficiency

Abstract— Dynamic Thermal Management (DTM) has emerged as a solution to the reliability challenges posed by thermal hotspots and unbalanced temperatures. DTM efficiency is highly affected by the accuracy of the temperature information presented to the DTM manager. This work investigates how inaccuracy introduced by deep sub-micron (DSM) noise, while temperature information is transmitted to the manager, affects DTM efficiency. A simulation framework has been developed, and the results show up to 62% degradation in DTM performance under DSM noise. This finding highlights the need for further research on reliable on-chip data transmission for DTM applications.

Keywords—DTM; DSM noise; performance.

I. INTRODUCTION

In deep sub-micron (DSM) technologies with high power and temperature densities, dynamic thermal management (DTM) techniques are needed to maintain safe temperature levels during execution [1, 2]. DTM techniques usually rely on temperature information obtained either directly from on-chip thermal monitors or indirectly, by estimating the temperature from the power consumption of functional units [3,4]. The DTM manager regulates the operating temperature based on the temperature information provided by the thermal monitors. This information is transmitted over a bus interconnect in single-core systems or a network on chip (NoC) in multicore systems [5,6].

The accuracy of thermal data is a key factor in DTM and directly affects its efficiency [7]. An inaccurate temperature profile that is lower than the actual temperature can delay the activation of DTM techniques, which may cause physical damage. Conversely, an inaccurate temperature profile that is higher than the actual temperature can trigger DTM prematurely, which degrades system performance [4].

Temperature sensing inaccuracies are caused by a variety of factors, including monitor placement, monitor device imprecision and interconnection DSM noise. Optimized allocation of thermal monitors is an open problem; it aims to place monitors efficiently so that they cover the different hotspots [8]. Monitor device imprecision is due to inaccurate calibration, supply voltage fluctuations, process variation and other effects [9].

The last factor that impacts thermal data accuracy is interconnection DSM noise. This noise can considerably affect the thermal data transmitted from the monitors to the DTM manager over a bus interconnect or NoC. Interconnection DSM noise sources include IR drops, capacitive and inductive crosstalk, charge sharing, charge leakage and process variations [10]. DSM noise increases with technology scaling, which makes reliable transmission a major challenge at future technology nodes. This work proposes a simulator framework to investigate the impact of interconnection DSM noise on temperature data accuracy and DTM efficiency. To the best of our knowledge, this issue has not been investigated before.

Section II describes the investigation methodology and introduces the proposed simulator. Section III discusses the results and findings. Section IV summarizes the paper and draws conclusions.

II. INVESTIGATION METHODOLOGY

Increasing noise in DSM technologies makes data transmission over on-chip interconnects unreliable. Hence, thermal data that are accurate when collected at the monitors are prone to errors by the time they reach the DTM manager. DTM efficiency is highly dependent on accurate input thermal data. Errors that report false emergencies cause unnecessary DTM invocations, which increase system performance slowdown. In addition, an inaccurate temperature profile may wrongly report a unit at emergency temperature as normal, which leaves that unit unattended for several cycles.

To investigate the effects of DSM noise on DTM efficiency, we propose a simulator framework, named the DSM-DTM simulator. It is composed of power, temperature, interconnection-with-DSM-noise and DTM simulators, as illustrated in Fig. 1.

The interconnection simulator transports the thermal data from the thermal monitors to the DTM manager via a parallel interconnection bus. A DSM noise block injects noise, described by a Gaussian noise model [11-13], into the bus [14]. The inaccuracy of thermal data caused by DSM noise is quantified through the Bit Error Rate (BER), which is computed using the following equations:

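$$\mathrm{BER} = Q\left(\frac{V_{SW}}{2\sigma_N}\right) \qquad (1)$$

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-y^{2}/2}\,dy \qquad (2)$$

where VSW and σN are the voltage swing and the noise deviation, respectively [15].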
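As an illustration of the noise model, the following Python sketch assumes the per-wire model of [15], BER = Q(VSW/(2σN)). In that model a wire is driven to 0 or VSW, sampled at the mid-swing threshold, and Q is the standard Gaussian tail probability. The sketch evaluates the analytical BER and cross-checks it with a small Monte Carlo experiment. The function names and the mid-swing decision rule are illustrative assumptions, not the code of the DSM-DTM simulator.

```python
import math
import random


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_rate(v_sw: float, sigma_n: float) -> float:
    """Analytical per-wire BER, assuming BER = Q(V_SW / (2 * sigma_N))."""
    return q_function(v_sw / (2.0 * sigma_n))


def simulate_ber(v_sw: float, sigma_n: float, n_bits: int, rng: random.Random) -> float:
    """Monte Carlo estimate: send random bits as 0 V or V_SW, add zero-mean
    Gaussian noise, and decide each bit at the mid-swing threshold V_SW / 2."""
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        received = bit * v_sw + rng.gauss(0.0, sigma_n)
        decided = 1 if received > v_sw / 2.0 else 0
        errors += decided != bit
    return errors / n_bits


if __name__ == "__main__":
    rng = random.Random(2024)
    for sigma in (0.10, 0.12, 0.14, 0.16, 0.18):
        print(f"sigma_N = {sigma:.2f} V: "
              f"analytical BER = {bit_error_rate(1.0, sigma):.2e}, "
              f"simulated = {simulate_ber(1.0, sigma, 100_000, rng):.2e}")
```

At σN = 0.10 V the analytical BER is about 3 × 10^-7, so a 10^5-bit Monte Carlo run will almost always report zero errors. For such low rates the analytical form is the practical choice.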

Figure 1 Proposed DSM-DTM simulator, consisting of power, temperature, interconnection DSM noise and DTM simulators.

Figure 2 (a) Temperature profile without DSM noise (reference) and (b) temperature profile under the effect of DSM noise (BER = 1E-02). Each panel plots temperature against interval for the sensor locations ReN, IFU-L, LSU-Top, Icache, Dcache, FPU, ALU and EXE BR.

The DTM simulator block implements the DTM manager, which dynamically controls the chip temperature according to a DTM policy, given the input thermal data. Different DTM policies have been proposed, such as dynamic voltage and frequency scaling (DVFS) [16], task scheduling [17], fetch toggling [18], clock gating [19] and task migration [20].

The temperature simulator block models the thermal data from the monitors based on the chip floorplan and the power consumption of each unit inside the chip. In our proposed framework, HotSpot version 2.0 is used to generate the chip's temperature profile [6,22]. The Wattch simulator [23] inside the power simulator block provides cycle-accurate dynamic power based on the chip's activity.

In order to evaluate the impact of DSM noise on DTM efficiency, two thermal profiles are considered: the temperature profile without DSM noise (i.e., the reference) and the temperature profile under the effect of DSM noise. Both are shown in Fig. 2 for 10 intervals of the test-direct benchmark with a noise deviation of 0.18 V. Two metrics are used, namely the system performance slowdown, SPSL, and the percentage of unattended cycles in emergency temperature, defined as follows:

$$SP_{SL} = \frac{\text{execution time with DSM noise}}{\text{execution time without DSM noise}} \qquad (3)$$

System performance slowdown is caused by the increase in execution time when temperatures below the threshold are reported as emergency temperatures, which leads to unnecessary DTM invocations. Meanwhile, unattended cycles in emergency temperature are caused by late activation of DTM techniques when the reported temperature profile is lower than the actual temperature. These cycles may result in physical damage, because the DTM manager is unaware of the errors and assumes the received data are error-free.
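Both metrics can be computed from a reference run and a noisy run of the same benchmark. The sketch below assumes per-sample traces of the actual and the reported temperature of one unit, together with the 85°C emergency threshold used in Section III. It normalizes the unattended count by the total number of samples; this denominator is an assumption, since it is not stated explicitly. The function names are hypothetical.

```python
from typing import Sequence, Tuple

EMERGENCY_C = 85.0  # emergency temperature used in the experiments


def performance_slowdown(time_with_noise: float, time_without_noise: float) -> float:
    """SP_SL: execution time with DSM noise divided by execution time without it."""
    return time_with_noise / time_without_noise


def count_misreports(actual: Sequence[float], reported: Sequence[float],
                     threshold: float = EMERGENCY_C) -> Tuple[int, int]:
    """Return (false_alarms, unattended) over per-sample temperatures.

    false_alarms: actual temperature below the threshold but reported as an
                  emergency, i.e. an unnecessary DTM invocation.
    unattended:   actual temperature at or above the threshold but reported
                  as normal, i.e. an emergency the DTM manager never sees.
    """
    if len(actual) != len(reported):
        raise ValueError("actual and reported traces must have the same length")
    false_alarms = unattended = 0
    for a, r in zip(actual, reported):
        if a < threshold <= r:
            false_alarms += 1
        elif r < threshold <= a:
            unattended += 1
    return false_alarms, unattended


def percent_unattended(actual: Sequence[float], reported: Sequence[float],
                       threshold: float = EMERGENCY_C) -> float:
    """Unattended emergency samples as a percentage of ALL samples
    (assumed denominator; other normalizations are possible)."""
    _, unattended = count_misreports(actual, reported, threshold)
    return 100.0 * unattended / len(actual)
```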

III. EXPERIMENTAL RESULTS

In this work, our experimental results are based on a single-core processor with the configuration parameters shown in Table I, similar to [21,25]. Standard benchmarks from the Wattch simulator are executed on the processor chip [26]. A simple dynamic frequency scaling (DFS) technique has been chosen as the DTM policy. DFS is a reactive technique that is activated only when an emergency temperature is reached [2]. We choose 85°C as the emergency temperature, similar to [1,21]. For implementing DFS, we assume two built-in frequency settings [1]. All jobs run at full speed unless an emergency temperature value is observed. If a unit's temperature reaches the critical value, the frequency of that unit is reduced to the lower setting until the current task terminates. In our experiments, the lower frequency is half of the full-speed frequency. In addition, a transition penalty of 10 μs is applied for every change in frequency [3].

Table I Design parameters for the modeled processor

Technology Parameters
  Process technology: 90 nm
  Supply voltage: 1.0 V
  Clock rate: 3.6 GHz
Core Configuration
  Reservation stations: Mem/Int queue (2x20), FP queue (2x5)
  Functional units: 2 FXU, 2 FPU, 2 LSU, 1 BXU
  Physical registers: 120 GPR, 108 FPR, 90 SPR
  Branch predictor: 16K-entry bimodal, 16K-entry gshare, 16K-entry selector
Memory Hierarchy
  L1 Dcache: 32 KB, 2-way, 128-byte blocks, 1-cycle latency
  L1 Icache: 64 KB, 2-way, 128-byte blocks, 1-cycle latency
  L2 cache: 4 MB, 4-way LRU, 128-byte blocks, 9-cycle latency
  Main memory: 100-cycle latency
DFS Parameters
  Transition penalty: 10 μs
  Minimum frequency scale: 50%
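The reactive two-level DFS policy can be sketched as follows. The model makes several assumptions:
- a single throttled unit;
- one reported temperature per fixed window of cycles;
- a lower setting of half the 3.6 GHz full speed;
- a 10 μs penalty per frequency change;
- the unit is held at the lower setting until the task ends.

The function name and the windowed timing model are illustrative assumptions rather than the simulator's implementation.

```python
from typing import Sequence

F_HIGH_HZ = 3.6e9             # full-speed clock rate (Table I)
F_LOW_HZ = 0.5 * F_HIGH_HZ    # lower setting: half of full speed
TRANSITION_PENALTY_S = 10e-6  # penalty per frequency change
EMERGENCY_C = 85.0            # emergency temperature


def dfs_execution_time(total_cycles: int, reported_temps: Sequence[float],
                       cycles_per_window: int) -> float:
    """Execution time in seconds of one task under two-level reactive DFS.

    reported_temps[i] is the temperature the DTM manager receives at the start
    of the i-th window of `cycles_per_window` cycles (assumed sampling model).
    Once an emergency is reported, the unit stays at F_LOW_HZ until the task
    terminates.
    """
    time_s = 0.0
    remaining = total_cycles
    throttled = False
    for temp in reported_temps:
        if remaining <= 0:
            break
        if not throttled and temp >= EMERGENCY_C:
            throttled = True
            time_s += TRANSITION_PENALTY_S  # single switch from full to low speed
        run = min(cycles_per_window, remaining)
        time_s += run / (F_LOW_HZ if throttled else F_HIGH_HZ)
        remaining -= run
    if remaining > 0:  # trace exhausted: finish the task at the current setting
        time_s += remaining / (F_LOW_HZ if throttled else F_HIGH_HZ)
    return time_s


if __name__ == "__main__":
    clean = [70.0] * 10                      # reference: no emergency reported
    noisy = [70.0, 70.0, 90.0] + [70.0] * 7  # one corrupted reading
    t_ref = dfs_execution_time(3_600_000, clean, 360_000)
    t_noisy = dfs_execution_time(3_600_000, noisy, 360_000)
    print(f"SP_SL = {t_noisy / t_ref:.2f}")
```

The lower setting is held until the task terminates. A single spurious emergency reading early in a task therefore roughly doubles the remaining execution time, which is the mechanism that the slowdown metric SPSL captures.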

In this work, only interconnection noise is considered, while other sources of inaccuracy, including monitor placement and monitor device imprecision, are omitted. In modelling the on-chip interconnect noise, this work considers a noise deviation, σN, ranging from 0.10 V to 0.18 V [12, 27, 28]. With a voltage swing, VSW, of 1 V, the BER varies between 10 and 10. Sensor data of 12 bits [29] are captured every 10 clock cycles. A dedicated 24-bit bus, similar to [30], is used to transport the thermal data, with 12 bits for data and 12 bits for address and control signals.
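The bus transfer can be illustrated as follows. The sketch packs a monitor identifier into the 12 address/control bits and a temperature code into the 12 data bits of the 24-bit word. It then flips each wire independently with probability BER. Two choices are hypothetical and made for illustration only: the linear -40°C to 125°C code mapping, and the use of the address field as a plain monitor identifier.

```python
import random
from typing import Tuple

DATA_BITS = 12                          # sensor data width
ADDR_CTRL_BITS = 12                     # address and control signals
BUS_WIDTH = DATA_BITS + ADDR_CTRL_BITS  # 24-bit dedicated bus
T_MIN_C, T_MAX_C = -40.0, 125.0         # HYPOTHETICAL sensor range
LSB_C = (T_MAX_C - T_MIN_C) / ((1 << DATA_BITS) - 1)


def encode(sensor_id: int, temp_c: float) -> int:
    """Pack a monitor id (address field) and a 12-bit linear temperature code."""
    clamped = min(max(temp_c, T_MIN_C), T_MAX_C)
    code = round((clamped - T_MIN_C) / LSB_C)
    return ((sensor_id & ((1 << ADDR_CTRL_BITS) - 1)) << DATA_BITS) | code


def decode(word: int) -> Tuple[int, float]:
    """Unpack a received word into (monitor id, temperature in C)."""
    sensor_id = (word >> DATA_BITS) & ((1 << ADDR_CTRL_BITS) - 1)
    code = word & ((1 << DATA_BITS) - 1)
    return sensor_id, T_MIN_C + code * LSB_C


def transmit(word: int, ber: float, rng: random.Random) -> int:
    """Flip each of the BUS_WIDTH wires independently with probability `ber`."""
    for bit in range(BUS_WIDTH):
        if rng.random() < ber:
            word ^= 1 << bit
    return word


if __name__ == "__main__":
    rng = random.Random(7)
    sent = encode(sensor_id=3, temp_c=84.0)
    for _ in range(5):
        sensor_id, temp = decode(transmit(sent, ber=1e-2, rng=rng))
        print(f"sensor {sensor_id}: {temp:.2f} C")
```

Under the assumed range, one LSB is about 0.04°C. A single flip of the most significant data bit therefore moves the decoded reading by 2^11 LSBs, roughly 82.5°C, which is enough to turn a normal reading into an emergency or vice versa. Corrupted address bits instead attribute a reading to the wrong monitor.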

The results present the system performance slowdown and the percentage of unattended cycles in emergency temperature for the “hot” and “warm” benchmarks under the DFS technique. “Anagram” and “test-arg” are classified as hot benchmarks and the rest as warm benchmarks. Cold benchmarks are not used, since they are unaffected by DTM [7].

Figure 3 System performance slowdown, SPSL, as a function of noise deviation, σN, for different benchmarks (8 sensors, fscale = 50%, σN from 0.10 V to 0.18 V).

Fig. 3 shows the system performance slowdown as a function of noise deviation. In this experiment, thermal information from 8 monitors located across the chip is collected and sent to the DTM manager. Increasing σN slows the system down further. The slowdown reaches up to 41% under worst-case noise conditions (i.e., σN = 0.18 V).

Fig. 4 illustrates the percentage of unattended cycles in emergency temperature during execution as a function of noise deviation. The number of unattended cycles in emergency temperature grows with σN. The percentage reaches 46% under worst-case noise conditions (i.e., σN = 0.18 V).

Figure 4 Percentage of unattended cycles in emergency temperature as a function of noise deviation, σN, for different benchmarks (8 sensors, fscale = 50%, σN from 0.10 V to 0.18 V).

As illustrated in Fig. 3 and Fig. 4, the system performance slowdown is up to 33% higher for warm benchmarks than for hot benchmarks. The percentage of unattended cycles in emergency temperature, on the other hand, is up to 39% higher for hot benchmarks than for warm ones. Hot benchmarks spend more cycles at emergency temperature, so under DSM noise more of these cycles are reported as normal than in warm benchmarks. Warm benchmarks spend more cycles at non-emergency temperature. Under DSM noise, more of these cycles are therefore reported as emergencies, and the resulting unnecessary DTM invocations are more frequent than in hot benchmarks.

Figure 5 Comparison of system performance slowdown for a processor chip with 8 and 16 thermal monitors across benchmarks (noise deviation = 0.18 V, fscale = 50%).

Fig. 5 and Fig. 6 compare the system performance slowdown and the percentage of unattended cycles in emergency temperature for a processor with 8 and 16 thermal monitors located across the chip. A higher number of thermal monitors is used for finer-grained thermal management. In this experiment, the worst-case noise deviation of 0.18 V is considered so that all the benchmarks are affected. As the number of monitors increases, system performance degrades by up to 62% and the percentage of unattended cycles in emergency temperature rises to as much as 69%. This is because more interconnection links are affected by DSM noise, which results in a larger number of inaccurate thermal readings being presented to the DTM manager.
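The effect of adding monitors can be made concrete with a simple calculation. Assume independent bit errors with probability BER on the 12 data bits of every reading, and ignore address and control errors. A reading is then corrupted with probability 1 - (1 - BER)^12, so the expected number of corrupted readings per sampling round grows linearly with the number of monitors. The sketch below encodes this simplified model. The BER value in the example is illustrative, not taken from the experiments.

```python
DATA_BITS = 12  # data bits per thermal reading


def reading_error_probability(ber: float, bits: int = DATA_BITS) -> float:
    """Probability that at least one data bit of a reading flips
    (assumes independent bit errors)."""
    return 1.0 - (1.0 - ber) ** bits


def expected_corrupted_readings(num_monitors: int, ber: float,
                                bits: int = DATA_BITS) -> float:
    """Expected number of corrupted readings in one sampling round."""
    return num_monitors * reading_error_probability(ber, bits)


def prob_any_corrupted(num_monitors: int, ber: float, bits: int = DATA_BITS) -> float:
    """Probability that at least one monitor's reading is corrupted in a round."""
    return 1.0 - (1.0 - ber) ** (bits * num_monitors)


if __name__ == "__main__":
    ber = 1e-3  # illustrative value only
    for n in (8, 16):
        print(f"{n} monitors: E[corrupted] = {expected_corrupted_readings(n, ber):.4f}, "
              f"P(any corrupted) = {prob_any_corrupted(n, ber):.4f}")
```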

Figure 6 Comparison of unattended cycles in emergency temperature for a processor chip with 8 and 16 thermal monitors across benchmarks (noise deviation = 0.18 V, fscale = 50%).

IV. CONCLUSION

This work investigates the effect of DSM noise on DTM efficiency in terms of the percentage of unattended cycles in emergency temperature and the system performance slowdown. In general, as the noise deviation increases, the percentage of unattended cycles in emergency temperature rises and the system slows down further. DTM efficiency deteriorates further as the number of monitors on the processor chip increases. This highlights the significant effect of interconnection DSM noise in degrading DTM efficiency. The effect is expected to worsen in multicore systems on chip at advanced technology nodes.

REFERENCES

[1] A. K. Coskun, T. S. Rosing, and K. Whisnant, “Temperature aware task scheduling in MPSoCs,” in Proc. Design, Automation and Test in Europe (DATE), Apr. 2007, pp. 1659–1664.

[2] S. Gunther, F. Binns, D. Carmean, and J. Hall, “Managing the impact of increasing microprocessor power consumption,” Intel Technology Journal, 2001.

[3] D. Brooks and M. Martonosi, “Dynamic thermal management for high-performance microprocessors,” in Proc. 7th Int. Symp. on High-Performance Computer Architecture, Jan. 2001, pp. 171–182.

[4] S. Sharifi and T. S. Rosing, “Accurate direct and indirect on-chip temperature sensing for efficient dynamic thermal management,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2010.

[5] M. S. Floyd, et al., “System power management support in the IBM POWER6 microprocessor,” IBM Journal of Research and Development, vol. 51, pp. 733–746, Nov. 2007.

[6] B. Vermeulen and K. Goossens, “A network-on-chip monitoring infrastructure for communication-centric debug of embedded multiprocessor SoCs,” in Proc. Int. Symp. on VLSI Design, Automation and Test (VLSI-DAT), 2009.

[7] K. Skadron, M. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, “Temperature aware microarchitecture,” in Proc. of the Int. Symp. on Computer Architecture, Jun. 2003, pp. 2–13.

[8] R. Mukherjee and S. O. Memik, “Systematic temperature sensor allocation and placement for microprocessors,” in Proc. Design Automation Conference, Jul. 2006, pp. 542–547.

[9] E. Rotem, J. Hermerding, A. Cohen, and H. Cain, “Temperature measurement in the Intel® Core™ Duo processor,” in Proc. Int. Workshop on Thermal Investigations of ICs, 2006, pp. 23–27.

[10] R. Hegde and N. R. Shanbhag, “Toward achieving energy efficiency in presence of deep submicron noise,” IEEE Transactions on VLSI Systems, vol. 8, no. 4, pp. 379–391, Aug. 2000.

[11] L. Benini and G. De Micheli, Networks on Chips: Technology and Tools. San Francisco, CA, USA: Morgan Kaufmann, 2006.

[12] W. N. Flayyih, K. Samsudin, S. J. Hashim, and F. Z. Rokhani, “Crosstalk-aware multiple error detection scheme based on two-dimensional parities for energy efficient network on chip,” IEEE Transactions on Circuits and Systems, vol. 61, no. 7, Jul. 2014.

[13] D. H. K. Hoe, “The use of error correcting codes for nanoelectronic systems: Overview and future prospects,” in Proc. 45th Southeastern Symp. Syst. Theory, 2013, pp. 51–54.

[14] A. Abtahi Forooshani, F. Z. Rokhani, K. Samsudin, and S. A. Aziz, “A process variation aware system-level framework to model on-chip communication system in support of fault tolerant analysis,” in Proc. Student Conference on Research and Development (SCOReD), 2009, pp. 97–100.

[15] R. Hegde and N. R. Shanbhag, “Toward achieving energy efficiency in presence of deep submicron noise,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 4, pp. 379-391, 2000.

[16] S. Dighe, et al., “Within-die variation-aware dynamic-voltage-frequency-scaling with optimal core allocation and thread hopping for the 80-core TeraFLOPS processor,” IEEE Journal of Solid-State Circuits, vol. 46, no. 1, pp. 184–193, Jan. 2011.

[17] J. Yang, X. Zhou, M. Chrobak, Y. Zhang, and L. Jin, “Dynamic thermal management through task scheduling,” in Proc. ISPASS, 2008, pp. 191–201.

[18] K. Skadron, T. Abdelzaher, and M. R. Stan, “Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management,” in Proc. 8th Int. Symp. on High-Performance Computer Architecture, Feb. 2002, pp. 17–28.

[19] S. Gunther, F. Binns, D. M. Carmean, and J. C. Hall, “Methods and apparatus for thermal management of an integrated circuit die,” In Intel Technology Journal, Q1 2001.

[20] B. Salami, M. Baharani, and H. Noori, “Proactive task migration with a self-adjusting migration threshold for dynamic thermal management of multi-core processors,” The Journal of Supercomputing, Springer, 2014.

[21] J. Donald and M. Martonosi, “Techniques for multicore thermal management: Classification and new exploration,” in Proc. 33rd Int. Symp. on Computer Architecture (ISCA-33), 2006.

[22] W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, and S. Velusamy, “Compact thermal modeling for temperature-aware design,” in Proc. 41st Design Automation Conf., Jun. 2004.

[23] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for architectural-level power analysis and optimizations,” in Proc. ISCA, 2000, pp. 83–94.

[24] J. Donald and M. Martonosi, “Leveraging Simultaneous Multithreading for Adaptive Thermal Control,” In Proc. of the Second Workshop on Temperature-Aware Computer Systems, June 2005.

[25] Y. Li, E.Y. Li, D. Brooks, Z. Hu, and K. Skadron, “Performance, Energy, and Thermal Considerations for SMT and CMP Architectures,” In Proc. of the 11th Intl. Symp. on High-Performance Computer Architecture, Feb. 2005.

[26] http://www.eecs.harvard.edu/~dbrooks/wattch-form.html

[27] A. Ejlali, B. M. Al-Hashimi, P. Rosinger et al., “Performability/energy tradeoff in error-control schemes for on-chip networks,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 1, pp. 1-14, 2010.

[28] Q. Yu, and P. Ampadu, “Adaptive error control for nanometer scale NoC links,” IET Computers & Digital Techniques-Special issue on advances in nanoelectronics circuits and systems, vol. 3, no. 6, pp. 643-659, 2009.

[29] http://www.moortec.com/downloads/embedded-analog-temperature-sensor-ip.pdf

[30] H. Chiueh, J. Draper, and J. Choma, “A dynamic thermal management circuit for system-on-chip designs,” in Proc. 8th IEEE Int. Conf. Electron., Circuits Syst. (ICECS), Sep. 2001, pp. 577–580.