Evaluation of thermal-aware design prototype tools
Confidential
ICT-2009.3.2-248603-IP
Modelling, Control and Management of Thermal Effects in Circuits of the Future
WP no. Deliverable no. Lead participant
WP7 D7.3.1 NXP-NL
Evaluation of thermal-aware design prototype tools Prepared by M. Willemsen
Issued by THERMINATOR Project Office
Document Number THERMINATOR/D7.3.1/v1
Dissemination Level Confidential
Date 12/02/2013
© Copyright 2010-2013 STMicroelectronics, Intel Mobile Communication, NXP
Semiconductors, GRADIENT DESIGN AUTOMATION , MUNEDA, SYNOPSYS ,
BUDAPESTI MUSZAKI ES GAZDASAGTUDOMANYI EGYETEM , CSEM,
FRAUNHOFER , IMEC, CEA-LETI, OFFIS, Politecnico di Torino, ALMA MATER
STUDIORUM -Universita’ Di Bologna, ST-Polito.
This document and the information contained herein may not be copied, used or disclosed
in whole or in part outside of the consortium except with prior written permission of the
partners listed above.
THERMINATOR FP7 ICT – 2009.3.2 - 28603 D7.3.1
Confidential Page 2 12/02/2013
Document Title Evaluation of thermal-aware design prototype tools
Type Deliverable CO
Ref D7.3.1
Target version V1_1
Current issue V0_1
Status Released
File
Author(s) Wilhelm Moering (NXP-D), A. Calimera (POLITO), A. Macii (POLITO),
A. Timar (BME), A. Szalai (BME), G. Nagy (BME), P. Knocke (OFFIS),
S. Rosinger (OFFIS), V. Melikyan (SNPS-AM), A. Ripp (MUN), H.
Oprins (IMEC), S. Stoffels (IMEC), A.Bartolini(POLITO)
Reviewer(s) S. Holland (NXP-D), D. Rossi (UNIBO)
Approver(s) G.Gangemi (ST)
Approval date 12/02/2013
Release date 12/02/2013
Distribution of the release Dissemination level CO
Distribution list
History Rev. DATE Comment
0.1 22-01-2013 Initial version
1.0 30-01-2013 Revised version, approved by all partners
1.1 12/02/2013 Check and ship out
References
[1] Garrou, Ph., “Handbook of 3D-Integration: Technology and Applications of 3D
Integrated Circuits”, Wiley-VCH (Weinheim, 2008).
[2] Chanchani, R. “3D Integration Technologies – An Overview”, in Materials for
Advanced Packaging edited by D. Lu, C.P. Wong, Springer (2009), pp. 1-50.
[3] Beyne, E. “Through-Silicon via Technology for 3D IC” in Ultra-thin Chip
Technology and Applications, edited by J.N. Burghartz, Springer (2011).
[4] Marchal, P. et al., “3D technology roadmap and status”, Proc. IITC 2011, pp. 1-3.
[5] Gu, S. et al., “Stackable memory of 3D chip integration for mobile applications”,
Proc. IEDM 2008, pp 1-4.
[6] Brunschwiler, T.; Michel, B. ; "Thermal Management of Vertically Integrated
Packages," in Handbook of 3D Integration: Technology and Applications of 3D
Integrated Circuits, edited by P. Garrou, C. Bower and P. Ramm. Wiley-VCH
Verlag GmbH (Weinheim, 2008) Vol. 2, Part IV, pp. 635-649.
[7] Agonafer, D. et al., “Thermo-Mechanical Challenges in Stacked Packaging”, Heat
Transfer Engineering, Vol. 29 No. 2 (2008), pp. 134 – 148.
[8] J. Kim et al., “A 1.2V 12.8GB/s 2Gb mobile Wide-I/O DRAM with 4×128 I/Os
using TSV-based stacking,” in IEEE International Solid- State Circuits
Conference (ISSCC), Feb 2011.
[9] Alpha Company Ltd, “Passive heat sinks,” http://www.micforg.co.jp/en/cat
pass.html, 2012
[10] “Active heat sinks,” http://www.micforg.co.jp/en/cat fe.html, 2012.
[11] JEDEC Solid State Technology Association, JEDEC Standard JESD51-12:
Guidelines for Reporting and Using Electronic Package Thermal Information,
www.jedec.org, May 2005
[12] Intel, Ball Grid Array (BGA) Packaging, Intel Packaging Databook, Chapter
14, 2010
[13] Intel, Performance Characteristics of IC Packages, Intel Packaging Databook,
Chapter 4, 2010
[14] Intel, Physical Constants of IC Package Materials, Intel Packaging Databook,
Chapter 5, 2010
[15] Therminator Consortium, Deliverable D6.2.1: Framework overview for an all
level thermal simulation of 3D SiP stacks and 2D SoCs, 2010
[16] Therminator Consortium, Deliverable D6.2.2: Presentation and evaluation of
an all level thermal simulator of 3D SiP stacks and 2D SoCs, 2012
[17] Therminator Consortium, Deliverable D6.3.1: Specification and
standardization of the thermal aware design, optimization and exploration flow
and preliminary presentation of design techniques, 2010
[18] Therminator Consortium, Deliverable D6.3.2: Presentation and evaluation of
thermal-aware design techniques for 3D SiP stacks and 2D SoCs, 2012
[19] Therminator Consortium, Deliverable D6.3.3: Report on integration of the
individual optimization techniques, 2012
[20] Therminator Consortium, Deliverable D1.2.1: Specification of internal design
flows and environments and existing tool interfaces, 2010
[21] Therminator Consortium, Deliverable D1.3.1: Technical specification of test
cases and distribution to partners of concern, 2010
[22] Reef Eilers, Malte Metzdorf, Sven Rosinger, Domenik Helms, Wolfgang Nebel,
“Phase space based NBTI model”, Proc. of International Workshop on Power and
Timing Modeling, Optimization and Simulation (PATMOS), 2012
[23] Sven Rosinger, Malte Metzdorf, Domenik Helms, Wolfgang Nebel,
“Behavioral-Level Thermal- and Aging-Estimation Flow”, Proc. of 12th
Latin-American Test Workshop (LATW), p. 1-6, 2011
[24] G. Gangemi, FP7-Funding Projects THERMINATOR, SMAC, MANON
Overview, MUGM MunEDA User Group Meeting 2012, October 2012, Munich,
Germany
[25] Z. Abbas, M. Olivieri, A. Ripp, G. Strube, M. Yakupov, Yield optimization for
low power current controlled current conveyor, SBCCI 2012, September 2012,
Brasília, Brazil
[26] A. Colaci, G. Boarin, A. Roggero, L. Civardi, C. Roma, A. Ripp, M. Pronath,
G. Strube: Systematic Analysis & Optimization of Analog/Mixed-Signal Circuits
Balancing Accuracy and Design Time, SBCCI 2011 Brazil, September 2011, Sao
Paolo, Brazil
[27] N. Seller, Optimization of a 2.133GHz level shifter in 28nm, MUGM
MunEDA User Group Meeting 2011, Munich, Germany
[28] U. Trautner, M. Pronath, Synopsys Custom and Analog Mixed-Signal
Overview & MunEDA WiCkeD Integration, MUGM MunEDA User Group
Meeting 2010, Munich, Germany
Contents
Document
Distribution of the release
References
1 Introduction
2 Evaluation of thermal models for three-dimensional integrated circuits (IMEC)
  2.1 Introduction
  2.2 Technical results
    2.2.1 Evaluation of thermal accuracy for the DRAM-on-logic test vehicle
      2.2.1.1 Thermal test vehicle description
      2.2.1.2 Thermal model DRAM-on-Logic stack
      2.2.1.3 Experimental model validation
    2.2.2 Evaluation of the thermal-aware design prototype tools
      2.2.2.1 Design
      2.2.2.2 Experiments and results
      2.2.2.3 Measurable objectives
  2.3 Conclusions
3 Evaluation thermal-aware synthesis and optimization tools (POLITO, together with SNPS-AM)
  3.1 Introduction
  3.2 Benchmark description
  3.3 Thermal-Aware Optimization Techniques
    3.3.1 ITD-Aware Dual-Vth Assignment
    3.3.2 Tunable Clock Tree
  3.4 Results on the testbench
    3.4.1 Validating the ITD-Aware Dual-Vth Assignment
    3.4.2 Validating the Tunable Clock-Tree Methodology
  3.5 Conclusions
4 Evaluation of thermal and aging aware optimization flow for two-dimensional systems on
chips (OFFIS, together with UNIBO/ST, and CEA-LETI)
  4.1 Introduction to evaluation of high-level thermal and degradation estimation and
  optimization
  4.2 Technical results
    4.2.1 Evaluation of green-function based thermal estimation and optimization based on
    use case 5
      4.2.1.1 Introduction to use case 5 motion detection design
      4.2.1.2 Custom ASIC hardware accelerators power and area determination
      4.2.1.3 Use case 5 IC package properties
      4.2.1.4 Evaluation of developed estimation/analysis and optimization flow and tools of
      WP6
      4.2.1.5 Thermal- and degradation-aware optimization evaluation
    4.2.2 Evaluation of green-function based thermal estimation based on Genepy design
    of CEA-LETI
  4.3 Conclusions
5 Verification of simulator engines (BME together with POLITO)
  5.1 Introduction
    5.1.1 The simulator engines developed at BME
      5.1.1.1 Logi-thermal simulation
      5.1.1.2 Electro-thermal simulation
    5.1.2 Measurable objectives
  5.2 Detailed description of the verification
    5.2.1 Verification of the logi-thermal simulator engine
    5.2.2 Comparison of the two logi-thermal simulator engines
    5.2.3 Real-world evaluation designs
      5.2.3.1 Ring oscillator containing 1000 inverter cells
      5.2.3.2 Test circuit from POLITO
    5.2.4 Verification of the electro-thermal engine
  5.3 Conclusion
6 Thermal effects in identification applications (NXP-D)
  6.1 Introduction
  6.2 Technical results
  6.3 Conclusions
    The measurements of the NXP test chip and the results from Synopsys TCAD simulation
    and modelling simulation tools match sufficiently (MO7.3.11). The characterization of the
    diode-voltage over temperature is well in line with the theoretical expectation. The impact
    of the encapsulation on the thermal behaviour with respect to the self-heating in silicon
    could be demonstrated.
7 Evaluation of simulation-based verification, optimization and RSM model generation
methodologies (MUN, together with NXP-D, and ST)
  7.1 Introduction
  7.2 Technical results
  7.3 Conclusions
8 Conclusions
9 Measurable objectives
10 Publications and presentations
1 Introduction
The main objective of WP7 is to validate the models, design techniques, and tools developed
within the Therminator project. WP7 is divided into three tasks. In T7.1, the validation of
thermal models of new devices, materials, and technologies has been done. The focus in T7.1
is on the device level, and the level of elementary building blocks to be used in large(r)
circuits. The effectiveness and usability of design techniques has been addressed in T7.2. In
T7.2, larger building blocks, parts of circuits, and test chips are used as test cases. The final
task, T7.3, is to benchmark and demonstrate the effectiveness of the developed EDA tools. In
this task, the validation addresses test chips and prototypes.
The evaluation of the developed EDA tools is the main topic of T7.3, and is addressed in this
deliverable. The evaluation and demonstration activities are done with the tools developed in
WP3, 4, and 6 of Therminator. For demonstration, the test cases of WP1 and examples
provided by individual partners are used. The test cases come from different fields of the
semiconductor industry, e.g. digital, analog, and RF. The prototype tools of WP3, 4, and 6
have been developed by research institutes, universities, and EDA vendors. Demonstration is
mainly done on examples provided by or of interest to the industrial partners, i.e. ST and
NXP-D. These collaborations within T7.3 have been very useful in creating an effective
innovation infrastructure, in which novel ideas from universities, research institutes, and
EDA vendors are applied in test cases of interest for the European semiconductor industry.
Evaluation of the prototype tools in Therminator is done using typical figures-of-merit such as
ease of use, accuracy, and integration within existing flows. The effectiveness of these tools is
demonstrated in the form of improvements such as less temperature-sensitive designs, higher
yield, and reduction in design times. Specific examples of these improvements will be given
in this report.
More specifically, the thermal models for the three-dimensional integrated circuits of IMEC,
developed in WP6, are evaluated in chapter 2 by demonstrating their accuracy and ease of use
on a two-layer DRAM-on-logic stack. In chapter 3, POLITO applies their thermal-aware
synthesis tools, developed in WP3, to digital parts of an MCU provided by ST. Improvements
in temperature-induced delays and clock skew obtained with their tool are reported.
Next, in chapter 4, OFFIS shows results on thermal- and aging-aware optimization on an
example provided by UNIBO/ST. The OFFIS tool is also compared against measurement data
provided by CEA-LETI from test case 4. BME validates the accuracy of their logi-thermal
simulation tool with respect to other tools and measurements in chapter 5. The impact of
encapsulation on self-heating in identification applications is addressed in chapter 6, where
NXP-D compares measurements from their test chip with simulation. Finally, in chapter 7,
MUN demonstrates the effectiveness of their tools developed in WP6 by showing improvements
in yield on a test case provided by NXP-D, and improvements in design time on a test case
from ST.
The results of T7.3 are made tangible in terms of measurable objectives. An overview of all of
these measurable objectives is given in chapter 9 of this report. In this chapter, the measurable
objectives are also linked to Therminator’s project objectives. The novelty of the work is
shown in chapter 10, where all of the output in terms of journal papers and conference
contributions is collected.
2 Evaluation of thermal models for three-dimensional integrated circuits (IMEC)
2.1 Introduction
Three-dimensional (3D) integration of integrated circuits is considered a promising
technology for circuit design. It allows decreasing the form factor of today's systems, eases
the interconnect performance limitation, and makes it possible to interconnect multiple
stacked dies made in different process technologies [1],[2]. The cornerstones of this
technology are through-Si vias (TSVs) and microbumps, for which process solutions,
reliability and design rules are now becoming available [3],[4]. One of the most likely
applications of 3D technology is the integration of DRAM on logic [5]. Thermal management
issues are considered one of the main potential showstoppers for 3D integration [6],[7]. In
WP6, innovative methodologies for thermal modelling (T6.2) and thermal-aware design
optimization (T6.3) for systems and packages have been developed. In this deliverable, the
automated thermal-aware design capabilities developed there are evaluated for a real
demonstrator using 3D-TSV technology. The test case selected for this evaluation is a
two-layer DRAM-on-logic stack in a BGA package (test case 6).
Innovation metric:
Evaluation of one integrated tool allowing early system floor planning and exploration of
many system and physical options and their impact on thermal behaviour.
This is a novel thermal-aware design optimization tool that allows:
avoidance of hot-spots and delay degradation,
electro-thermal coupling,
reliability modelling,
mechanical stress reduction.
Measurable objectives:
Thermal modelling accuracy within 5% of thermal measurements (MO7.3.1)
Ease of use of the design flow (MO7.3.2)
Speed of the complete design flow, that is, from RTL to virtual layout (MO7.3.3)
Accuracy of the design flow within 15% (MO7.3.4)
Selected test case
Two-layer DRAM-on-logic stack in a BGA package (test case 6)
Deliverable content
The evaluation of the thermal-aware design optimization of task T6.3 for the DRAM-on-logic
test case consists of two parts. In the first part (section 2.2.1), the thermal accuracy of the
underlying thermal models is evaluated for the packaged DRAM-on-Logic stack. In the
second part (section 2.2.2), the overall design flow is evaluated using the design of an
OpenSPARC processor at RTL level and of a wide IO DRAM chip to demonstrate the
capabilities of the tool chain and evaluate the speed of the flow.
2.2 Technical results
2.2.1 Evaluation of thermal accuracy for the DRAM-on-logic test vehicle
2.2.1.1 Thermal test vehicle description
This deliverable reports the validation of the thermal models developed in T6.2, using a
packaged DRAM-on-logic stack. Such a package is shown schematically in Figure 1 (left). A
heterogeneous DRAM-on-logic chip stack was designed and fabricated to assess the technology
and design challenges for 3D IC applications. The logic die, with a thickness of 25µm, is
manufactured in a 130nm technology in which Cu TSVs are integrated [4]. The TSVs have a
diameter of 5µm and a height of 25µm. The backside of the thinned wafer consists of a
thick polymer layer, serving as a passivation layer, and a 10µm-pitch backside redistribution
layer (RDL). On top of the logic die, a thicker DRAM die is stacked using TSVs and microbumps
with a 50µm pitch. The standoff height between the logic and DRAM dies is typically 13µm. A
no-flow underfill is used to fill the gaps between the microbumps. Figure 2 (left) shows the
stack of the DRAM and logic dies, and a detail of the CuSn microbump between the logic and
DRAM dies is shown in Figure 1 (right). The entire stack is integrated into an FCBGA package
substrate, with the thinned die face down (Figure 3).
Figure 1. Left: Schematic cross section of the packaged DRAM on logic stack. The
arrows indicate the heat paths from the heat generation in the logic die to the cooling
solution; either through the top side of the package Qt or through the bottom side of the
package Qb. – Right: Schematic of the µbump geometry.
Figure 2. Left: Picture of the fabricated DRAM on logic stack – Center: Detail of the
thermal modules in the logic chip layout – Right: Detail of the layout of one heater
module on the logic test chip revealing the location of the 3 heaters and the 5 integrated
temperature sensors.
Figure 3. Left – BGA package containing the DRAM on logic stack – Right: test socket
for thermal measurements.
The logic chip contains test structures for monitoring thermo-mechanical stress and
temperatures in a 3D stack, electrostatic discharge hazards, electrical characteristics of TSVs
and microbumps, fault models for TSVs, etc. Figure 2 (center) shows the layout of the logic
chip. As hotspots and their impact on DRAM performance are a particular concern,
dedicated thermal test structures are integrated on the logic die to test the impact of hotspots
on DRAM refresh times. The thermal structures include resistive heaters and diodes as
temperature sensors and are grouped in 3 modules. Figure 2 (right) shows a detail of the
layout of the 3 thermal modules. Each module includes 5 temperature sensors and 3 heaters
with dimensions 50x50, 150x150 and 500x500µm², mimicking logic switching. The heaters
are made using BEOL resistors and are placed below the sensitive circuits of the
DRAM. Figure 2 (right) shows the location of the logic temperature sensors: in the center of
each heater and in a corner of the larger heaters of 150 and 500µm. To assess the impact of the
CuSn microbumps on the heat transfer between the logic and DRAM, dummy microbumps
are added below heater modules 2 and 3, whereas no microbumps are present below
heater module 1.
Table 1. Power dissipation in the heater modules during the experimental
characterization of the packaged DRAM-on-logic stack using the first experimental
configuration.
Scenario                H1-50              H1-150             H1-500             Total power
                        W       W/mm²      W       W/mm²      W       W/mm²      W
Heater Module 1, 9V     0.027   10.88      0.291   12.93      0.632   2.53       0.95
Heater Module 1, 11V    0.027   10.88      0.417   18.53      0.905   3.62       1.35
Heater Module 2, 9V     0.027   10.88      0.264   11.73      0.562   2.25       0.85
Heater Module 2, 11V    0.027   10.88      0.374   16.62      0.814   3.26       1.22
Heater Module 3, 9V     0.04    15.84      0.251   11.16      0.511   2.04       0.80
Heater Module 3, 11V    0.04    15.84      0.349   15.51      0.738   2.95       1.13
2.2.1.2 Thermal model DRAM-on-Logic stack
For the thermal experiment, the cooling is applied from the top side of the package, creating a
main heat flow path for the dissipated heat from the logic, through the DRAM, to the external
heat sink. This case is representative of medium- and high-power applications. Since the
performance of the memory technology degrades rapidly above 105ºC, the additional hot spot
power dissipation in the logic is superimposed on a background temperature of 85ºC applied
to the package. In this way, high local temperatures are created in the logic and their impact
on the DRAM temperature can be evaluated. This test condition is realized by putting
the socket below an air streamer with a fixed temperature of 85ºC. During the test in this
configuration, one heater module is activated at a time, and all three heaters of that
module are powered with the same voltage. Table 1 shows the power dissipation and power
density values of all the heaters during an experiment at 9V and 11V respectively. The
experiment is repeated for each of the 3 modules. In this setup, the temperature is monitored
in the diodes of the logic die and at certain locations in the DRAM die, in the steady-state
regime.
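The power density values in Table 1 follow from the dissipated power and the heater footprint. The sketch below recomputes them for heater module 1, assuming square heaters with the edge lengths given in the test vehicle description (50, 150 and 500µm) and uniform dissipation over the heater area. The function and variable names are illustrative and are not part of the deliverable's tool flow.

```python
# Heater edge lengths in µm, taken from the test vehicle description.
HEATER_EDGE_UM = {"H50": 50.0, "H150": 150.0, "H500": 500.0}

# Dissipated power in W, copied from the heater module 1 rows of Table 1.
POWER_W = {
    "Heater Module 1, 9V": {"H50": 0.027, "H150": 0.291, "H500": 0.632},
    "Heater Module 1, 11V": {"H50": 0.027, "H150": 0.417, "H500": 0.905},
}


def power_density_w_per_mm2(power_w, edge_um):
    """Power density of a square heater, assuming uniform dissipation."""
    edge_mm = edge_um / 1000.0
    return power_w / (edge_mm * edge_mm)


def summarize(scenario):
    """Return (per-heater power density in W/mm2, total module power in W)."""
    powers = POWER_W[scenario]
    densities = {
        name: power_density_w_per_mm2(p, HEATER_EDGE_UM[name])
        for name, p in powers.items()
    }
    return densities, sum(powers.values())


if __name__ == "__main__":
    for scenario in POWER_W:
        densities, total = summarize(scenario)
        cells = ", ".join(f"{n}: {d:.2f} W/mm2" for n, d in densities.items())
        print(f"{scenario}: {cells}, total {total:.2f} W")
```

For the 150µm and 500µm heaters this reproduces the tabulated densities to two decimals, and the three heater powers sum to the tabulated total. For the 50µm heater the computed 10.8 W/mm² differs slightly from the tabulated 10.88 W/mm², presumably because the tabulated power is rounded.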
For the thermal simulations, the thermal model described in deliverable D6.2.2 is used. Figure
4 and Figure 5 show the results of the thermal model for the temperature distribution in the
logic and DRAM die respectively, for the power dissipation specified in Table 1. The
modelling results are computed using the compact thermal model developed in T6.2, applied to
the two-layer DRAM-on-logic stack.
Figure 4. Simulation results of the temperature distribution in the logic die.
Figure 5. Simulation results of the temperature distribution in the DRAM die.
2.2.1.3 Experimental model validation
Figure 6 and Figure 7 show the comparison between the modelling results and the
measurement results for the DRAM die and logic die respectively, for the 2 different power
levels specified in Table 1. The results are shown for the cases with and without µbumps.
In Figure 8, the normalized temperature increase (normalized with respect to power
dissipation) is shown for the logic and DRAM die. The normalization makes it possible to
compare the impact of the µbumps using measurement results obtained at slightly different
power dissipations.
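As an illustration of this normalization, the sketch below divides the temperature rise above the 85ºC background by the dissipated power, which gives a value in ºC/W. Two points are assumptions made here and are not stated in the text: that the normalizing power is the total module power from Table 1, and that the maximum measured DRAM temperatures listed in Table 2 belong to the 11V runs.

```python
def normalized_temperature_rise(temperature_c, ambient_c, power_w):
    """Temperature rise above ambient per watt of dissipated power, in ºC/W."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (temperature_c - ambient_c) / power_w


if __name__ == "__main__":
    ambient_c = 85.0
    # Each entry: (maximum measured DRAM temperature from Table 2 in ºC,
    #              total module power at 11V from Table 1 in W).
    # Pairing these two tables is an assumption made for this illustration.
    cases = {
        "Heater module 1 (no µbumps)": (99.8, 1.35),
        "Heater module 2 (µbumps)": (98.18, 1.22),
    }
    for name, (t_max_c, power_w) in cases.items():
        r = normalized_temperature_rise(t_max_c, ambient_c, power_w)
        print(f"{name}: {r:.2f} ºC/W")
```

Because the two modules dissipate slightly different total power, comparing raw temperatures would mix the effect of the µbumps with the effect of the power difference. Dividing by the power removes the latter.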
Figure 6. Comparison between the experimental (markers) and model results (solid
lines) in the DRAM chip for heater module 1 without dummy CuSn bumps and heater
module 2 with CuSn bumps.
[Figure 6 plots, two panels: temperature (ºC) versus distance (mm); series Mod1-9V-EXP,
Mod1-11V-EXP, Mod1-9V-CTM, Mod1-11V-CTM and Mod2-9V-EXP, Mod2-11V-EXP, Mod2-9V-CTM,
Mod2-11V-CTM.]
Figure 7. Comparison between the experimental (markers) and model results (solid
lines) in the logic chip for heater module 1 without dummy CuSn bumps and heater
module 2 with CuSn bumps.
[Figure 7 plots, panels "Heater module 1" and "Heater module 2": logic temperature (ºC)
versus distance from H50 center (mm); series Exp 9V, Exp 11V, FEM 9V, FEM 11V; sensor
positions d50, d150c, d150, d500, d500c.]
Figure 8. Comparison between the experimental (markers) and calibrated model results
(lines) in the DRAM (left) and logic (right) for heater module 1 without dummy CuSn
bumps and heater module 2 with CuSn bumps.
[Figure 8 plots: "µbump impact - DRAM temperature", thermal resistance (ºC/W) versus
distance (mm), series Mod1-11V-EXP, Mod1-11V-CTM, Mod2-11V-EXP, Mod2-11V-CTM;
"µbump impact - Logic temperature", thermal resistance (ºC/W) versus distance from H50
center (mm), series Mod1-EXP, Mod2-EXP, Mod1-CTM, Mod2-CTM; sensor positions d50, d150c,
d150, d500, d500c.]
From the data in Figure 6 and Figure 7, the accuracy of the thermal model can be evaluated.
Table 2 shows the difference between the maximum simulated and measured temperatures for
both the DRAM and the logic chip, with and without µbumps. The results of Table 2 demonstrate
that the measurable objective MO7.3.1 of modelling accuracy within 5% / 5ºC has been
achieved for the thermal models applied to the DRAM-on-logic test case.
Table 2. Relative difference between the modelling and experimental results for the
maximum DRAM and Logic temperature in the heater modules.
                   Tamb (ºC)   T_exp (ºC)   T_model (ºC)   rel. error (%)   abs. error (ºC)
DRAM chip
Heater Module 1    85          99.8         100.4          -3.9             0.6
Heater Module 2    85          98.18        98.73          -4.0             0.55
Logic chip
Heater Module 1    85          207          202.12         4.2              -4.88
Heater Module 2    85          211          208.65         1.9              -2.35
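The error columns of Table 2 can be recomputed from its temperature columns. The deliverable does not state the formulas. The definitions in the sketch below are inferred because they reproduce the tabulated values: the absolute error as T_model - T_exp, and the relative error as (T_exp - T_model) divided by the modelled temperature rise above ambient, T_model - Tamb. The acceptance check treats the "5% / 5ºC" target of MO7.3.1 as requiring both bounds, which is also an assumption.

```python
# (chip, heater module, Tamb, T_exp, T_model), all in ºC, copied from Table 2.
ROWS = [
    ("DRAM chip", "Heater Module 1", 85.0, 99.8, 100.4),
    ("DRAM chip", "Heater Module 2", 85.0, 98.18, 98.73),
    ("Logic chip", "Heater Module 1", 85.0, 207.0, 202.12),
    ("Logic chip", "Heater Module 2", 85.0, 211.0, 208.65),
]


def abs_error_c(t_exp, t_model):
    """Absolute error in ºC (inferred sign convention: model minus experiment)."""
    return t_model - t_exp


def rel_error_pct(t_amb, t_exp, t_model):
    """Relative error in %, referred to the modelled rise above ambient (inferred)."""
    return 100.0 * (t_exp - t_model) / (t_model - t_amb)


def meets_mo731(t_amb, t_exp, t_model, limit_pct=5.0, limit_c=5.0):
    """True if both the relative and the absolute error stay within the limits."""
    return (abs(rel_error_pct(t_amb, t_exp, t_model)) <= limit_pct
            and abs(abs_error_c(t_exp, t_model)) <= limit_c)


if __name__ == "__main__":
    for chip, module, t_amb, t_exp, t_model in ROWS:
        rel = rel_error_pct(t_amb, t_exp, t_model)
        err = abs_error_c(t_exp, t_model)
        ok = meets_mo731(t_amb, t_exp, t_model)
        print(f"{chip}, {module}: rel {rel:.1f} %, abs {err:.2f} ºC, within target: {ok}")
```

Referring the relative error to the temperature rise rather than to the absolute temperature is the stricter choice: an error of 0.6ºC on a rise of about 15ºC is roughly 4%, whereas it would be well below 1% of the 100ºC reading.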
2.2.2 Evaluation of the thermal-aware design prototype tools
In this section, we will:
first, briefly describe the design that is used to evaluate the design prototyping tools;
then discuss the experiments that have been performed and the results obtained;
finally, discuss the measurable objectives for this particular experiment.
2.2.2.1 Design
To demonstrate the design capabilities of the flow, we chose a fairly complex SoC as the
working example: the OpenSPARC T2 processor. The RTL sources of this design are available
under an open-source license (although not all the features are available, namely memories
etc.). A lot of supplementary information about this design (e.g. floor plan descriptions for
global physical placement constraints, absolute power values and power breakdowns among
components, etc.) is available in the literature.
The floor plan of the processor is shown in Figure 9. The chip is built around 8 core
subsystems (with L1 and L2 tag memories) clustered in 2 regions of 4 cores each: an upper and
a lower cluster. In the middle of the chip is the crossbar, the main communication
infrastructure of the chip. The lateral stripes contain the actual L2 data memories and
buffers as well as the corresponding memory controllers.
The floor plan also shows some peripheral devices, placed at the bottom and in the middle of
the chip.
Figure 9. OpenSPARC T2 processor
The Wide IO memory is modeled as a black box. In this example we used the existing
information on the Samsung Wide IO memory, available in [8] and shown in Figure 10. Note
that this particular memory does not follow a JEDEC standard, but the design could very
easily be adapted to accommodate whatever DRAM configuration one might want to explore,
including the JEDEC specification.
Figure 10. Wide IO DRAM from [8].
To the OpenSPARC core we add a Wide IO DRAM controller, modeled as a black box, which
provides the logical and physical connection to the Wide IO DRAM.
2.2.2.2 Experiments and results
Obtaining design geometry
The design is first synthesized and then partitioned. In this particular case design partitioning
is fairly simple because there is only one entity to be moved to the top tier (the Wide IO DRAM).
After this operation, and by applying the appropriate scripts, we can extract all inter-die nets.
These nets have, according to the physical implementation scenario (in this case a memory
on top of the logic, both dies oriented face down), an appropriate physical net model
attached. This model is mandatory to enable the correct physical interpretation of the net (the
model would be very different for a silicon interposer implementation, for example).
Figure 11 shows the inter-die net model corresponding to the face-to-back integration of the
Wide IO DRAM on the top of the logic die. The pin on the logic die (e.g. the one from the
Wide IO DRAM controller) is connected through bottom die metal layers to the TSV, and
then the TSV is connected to the bottom-die bump-pin (actually ubump in this case). This
bump-pin is on the back-side of the bottom die and it is connected to the front bump-pin (also
ubump in this case) of the top die using back-side (redistribution) metallization layers
(RDLs). Note that this bump-pin is now on the front side (of the top, memory die in this case).
This bump-pin connects to the component of the top die through a certain number of the top-
die metal layers.
Figure 11. Inter-die net model for 3D-Stacked circuit: cross section (on the left) and the
appropriate physical model of the inter-die net (on the right)
After this stage we have a 3D-partitioned netlist of the design. This is illustrated in Figure 12,
which shows on the left the global physical constraints for the system (available in the literature)
and on the right the various system blocks (only the first level of the hierarchy is shown, so as
not to clutter the figure).
The design can now be floor planned and then placed and routed at standard cell level.
Figure 12 also shows the connectivity analysis used to understand the wiring requirements
of the circuit correctly.
Figure 12. Physical constraints defining placement regions (on the left) and 1st
hierarchical level blocks for the OpenSPARC T2 processor (on the right) showing also
the connectivity
After placement and routing we have all the circuit parameters needed to feed the power
extraction process before the final thermal simulation (Figure 13).
Figure 13. Placed and routed design: logic (left) Wide IO DRAM (right) and backside
routing (right)
Obtaining power
Once the design is placed and routed, we can proceed to establish the power dissipation values
of the various system sub-components.
The power information can be brought into the design using the following techniques:
1. Back-annotation: if power dissipation values are known, design modules (at
whatever abstraction level) can be annotated with absolute static and dynamic power
values. This information is then propagated throughout the flow and used at later
design stages.
2. Power estimation based on gate/flop count: the tool can be used to estimate both
the static and the dynamic power dissipation components based on gate/flop
count and activity assumptions (using the gate/flop information from the technology
files).
3. Functional/power simulation and accurate annotation: if appropriate functional
test benches have been written and power simulation has been performed using dedicated
power simulation tools, accurate switching activity information can be
provided to the tool using standardized activity file formats (namely SAIF).
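The second technique can be illustrated with a minimal sketch. It is not the tool's actual estimator: the `CellStats` record, the function name and all numeric values are hypothetical, and the model simply assumes that each gate toggles with a given average activity while each flop is clocked every cycle.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Per-cell figures that would normally come from the technology files.

    The values used below are hypothetical and only serve the example.
    """
    leakage_w: float        # static power per cell, in watts
    switch_energy_j: float  # energy per switching event, in joules


def estimate_block_power(gates, flops, gate, flop, freq_hz, activity):
    """Return (static_w, dynamic_w) for one block.

    Assumptions: `activity` is the average number of toggles per gate per
    clock cycle, and every flop switches once per clock cycle.
    """
    static_w = gates * gate.leakage_w + flops * flop.leakage_w
    energy_per_cycle = (gates * activity * gate.switch_energy_j
                        + flops * flop.switch_energy_j)
    dynamic_w = freq_hz * energy_per_cycle
    return static_w, dynamic_w


static_w, dynamic_w = estimate_block_power(
    gates=1_000_000, flops=100_000,
    gate=CellStats(leakage_w=2e-9, switch_energy_j=1e-15),
    flop=CellStats(leakage_w=5e-9, switch_energy_j=4e-15),
    freq_hz=1e9, activity=0.1)
print(f"static = {static_w:.4f} W, dynamic = {dynamic_w:.3f} W")
```

The per-block values obtained this way play the same role as back-annotated power figures in the rest of the flow.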
In this particular experiment we used a combination of the first two techniques.
The third was not used because it can be very time consuming, both in data preparation and in
simulation time. In any case, such a method would be overkill at this stage:
SAIF files give gate-level accuracy, while the thermal simulations
that we use work at a much coarser resolution, typically 100x100 µm cells.
Once the power estimation phase is completed, a set of scripts developed for this experiment
is used to extract the relevant geometry and power data from the design database. This data is
then formatted into a file suitable for input to the Compact Thermal Model (CTM).
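The scripts themselves are not reproduced in this document. As a hedged sketch of the kind of step they perform, the code below spreads block-level power over a regular grid of 100x100 µm cells. The rectangle format `(x, y, w, h, power_w)`, the function name and the assumption of uniform power density inside each block are all illustrative choices, not the actual CTM input format.

```python
import math


def rasterize_power(blocks, die_w_um, die_h_um, cell_um=100.0):
    """Spread block power over a grid of square cells.

    blocks: list of (x, y, w, h, power_w) rectangles, coordinates in µm.
    Returns grid[row][col] with the power (in watts) falling in each cell,
    assuming a uniform power density inside every block.
    """
    cols = math.ceil(die_w_um / cell_um)
    rows = math.ceil(die_h_um / cell_um)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, w, h, power_w in blocks:
        density = power_w / (w * h)  # watts per square µm
        for r in range(rows):
            # Vertical overlap between the block and this row of cells.
            oy = min(y + h, (r + 1) * cell_um) - max(y, r * cell_um)
            if oy <= 0:
                continue
            for c in range(cols):
                # Horizontal overlap between the block and this cell.
                ox = min(x + w, (c + 1) * cell_um) - max(x, c * cell_um)
                if ox > 0:
                    grid[r][c] += density * ox * oy
    return grid


# Two hypothetical blocks on a 300x200 µm die.
blocks = [(0.0, 0.0, 200.0, 100.0, 2.0), (150.0, 50.0, 100.0, 100.0, 1.0)]
grid = rasterize_power(blocks, die_w_um=300.0, die_h_um=200.0)
for row in grid:
    print(row)
```

The total power is preserved by construction, which is a useful sanity check before handing the map to a thermal solver.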
Thermal simulation
Once the input for the CTM, presented in D6.2.2, is ready, we can generate the thermal profile
for a given stack configuration. The overall process of thermal simulation is
illustrated in Figure 14, which shows: a) the spatial power distribution (top of the figure) of
the logic die (on the left) and the Wide IO DRAM (on the right), and b) the corresponding
spatial temperature distribution.
Figure 14. Power distribution of the bottom (logic) and top (Wide IO DRAM) dies (top
of the figure), and corresponding thermal profiles
The above example represents one design point: for one power dissipation distribution
and one stack configuration we generate one thermal profile. Note that for a
given power distribution and stack configuration, the temperature profile and the
maximum temperature in both dies depend on the cooling solution.
Analysis of the thermal hotspots in the design shows a higher temperature in the zone
of the system crossbar. This is expected because of the high power density in this area, on
both the logic and memory dies. The upper CPU cluster also shows a higher
temperature. This is due to the boundary (mirror) effect observed at the circuit edges:
since in the upper cluster the CPUs are closer to the boundary than the L2 tag
memories, the temperature in the upper CPU cluster is higher than in the bottom CPU
cluster (see the floor plan in Figure 9).
In order to study the impact of the choice of cooling solution, we select five junction-to-air
thermal resistance values, starting at 1.1°C/W and going down to 0.3°C/W in steps of
0.2°C/W.
For each thermal resistance value we report the maximum temperature in the Wide IO
DRAM, because the maximum temperature allowed in the stack is limited by the maximum
temperature allowed in the DRAM. This temperature cannot exceed 90°C without serious
deterioration of the DRAM performance (the required DRAM refresh rate rises with
temperature: the higher the temperature, the higher the refresh rate).
Figure 15. Maximum temperatures in the Wide IO DRAM die for 3D stacked circuit
By analyzing the results of this experiment, we can draw the following conclusions:
- Because of the high total power dissipation in the logic die (around 80W; this is still a
high-performance computing example) compared to the Wide IO DRAM (300mW),
the maximum temperature in the stack is quite high and forced cooling is required to
keep the temperature below 90°C.
- Using an appropriate cooling solution, with a thermal resistance below
0.5°C/W, it is possible to keep the maximum Wide IO DRAM temperature below
90°C.
- Although the above-mentioned thermal resistance falls in the category of high-performance
cooling systems ([9][10]), we do not need the very advanced cooling often
mentioned in the literature for high-performance 3D stacked circuits.
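To give a feel for the orders of magnitude involved in the sweep, the sketch below applies a lumped junction-to-air model, T = T_ambient + R_ja · P, to the five thermal resistance values. This is only a first-order illustration: it ignores on-die temperature gradients, so it does not reproduce the CTM results of Figure 15, and the 25°C ambient temperature is an assumption that does not come from this experiment.

```python
def lumped_temperature(ambient_c, r_ja_c_per_w, power_w):
    """First-order estimate: every watt raises the stack by r_ja degrees."""
    return ambient_c + r_ja_c_per_w * power_w


# Power figures quoted in the text: about 80 W (logic) plus 300 mW (DRAM).
TOTAL_POWER_W = 80.0 + 0.3
# Assumed ambient temperature, not taken from the report.
AMBIENT_C = 25.0
# 1.1 down to 0.3 degC/W in steps of 0.2 (rounded to avoid float drift).
R_VALUES = [round(1.1 - 0.2 * i, 1) for i in range(5)]

sweep = {r: lumped_temperature(AMBIENT_C, r, TOTAL_POWER_W) for r in R_VALUES}
for r, t in sweep.items():
    print(f"R_ja = {r:.1f} degC/W -> about {t:.1f} degC")
```

Even this crude model shows why the cooling solution dominates the result: at 80W, each 0.2°C/W step changes the average stack temperature by about 16°C.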
2.2.2.3 Measurable objectives
The measurable objectives for this task are:
- Ease of use (MO7.3.2): the flow is simple to use. Adapting the existing
template scripts to the OpenSPARC design took 2 days (including
some synthesis time needed to check that the design set-up is correct). Note that the
design is described in a couple of hundred VHDL files, and some
manual editing is required to build the design environment.
- Speed (MO7.3.3): the complete design flow (that is, from RTL to virtual layout)
takes:
o Synthesis = 3h. This covers the full design synthesis, not
counting memory macros. Note that for most prototyping purposes
some of the design modules could be abstracted, making the process
significantly faster.
o Place & Route = 1h. These steps are approximate: the placement
is not legalized and the routing uses an approximate Steiner router. As noted
earlier, this is more than enough at this stage.
o 1 thermal analysis point = a few minutes, which allows many design points
to be calculated.
- Design flow accuracy (MO7.3.4): the accuracy of the results obtained is ~15%. This
value is obtained by comparing design parameters (for example area, max delay, power)
after die prototyping and after final layout generation. It is important to
understand that this precision is more than enough for design planning purposes.
2.3 Conclusions
This deliverable presents the evaluation of the integrated tool for early system floor
planning that was developed in task T6.3 and reported in deliverable D6.3.2. In the first part
of the deliverable, it is demonstrated that measurable objective MO7.3.1, model accuracy
within 5% for thermal models, has been achieved for the thermal models applied to the
packaged DRAM-on-Logic test case (test case 6).
In the second part of this deliverable, the thermal-aware design prototype tools are evaluated
for a DRAM-on-logic design. It is demonstrated that the following measurable objectives have
been met:
- Ease of use of the design flow (MO7.3.2).
- Speed of the complete design flow (synthesis + place & route + thermal analysis)
(MO7.3.3).
- Design flow accuracy: the design parameters after prototyping are within
15% of those of the final layout generation (MO7.3.4).
3 Evaluation of thermal-aware synthesis and optimization tools
(POLITO, together with SNPS-AM)
3.1 Introduction
The purpose of this part of the deliverable is to assess the quality of the prototype thermal-aware
synthesis and optimization tools developed by POLITO in T3.3 and T3.4. These tools are
tested on the benchmark provided by ST in T1.3 of WP1 in Therminator. For the sake of
completeness and ease of reading, we include a brief section that
describes the benchmark and a section that recalls the thermal-aware optimization techniques
described in deliverables D3.3.1-D3.3.3 and D3.4.1-D3.4.3 of Therminator.
3.2 Benchmark description
The test-case provided by ST is a subset skeleton of a typical MCU suitable for a wide range
of applications such as motor drives, application control, medical and handheld equipment,
industrial applications, inverters, printers, etc. It includes several general purpose IPs (DMA,
I2C, Timers, USART, SPI, USB, I/Os, etc.) and all the interconnect infrastructure, plus
memory controllers for embedded Flash, embedded RAM and external NVM (Figure 16). ST
delivered a gtech representation of the overall design (generic technology: a
technology-independent netlist, good enough for applying any kind of synthesis flow) without
the analog IPs, which are not relevant for the purpose of the test-case.
Figure 16: Test Case Block Diagram
All the external and internal pins are modelled in terms of input and output delays. The
test-case consists of a single voltage domain but supports various power modes, depending on the
external power manager and on the clock domain controls; test-aiding logic is also present as part
of the normal IP set of the MCU.
The main clock domain (see Figure 17) is a fast AHB domain, running at full speed and clocking the
CPU, DMAs, embedded memories, external memory controllers and the Clock Controller
itself; it is interfaced, by means of two bridges, to two APB domains running at a lower
speed (ratio n), where all the remaining IPs are instantiated.
Figure 17: Test Case Clock Tree
3.3 Thermal-Aware Optimization Techniques
The thermal-aware design methodologies developed within work package WP3 target two
levels of abstraction: the gate level/RTL and the architectural level. At the gate level,
POLITO developed a dual-Vt assignment algorithm that guarantees temperature-insensitive
operation of the circuits together with a significant reduction of both leakage and total power
consumption (task T3.3), while at the architectural level it implemented a design framework
for post-silicon compensation of thermally-induced delays on the clock distribution network
(task T3.4). Both solutions have been integrated with standard EDA tools provided by
Synopsys within the Galaxy Implementation Platform: Design Compiler for
dual-Vth synthesis and IC Compiler for adaptive clock trees. The next two sections provide a
brief overview of the two techniques. SNPS-AM has given support to POLITO, according to MO7.3.8.
3.3.1 ITD-Aware Dual-Vth Assignment
It has been shown that CMOS technologies below the 65nm node, such as those used in the
THERMINATOR project, suffer from Inverted Temperature Dependence (ITD). ITD manifests
itself in High-Threshold voltage cells (HVT cells), whose delay decreases with temperature,
that is, they get faster as they warm up. This behaviour is in contrast with that of LVT cells,
which show the standard thermal dependence, that is, they get slower as they warm up.
The presence of ITD seriously complicates low-power design flows in which dual-Vt
technologies are adopted to reduce leakage power consumption while keeping delay
overheads under control. The main limitation of standard synthesis approaches is that they do
not consider temperature as a direct variable in the optimization loop; instead they follow a
more conservative approach in which cell libraries are characterized under classical worst-case
temperature conditions (typically 125°C). However, due to ITD effects on HVT gates, the
worst-case delay path may occur at the opposite corner, i.e., at room temperature. By
ignoring this effect, standard synthesis tools can produce incorrect designs that do not
guarantee timing compliance over the full range of operating temperatures.
To overcome this issue, POLITO implemented a new ITD-aware dual-Vt selection algorithm
(MO7.3.5) that achieves temperature-insensitive operation of digital circuits. With this solution,
designers are able to automatically synthesize circuits that meet the given timing constraints for all
allowable operating temperatures, with a significant reduction of leakage power w.r.t. circuits
for which temperature insensitivity is achieved by over-constraining the logic synthesis
process with tighter delay constraints.
Our proposed synthesis flow is illustrated in Figure 18. The flow has been set up with the
objective of achieving maximum compliance with existing commercial tools. We first
synthesize the target circuit using the nominal timing constraint Dnom and standard dual-Vt
libraries characterized at high temperature (i.e., 125°C). Synthesis at high temperature
guarantees worst-case parasitic extraction. We then estimate the worst-case delay of the
circuit at both 125°C and 25°C using static timing analysis. Next, we re-synthesize the circuit
using the same libraries, but with a tighter timing constraint (i.e., D′nom = α·Dnom, with α < 1).
We choose α to be small enough that the new circuit is timing compliant at both 125°C
and 25°C (i.e., the worst-case delay is less than Dnom). We note that the value of α is circuit
dependent: the larger the ITD effect, the smaller the value of α must be. At this point, we
have a solution which is compliant from a timing viewpoint, but which represents an upper bound
for the leakage optimization problem. Using our temperature-insensitive dual-Vt assignment
algorithm, we attempt to recover some of this leakage power by searching for an optimal
threshold voltage assignment for all the cells in the circuit.
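The choice of α can be sketched as a simple search. This is a hedged illustration, not the flow's implementation: `synthesize_and_time` is a hypothetical callback standing in for a synthesis run followed by static timing analysis at both corners, and the toy model behind it (synthesis exactly meets its constraint at 125°C, and ITD makes the critical path 7.5% slower at 25°C) is an assumption chosen only to echo the figures reported in Section 3.4.1.

```python
def find_alpha(synthesize_and_time, d_nom,
               alphas=(0.99, 0.97, 0.95, 0.93, 0.91, 0.89)):
    """Return the largest alpha whose circuit meets d_nom at both corners.

    synthesize_and_time(constraint) is a hypothetical callback that returns
    the worst-case delay at (125 degC, 25 degC) for a circuit synthesized
    under the given timing constraint.
    """
    for alpha in alphas:  # tried from the loosest to the tightest
        d_hot, d_cold = synthesize_and_time(alpha * d_nom)
        if max(d_hot, d_cold) <= d_nom:
            return alpha
    raise ValueError("no candidate alpha is timing compliant at both corners")


def toy_flow(constraint_ns):
    """Toy stand-in for synthesis plus STA (an assumption, see lead-in)."""
    d_hot = constraint_ns            # constraint met exactly at 125 degC
    d_cold = constraint_ns * 1.075   # 7.5% slower at 25 degC because of ITD
    return d_hot, d_cold


alpha = find_alpha(toy_flow, d_nom=9.0)
print(alpha)
```

Picking the largest compliant α keeps the over-constraint, and therefore the leakage upper bound from which the Vt assignment starts, as small as possible.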
Figure 18: Temperature-Aware Dual-Vt Synthesis Flow.
The proposed dual-Vt assignment strategy, whose details are reported in deliverable D3.3.3, is
thus based on over-constraining the synthesis using a delay constraint smaller than the
nominal one; the difference with respect to a conventional over-constrained solution is that
temperature dependence is considered during the Vt assignment process.
3.3.2 Tunable Clock Tree
On-chip operating temperature variations have a significant impact on the performance of
global interconnects. It is well known that high temperatures increase interconnect delays,
further degrading circuit performance. This is mainly due to the linear dependence of the
electrical resistance of metal wires on temperature. Temperature-induced delay
variations on interconnects are extremely critical for clock distribution networks (CDNs),
which typically span the entire die and cross different thermal regions.
Devices working at different temperatures may show significant performance mismatches.
As a result, different branches of the CDN have unbalanced delays: branches
crossing hot regions get slower, while those crossing cold regions get faster. The resulting
difference generates clock skews which may vary dynamically, depending on the workloads.
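The mechanism can be made concrete with a small numerical sketch. It assumes a linear resistance model R(T) = R0·(1 + β·(T − T0)) and an Elmore delay of 0.5·R·C for a distributed wire; the temperature coefficient β = 0.004 per °C and the wire parameters are illustrative assumptions, not values taken from the benchmark.

```python
def wire_delay_s(r0_ohm, c_farad, temp_c, tcr_per_c=0.004, t0_c=25.0):
    """Elmore delay of a distributed RC wire at a given temperature.

    The wire resistance is assumed to grow linearly with temperature
    around the reference temperature t0_c.
    """
    r = r0_ohm * (1.0 + tcr_per_c * (temp_c - t0_c))
    return 0.5 * r * c_farad


# Two identical clock branches (1 kOhm, 1 pF at 25 degC) that are perfectly
# balanced at uniform temperature but cross different thermal regions.
hot = wire_delay_s(1000.0, 1e-12, temp_c=105.0)
cold = wire_delay_s(1000.0, 1e-12, temp_c=45.0)
skew = hot - cold
print(f"hot = {hot * 1e12:.0f} ps, cold = {cold * 1e12:.0f} ps, "
      f"skew = {skew * 1e12:.0f} ps")
```

Under these assumptions a 60°C difference between two branches that were balanced by design already produces a skew of more than 20% of the nominal branch delay.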
POLITO proposed a dynamically tunable clock tree architecture that self-adapts the delays in
the clock tree under time-varying thermal profiles. Tunability is achieved by means of
Tunable Delay Buffers (TDBs), each made up of a pair of inverters with a set of capacitive loads
between them. The loads are implemented using transmission gates with NMOS transistors
connected to them, and can be activated using dedicated control signals. Each activated
control signal adds the corresponding load to the critical path, thus achieving variable delays
in discrete steps. We devised a hardware mechanism that selects the appropriate
tuning of the clock buffers so as to compensate the skew variations induced by a given
thermal gradient. The architecture has two essential elements:
- A set of on-chip temperature sensors that detect thermal variations.
- A hardware Thermal Management Unit (TMU) that translates these variations into the
proper tuning of the buffers to compensate the possible increase of the clock skew.
Figure 19 depicts the scenario showing the buffers and the TDBs (shown as buffers with cross
arrows), the sensors (shown as diamonds), and the TMU.
Figure 19: On-Line Skew Compensation Architecture.
Regarding the sensors, we assume that their number and placement on the die are given.
The design of the TMU requires identifying how many buffers
are tunable and calculating the amount of tuning required by each buffer. The TMU is
therefore designed based on data obtained from an off-line characterization step. The
characterization consists of applying a set of thermal profiles (representing typical
operating conditions of the design); for each applied thermal profile, the optimal tuning values
are calculated by solving a separate, independent instance of the optimization problem using a
software implementation of the algorithm. Once a reasonable number of profiles have been
applied, the solutions thus obtained are merged using some criterion so as to achieve an
overall set of tuning values. More implementation details can be found in document
D3.4.2. In order to achieve minimum design overheads, the TMU has been physically
implemented by means of a lookup table filled with the tuning values computed for each of
the thermal profiles applied during the characterization process.
Figure 20 shows the conceptual architecture of the TMU, which is the object of our validation.
Upon detection of a temperature variation, the N sensors route their readings
(properly encoded on a given number of bits) to the TMU. The input values address the
corresponding entry of the table (one for each of the S thermal profiles), which contains a
tuning configuration for each of the m tunable buffers. These configurations are then driven
(properly encoded on a given number of bits) to the required TDBs.
Figure 20: Architecture of the TMU
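The lookup behaviour described above can be mimicked in software with a minimal sketch. The 2-bit uniform encoding of the sensor readings, the table contents, and the fallback configuration for unlisted sensor codes are all assumptions made for the example; the actual encoding and table are defined by the characterization step.

```python
def quantize(temp_c, t_min=25.0, t_max=125.0, bits=2):
    """Encode a sensor reading on `bits` bits (uniform steps, an assumption)."""
    levels = (1 << bits) - 1
    clamped = min(max(temp_c, t_min), t_max)
    return round((clamped - t_min) / (t_max - t_min) * levels)


def tmu_lookup(table, readings_c, default):
    """Address the table with the encoded readings of all N sensors."""
    key = tuple(quantize(t) for t in readings_c)
    return table.get(key, default)


# Hypothetical table: S = 2 characterized profiles, N = 2 sensors,
# m = 3 tunable buffers. Each entry holds one tuning step per buffer.
table = {
    (0, 3): (2, 0, 1),   # left side cold, right side hot
    (3, 0): (0, 2, 1),   # left side hot, right side cold
}
NO_TUNING = (0, 0, 0)

cfg = tmu_lookup(table, readings_c=[30.0, 120.0], default=NO_TUNING)
print(cfg)
```

Because the table is filled off-line, the run-time cost of the TMU reduces to encoding the readings and performing one table access.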
3.4 Results on the testbench
3.4.1 Validating the ITD-Aware Dual-Vth Assignment
The proposed temperature-insensitive dual-Vth technique has been applied and benchmarked
on the arithmetic logic unit of the industrial microcontroller proposed as fourth test-case in
D1.3.1.
The circuit was synthesized with Synopsys Design Compiler using the low-power
technology library provided by STMicroelectronics, which consists of HVT and LVT cells.
We used HSPICE to characterize the standard libraries. The characterization was done
for LVT and HVT cells at the two boundary cases (25°C and 125°C) and at two
intermediate temperatures (75°C and 105°C). This allowed us to obtain a complete overview
of the temperature-induced effects along the entire temperature range.
After obtaining the netlist of the over-constrained synthesized circuit, we performed the
temperature-insensitive dual-Vth assignment. Accurate timing and power analyses were
carried out using Synopsys PrimeTime, in which we annotated signal statistics of the internal
nodes obtained through a post-synthesis gate-level logic simulation.
We compare the outputs of four different synthesis flows in order to show the limitations of
standard synthesis tools, as well as to demonstrate the superior potential of the
temperature-insensitive design methodology w.r.t. a simple over-constrained approach. The first two
design flows (syn-125C and syn-25C) use classical dual-Vth synthesis, assuming
a single worst-case temperature condition of 125°C and 25°C respectively. The third one,
syn-oc-125, is the over-constrained case, where the circuit was synthesized using a timing
constraint smaller than the nominal one and for a worst-case operating condition of 125°C.
Finally, temp-ins uses the proposed methodology. The target timing constraint is 9ns, as given
in the specification.
Figure 21 plots, for each synthesis flow, the length of the critical path normalized w.r.t. the
target timing constraint given by the specification. The timing analysis was done at room
temperature (blue bars), at two intermediate temperatures (75°C, orange bars, and 105°C,
yellow bars), and at 125°C (green bars). The first observation concerns the results of standard
synthesis tools. Using the syn-125C approach, the timing constraint is met at 125°C by
construction, but when the circuit operates at room temperature a timing violation occurs
(7.5% of Dnom). This is because a path which was non-critical at 125°C, and was therefore
mapped with a majority of HVT cells, becomes slower at 25°C, causing a timing fault. A
similar problem can occur if we synthesize the circuit at room temperature (the syn-25C
approach). Now the timing violation occurs at high temperature, where the low-threshold cells
are slower. An obvious solution to the timing-fault problem is to over-constrain the circuit
(syn-oc-125). In this case, since the syn-125C approach generates timing violations, we
performed a 125°C synthesis under a timing constraint equal to 0.93·Dnom. As shown in the
plot, the available slack increases, and we can ensure that even if the path delays increase at
room temperature, they still remain below the nominal delay Dnom. At 25°C the critical
path violates the dummy constraint of 0.93·Dnom, but still meets the true nominal delay
Dnom. The main shortcoming of this approach is that an over-constrained circuit consumes
more area and dissipates more power. Our methodology (temp-ins in the figure) helps to
address the problem of timing violations due to the ITD effect. At each temperature the critical
path is below the nominal delay and the temperature-induced delay variation is below 5%,
ensuring timing compliance at any operating condition. Note that the proposed approach does
not eliminate the ITD effect, but gives an effective way to account for it during the synthesis
process. This result demonstrates the measurable objective MO7.3.6 “Comparison with circuits
obtained by traditional synthesis methodologies: Temperature induced Delay variation kept
below the variation reported in the simulation results in the range 25-125°C (less than 5%)”
Figure 21: Critical Path distribution over different operating temperatures
Figure 22 reports the leakage power (on the left) and the total power consumption (on the right).
The over-constrained approach syn-oc-125 has the largest leakage power dissipation at each of the
four temperatures considered. In the worst case (125°C), the leakage overhead compared with
the syn-125C case is around 7% on average. In contrast, with the proposed approach, temp-ins,
leakage power is much smaller than in all the other cases. For instance, at 125°C we have a
leakage saving of around 29% w.r.t. the over-constrained case, and a 22% saving compared
to the syn-125C case. Total power dissipation is also reduced in the proposed dual-Vth flow.
As highlighted in the figure, the temp-ins solution shows a total power dissipation that is, on
average, 4% lower than the syn-oc-125 case. The nominal case syn-125C ensures the minimum
energy consumption, but its functionality is not guaranteed over the entire temperature range.
Figure 22: Leakage power consumption (left) and total power consumption (right)
[Figure 21 plot labels: normalized worst-case delay (axis from 0.85 to 1.10) for syn-25C, syn-125C, syn-oc-125C and temp-ins at 25C, 75C, 105C and 125C, with the nominal delay constraint and the timing-fault region marked.]
3.4.2 Validating the Tunable Clock-Tree Methodology
In order to test and validate the effectiveness of the proposed TMU design and optimization,
we applied the flow depicted in Figure 23.
Figure 23: Characterization Flow.
The clock tree description is in DEF format, obtained using the
Synopsys Galaxy Platform IC; delay and skew calculation is done after parasitics extraction
using the Standard Delay Format (SDF) file. The various thermal profiles were
generated by applying different realistic workloads to the benchmark. Temperature was
obtained by jointly using placement and power consumption information. First, we computed a
breakdown of power on a block-by-block basis, where the term "block" refers to an entity in the
top-level RTL description. Area and placement information, together with the switching power
distribution, were then fed into a logi-thermal simulator, which produced a thermal map for
each workload. The maps correspond to the thermal states. From a given profile, we
calculated the temperature-dependent delay to each sink; the nominal delay (with uniform
temperature distribution) was extracted from the SDF file generated during physical
design. Insertion delay values were then used to determine the skew constraints to be solved
by an ILP package. We interfaced this solver with Matlab, where all the experiments were
executed.
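The optimization step can be illustrated on a toy instance. The sketch below is not the ILP formulation used in the experiments: it replaces the ILP solver with an exhaustive search, which is only practical for a handful of sinks, and it simplifies the problem by attaching one tunable delay directly to each sink. The delay values and the available tuning steps are hypothetical.

```python
from itertools import product


def best_tuning(sink_delays_ps, steps_ps=(0, 10, 20, 30)):
    """Choose one added delay per sink so that the resulting skew is minimal.

    Exhaustive stand-in for the ILP: every combination of discrete tuning
    steps is tried. Ties are broken in favour of the smallest total added
    delay. Returns (tuning, skew_ps).
    """
    best = None
    for added in product(steps_ps, repeat=len(sink_delays_ps)):
        tuned = [d + a for d, a in zip(sink_delays_ps, added)]
        skew = max(tuned) - min(tuned)
        candidate = (skew, sum(added), added)
        if best is None or candidate < best:
            best = candidate
    return best[2], best[0]


# Hypothetical insertion delays (ps) to four sinks under one thermal map.
thermal_delays_ps = [500, 512, 531, 498]
original_skew = max(thermal_delays_ps) - min(thermal_delays_ps)
tuning, skew = best_tuning(thermal_delays_ps)
print(f"skew before = {original_skew} ps, after = {skew} ps, tuning = {tuning}")
```

Running one such instance per thermal profile yields the per-profile tuning values that are later merged and stored in the TMU table.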
Figure 24: Normalized results for skew compensation.
Figure 24 reports the results we obtained in terms of skew compensation, normalized
w.r.t. the original skew of the clock tree, namely the one obtained by performing timing analysis
under a flat temperature distribution. Bars labelled Thermal Skew refer to the circuit
designed without any thermal-aware strategy, whereas those labelled Compensated Skew refer
to the skew of the clock distribution network optimized with our methodology. In the plot,
bars above 1.0 represent a skew violation. As one can observe, with the proposed adaptive
strategy we can substantially compensate the skew induced by uneven temperature
distributions, by 14.4% on average, bringing it even below the nominal case. This result
demonstrates the measurable objective MO7.3.7 “Clock skew comparison prior and post
optimization 15% of clock skew reduction in the range 25-125°C”.
3.5 Conclusions
In this part of the deliverable, we have benchmarked the thermal-aware optimization
techniques and prototype tools developed within WP3. Results confirm that by applying the
proposed synthesis methodology we obtain a circuit that is temperature insensitive, with a
temperature-induced delay variation of less than 5% (Figure 21 in Section 3.4.1
demonstrates the measurable objective MO7.3.6 “Comparison with circuits obtained by
traditional synthesis methodologies: Temperature induced Delay variation kept below the
variation reported in the simulation results in the range 25-125°C (less than 5%)”), and a clock
skew reduction of about 15% in the temperature range from 25°C to 125°C (Figure 24
in Section 3.4.2 demonstrates the measurable objective MO7.3.7 “Clock skew comparison prior
and post optimization 15% of clock skew reduction in the range 25-125°C”).
4 Evaluation of thermal and aging aware optimization flow for
two-dimensional systems on chips (OFFIS, together with
UNIBO/ST, and CEA-LETI)
4.1 Introduction to evaluation of high-level thermal and
degradation estimation and optimization
The evaluation of the high-level thermal and degradation estimation and optimization is
two-fold.
On the one hand, it is based on the motion detection use case 5 that has been provided by
UNIBO/ST within the project. This part of the evaluation is presented in Section 4.2.1. First,
Section 4.2.1.1 gives a short introduction and motivation to this use case. Next, the synthesis
and analysis of the custom ASIC hardware accelerators it contains are presented in Section
4.2.1.2. Together with the floor plan, this leads to the overall power distribution of the use
case. Section 4.2.1.3 then describes the BGA IC package that has been modelled and used for
the subsequent analysis of Section 4.2.1.4. This analysis first presents the Green's function that
has been characterized in order to capture the package characteristics in terms of the thermal
expansion, which depends on the applied materials, layer thicknesses, and cooling equipment.
Afterwards, the Green's-function-based thermal simulation is applied and the results are compared to
low-level FDM-based thermal simulations obtained with HotSpot. The two approaches
are compared quantitatively and the results are discussed. Later on, in Section 4.2.1.5, the
thermal-aware optimization capabilities of the use case and of the flow in general are
presented.
On the other hand, Section 4.2.2 presents the evaluation of the Green's-function-based thermal
simulation for the Genepy Multi-Processor System-on-a-Chip platform of use case 4,
provided by CEA-LETI. For this purpose, CEA-LETI provided the necessary data for the use case 4
design, including the datasheet of the BGA IC package, a block-level floor plan and transient
power traces for the 4 cores within the test chip.
4.2 Technical results
4.2.1 Evaluation of Green's-function-based thermal estimation and optimization based
on use case 5
This section addresses the evaluation of the Green's-function-based thermal estimation and
optimization flow developed by OFFIS in WP6 tasks T6.2 and T6.3.
4.2.1.1 Introduction to use case 5 motion detection design
The overall use case structure has been described in the “Technical specification of test cases”
deliverable D1.3.1 [21]. The power and area figures of included system-level IP components
have been provided by UNIBO to OFFIS. This includes estimates for the processor, a DMA,
an external memory controller, a hardware accelerator wrapper, a bus and multiple memory
cuts.
The power and area demand of the ASIC hardware implementation of the motion detection
algorithms has been characterized by OFFIS. For this purpose, UNIBO provided C algorithms
that have been synthesized to RTL and estimated with the OFFIS PowerOpt synthesis and
estimation engine. Any other high-level synthesis tool could have been used for this analysis
as well. In the following, the synthesis constraints and results are presented in detail.
4.2.1.2 Custom ASIC hardware accelerators power and area determination
The custom ASIC part of the use case 5 consists of 9 different C processes, each of them
being synthesized to a separate design entity with a dedicated controller. As resource sharing
is not applied at process level, some of these processes are instantiated multiple times. In
addition, a top-level process exists for the overall program- and data-flow. For high-level
synthesis all of the processes have been passed to PowerOpt with the following set of
synthesis constraints:
- Generic 65nm semiconductor technology
- Ambient temperature: 25°C
- Supply voltage: 1.1V
- Frequency: 200MHz
- Optimization effort: Smallest area
- No constraints on resources, no pipelining or chaining
- No algorithmic optimization such as loop unrolling
Each of the processes is first synthesized quickly to Verilog and simulated with a defined
testbench using Cadence NCVerilog. The simulation results, in terms of data patterns, are
back-annotated to the operations of the algorithmic representation. Then, based on these
realistic data patterns, the synthesis is executed a second time in order to meet and optimize
for the given synthesis constraints. The outcome of this synthesis is an RT-level datapath
for each process that is used for predicting the power and area and that can be exported for
use in a subsequent RT- to gate-level synthesis. For estimation, the internal component
library of a 65nm semiconductor technology is used, which contains soft-macro models for all
available RT-level components.
Figure 25 gives an overview of the synthesis results for all processes including their power
and area estimates. In addition, the number of controller steps is given for each design entity.
Figure 25: PowerOpt High-Level Synthesis results of motion detection algorithms
In addition to the exported Verilog design, PowerOpt outputs an area and power report for
each design entity as described in deliverable D1.2.1 [20].
As described in D6.3.3 [19], these reports are read by the floor planner and the block-level
placing is performed. The resulting rectangle of Figure 26 is then placed next to the hardware
accelerator wrapper of the global floor plan provided by UNIBO leading to the final floor
plan of Figure 27.
Figure 26: ASIC hardware block-level floor plan
Figure 27: Block diagram of final floor plan of use case 5
The overall die size is ~3.7x3.7mm and the ASIC hardware part amounts to only about 1% of
it.
Together with the provided power figures of the IP components of UNIBO, we obtained the
power distribution within the die shown in Figure 28 that is used for the analysis.
Figure 28: Power distribution of the use case 5 design
4.2.1.3 Use case 5 IC package properties
Since the IC package is not defined in the use case description but has a crucial impact on the
thermal performance, this subsection describes the applied package. A Plastic Ball Grid Array
(PBGA) with a lead count of 256 has been used, as described in the Intel Packaging Databook.
PBGAs have become the most popular packaging alternative for high I/O devices in the
industry [12].
The PBGA is assumed to be square with an edge length of 17mm. Layer thicknesses and
material properties are taken from the Intel data book [12], [13], and [14]. For example, the
seating plane thickness (BGA layer thickness) is set to 0.4mm, the molding compound
thickness to 0.8mm, and the substrate thickness to 0.4mm. Further, natural convection without
any active cooling is assumed.
Figure 29: PBGA package scheme
Figure 29 shows a cross-section through the PBGA package as it has been modelled within
the thermal estimation flow using the package modelling technique proposed in [15].
The overall package is attached to a 2s2p PCB containing 2 power and 2 signal layers as
defined in the JEDEC guidelines [11]. The PCB layer thickness is 0.96mm in total. The
package is surrounded by air at an ambient temperature of 323K (~25°C).
4.2.1.4 Evaluation of developed estimation/analysis and optimization flow and tools of
WP6
The important measurable objective MO7.3.9 in this analysis is the comparison of the newly
created estimation results with results from well-known low-level tools and techniques. In
order to cover this MO, the green-function based approach has been evaluated against the
low-level tool HotSpot, which is widely used in the scientific field. Further, HotSpot has been
adjusted to silicon measurements for the hardcoded IC package. Besides the deviations in the
thermal prediction, the runtimes of both approaches are compared later on.
Figure 30: Error [K] of green-based vs. HotSpot thermal estimation in Kelvin for a
128x128 blocks grid
At first, a homogeneous power distribution has been assumed and the two approaches have
been compared. Figure 30 shows the error of the green-based thermal estimation in
comparison to the HotSpot internal discrete FDM. The main inaccuracies occur due to the
simplification to homogeneous layers in x- and y-direction. Further, heat is also dissipated
through the borders of the package and not solely in z-direction. As a main consequence, the
error is marginal in the centre of the die (below 0.1K) and has its maximum at the corners
with a deviation of 0.25K. The average temperature in this analysis is 10K above ambient, so
the maximum deviation corresponds to 2.5% of the average temperature increase.
The runtime of the low-level thermal simulation in HotSpot compared to the green-function
approach is presented in Table 3. As can be seen, the green-function approach outperforms
the low-level FDM by far.
XY-Resolution (at 7 Z-Layers)   Runtime HotSpot [s]   Runtime Green [s]
32 x 32                         1.53                  0.0003
64 x 64                         8.41                  0.0008
128 x 128                       29.16                 0.0049
256 x 256                       80.55                 0.0163
512 x 512                       191.85                0.0706
1024 x 1024                     360.06                0.1655
2048 x 2048                     944.34                0.6278
Table 3: Comparison of thermal estimation runtime HotSpot vs. Green
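The speed-up implied by Table 3 can be checked with a few lines of Python. The sketch below only restates the runtimes from the table; the names RUNTIMES and speedups are illustrative and not part of the OFFIS tooling.

```python
# Runtimes in seconds copied from Table 3 (7 Z-layers).
# Keys are the XY grid edge length, values are (HotSpot, green-function) runtimes.
RUNTIMES = {
    32: (1.53, 0.0003),
    64: (8.41, 0.0008),
    128: (29.16, 0.0049),
    256: (80.55, 0.0163),
    512: (191.85, 0.0706),
    1024: (360.06, 0.1655),
    2048: (944.34, 0.6278),
}


def speedups(runtimes):
    """Return {resolution: HotSpot runtime / green-function runtime}."""
    return {n: hotspot / green for n, (hotspot, green) in runtimes.items()}


for n, factor in speedups(RUNTIMES).items():
    print(f"{n:>4} x {n:<4} speed-up ~ {factor:,.0f}")
```

The smallest ratio occurs at the 2048 x 2048 resolution (about 1500); all coarser grids show higher ratios.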
The use case 5 analysis is based on the package-characteristic green-function presented in
Figure 31. In contrast to the general evaluation of the approach that is based on the HotSpot
internal and hardcoded IC package this green-function has been characterized for the package
described in Section 4.2.1.3. It shows a high temperature peak of about 23K above ambient
temperature as an impulse response. Further it shows the temperature distribution in the
neighborhood of the sample power of 0.02W that is placed in the center of the die.
Figure 31: Green function for used PBGA package
Based on this impulse response characteristic of the PBGA package, Figure 32 presents the
steady state temperature distribution within the die of use case 5 including the effect of
electrothermal coupling. Since the overall power consumption is limited to below 140mW
and the selected PBGA package can dissipate up to 1.5 Watts at a low ambient temperature of
298K or about 0.8W for an ambient temperature of 323K, the thermal simulation results in a
maximum temperature increase of about 16K to the ambient temperature.
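The principle of deriving a steady state temperature map from an impulse response can be sketched as a superposition of shifted and scaled copies of the green-function. The following is a minimal illustration, not the OFFIS implementation: the exponential kernel shape, the decay length and the 9x9 grid are assumptions made for the example, and neither electrothermal coupling nor the package borders are modelled. Only the 23K peak for a 0.02W sample power is taken from the text.

```python
import math


def green_kernel(radius, peak_k_per_w, decay_cells):
    """Hypothetical impulse response: temperature rise [K] per watt vs. cell offset."""
    return [
        [peak_k_per_w * math.exp(-math.hypot(dx, dy) / decay_cells)
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def steady_state_rise(power_map, kernel):
    """Superpose the shifted impulse response of every powered cell (2D convolution)."""
    rows, cols = len(power_map), len(power_map[0])
    radius = len(kernel) // 2
    rise = [[0.0] * cols for _ in range(rows)]
    for sy in range(rows):
        for sx in range(cols):
            power = power_map[sy][sx]
            if power == 0.0:
                continue
            for ky, kernel_row in enumerate(kernel):
                ty = sy + ky - radius
                if not 0 <= ty < rows:
                    continue
                for kx, gain in enumerate(kernel_row):
                    tx = sx + kx - radius
                    if 0 <= tx < cols:
                        rise[ty][tx] += power * gain
    return rise


# 23 K peak for a 0.02 W sample power (values from the text) gives 1150 K/W at the source cell.
KERNEL = green_kernel(radius=4, peak_k_per_w=23.0 / 0.02, decay_cells=1.5)

power = [[0.0] * 9 for _ in range(9)]
power[4][4] = 0.02  # single sample source in the die centre
rise = steady_state_rise(power, KERNEL)
print(round(rise[4][4], 3))
```

Because the model is linear, the map for an arbitrary power distribution is the sum of the maps of its individual sources, which is why a single characterization of the package can be reused for every floor plan.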
The variance within the die temperature distribution is very high, so there are also regions
close to the ambient temperature. This is because the 3.7x3.7mm die is attached to a
17x17mm PCB substrate and is covered by molding compound, which spreads the heat over a
large area. On average, the temperature is about 7.4K above the ambient temperature.
In comparison to the temperature estimates of UNIBO presented in D6.3.3 [19], the
temperature predictions as well as the temperature gradients vary because of a different IC
package. The modelled package of this work has a characteristic thermal resistance θCA
between IC case and ambient of about 60°C/W. According to [13], this value represents a
middle-class package without any active cooling. Heat spreaders, heat sinks or air flow would
reduce the θCA parameter by a factor of up to 6, but then the dissipated power would not be
sufficient for a significant temperature increase.
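A first-order feel for these numbers follows from the lumped relation between dissipated power and temperature rise, ΔT = P × θCA. The sketch below is a simplified estimate under that assumption, using the 140mW upper bound on the power consumption and the θCA values quoted above; the function name is illustrative.

```python
def case_to_ambient_rise(power_w, theta_ca_k_per_w):
    """Lumped estimate of the case temperature rise above ambient in kelvin."""
    return power_w * theta_ca_k_per_w


POWER_W = 0.140                   # upper bound of the use case 5 power consumption
THETA_CA = 60.0                   # K/W, modelled PBGA without active cooling
THETA_CA_COOLED = THETA_CA / 6.0  # best case quoted for heat spreader, heat sink or air flow

print(case_to_ambient_rise(POWER_W, THETA_CA))
print(case_to_ambient_rise(POWER_W, THETA_CA_COOLED))
```

With the modelled package the estimate is roughly 8.4K, while the improved θCA yields only about 1.4K, which illustrates why an actively cooled package would leave little temperature increase to analyze.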
Figure 32: Steady state temperature distribution of use case 5 design
The temperature distribution of Figure 32 has also been evaluated against a low-level explicit
FDM simulation. Figure 33 shows the deviation map for the use case 5 design. The
green-function simulation shows maximum errors of 2.7% at the top corners and an average
error of 1.52%.
Figure 33: Deviation of green-function based thermal estimation in comparison to discrete
FDM simulation for use case 5
In order to analyze the long-term degradation of the use case 5 design, the NBTI models
described in [16] and published in [22] have been applied. For this purpose, the floor plan is
cut into 11x11 blocks and the maximum temperature and supply voltage at each location are
fed into the NBTI models. These models then compute transient traces for a duration of two
years. As the models are characterized for a time resolution of 1h, they are applied 17088
times for each block. The analysis runs for about 28 hours and results in the following
maximum threshold voltage increase map. The block grid can be refined to a finer resolution
at the cost of a higher runtime.
Figure 34: Threshold voltage increase after two year degradation
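The workload behind this analysis can be estimated from the figures given above. The sketch below assumes that the runtime scales linearly with the number of blocks, which is an assumption made for illustration and not a statement from the deliverable; the function names are hypothetical.

```python
BLOCKS_PER_EDGE = 11
STEPS_PER_BLOCK = 17088  # model applications per block, as stated in the text
RUNTIME_HOURS = 28       # approximate total runtime reported in the text


def nbti_workload(blocks_per_edge, steps_per_block, runtime_hours):
    """Return (total model evaluations, average seconds per evaluation)."""
    evaluations = blocks_per_edge ** 2 * steps_per_block
    seconds_per_evaluation = runtime_hours * 3600 / evaluations
    return evaluations, seconds_per_evaluation


def scaled_runtime_hours(blocks_per_edge, steps_per_block, seconds_per_evaluation):
    """Runtime for another grid, assuming cost scales linearly with the block count."""
    return blocks_per_edge ** 2 * steps_per_block * seconds_per_evaluation / 3600


evaluations, cost = nbti_workload(BLOCKS_PER_EDGE, STEPS_PER_BLOCK, RUNTIME_HOURS)
print(evaluations, "evaluations,", round(cost * 1e3, 1), "ms per evaluation")
print(round(scaled_runtime_hours(22, STEPS_PER_BLOCK, cost), 1), "h for a 22x22 grid")
```

Under this assumption the 11x11 grid requires about two million model evaluations, and halving the block edge length (22x22 blocks) would quadruple the runtime.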
As can be seen, the degradation correlates with the temperature map. The maximum
threshold voltage increase is 17.2mV, while the mean degradation is 12.5mV.
The threshold voltage increase describes the worst case degradation that can occur for every
single PMOS transistor within the considered block. Although it is a pessimistic assumption
to have no phases of relaxation for the transistors, it is a common approach in literature to
assume that at least one transistor exists which will operate under this condition.
4.2.1.5 Thermal- and degradation-aware optimization evaluation
As described in D6.3.2, the developed optimization techniques act on the high-level synthesis
as well as on the ASIC HW floor planner by means of constraints in order to provide thermal-
and degradation-aware results. The potential of these approaches has also been demonstrated
on an example design investigated in deliverable D6.3.2 [18].
In use case 5 the custom ASIC hardware part occupies only a small fraction (about 1%) of the
total die size. Further, the different C processes are very similar to each other because each of
them consists of a doubly nested loop with little array computation inside. The resulting power
density is thus also very similar and, in consequence, temperature gradients within the custom
ASIC hardware block are small.
For these reasons the evaluation of the thermal- and degradation-aware optimization focuses
on the global floor plan and is not limited to the small custom ASIC block. Figure 35 shows
the optimized block diagram of the UNIBO floor plan optimization presented in D6.3.3 [19].
The main difference is that the memory cuts are placed in the centre of the die, leading to a
more homogeneous power and temperature distribution. With this floor plan the peak
temperature is predicted to be 3K lower than the one of the non-optimized floor plan.
Figure 35: Block diagram and steady state temperature distribution of use case 5 design with
optimized floor plan of UNIBO
Figure 36 shows the degradation map of the optimized design. The maximum threshold
voltage increase is 15.6mV and thus about 9.3% lower than in the non-optimized case while
the mean degradation remains at 12.5mV. This result demonstrates the measurable objective
MO7.3.9 of quantification of the improvements reached through the thermal and aging aware
optimizations.
Figure 36: Threshold voltage increase after two year degradation with optimized floor plan
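The quoted improvement can be reproduced directly from the two maximum threshold voltage shifts. The snippet below is a plain arithmetic check; the function and constant names are illustrative.

```python
def relative_reduction(before, after):
    """Fractional reduction of a quantity relative to its original value."""
    return (before - after) / before


MAX_SHIFT_BASELINE_MV = 17.2   # non-optimized floor plan
MAX_SHIFT_OPTIMIZED_MV = 15.6  # optimized floor plan

print(f"{relative_reduction(MAX_SHIFT_BASELINE_MV, MAX_SHIFT_OPTIMIZED_MV):.1%}")
```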
4.2.2 Evaluation of green-function based thermal estimation based on Genepy design
of CEA-LETI
In order to evaluate the thermal estimation approach based on real-life measurements, CEA-
LETI provided the necessary data for the use case 4 design. This includes the datasheet of the
BGA IC package, a block-level floor plan, as well as transient power traces for the targeted
FFT execution. Regrettably, no detailed power values are available for the surrounding
components.
Figure 37: Steady state temperature distribution of use case 4 design.
Figure 37 shows the predicted steady state temperature distribution of the use case 4 design.
As the power dissipation for the FFT execution is far below 1W and the BGA package can
dissipate several watts without the need for active cooling, the peak temperature increase is
very limited and the overall IC package will not heat up significantly due to the FFT
execution.
The real-life measurements were obtained for the steady state, in which the FFT is repeatedly
executed, so that the results can be compared to the simulation. The temperature plot in
Figure 38 has been provided by CEA-LETI. It can be separated into three phases: during the
first and third 600 second frames the cores are idling, while the FFT is repeatedly executed in
the second phase. As can be seen, the idle temperature is approximately 34°C (307K) and the
measurements show a temperature increase of about 1.5K (up to ~35.5°C) during the phase of
FFT execution. This agrees very well with the predicted temperature increase of 1.5K in
Figure 37, addressing MO7.3.14 of comparing measurement data against simulations.
Figure 38: Temperature measurements provided by CEA
The following list summarizes remaining inaccuracies between estimation and real-life
measurements and discusses the impact:
- Only power consumption of SMEP cores known and taken into account: As no
  detailed power values are available for the neighbouring IP level components of the
  die, the idle temperature of 34°C has been used as ambient temperature for the
  simulations.
- Power consumption is assumed to be distributed homogeneously throughout the
  SMEP cores: The temperature sensors are placed close to the centre of the core and
  measure the temperature only at this specific point. As a result, the real temperature
  might be slightly higher or lower than the estimated one, depending on the exact
  spatial power distribution.
- Electrothermal coupling not considered for target technology: As the technology
  data of the semiconductor technology used for manufacturing the samples has not
  been made available to the partners, temperature-dependent leakage models have not
  been characterised. As a consequence, electrothermal coupling can only roughly be
  estimated. Further, a power breakdown into static and dynamic power is not
  available. As the power dissipation is very low in this use case, the temperature
  increase is also very limited. Thus, electrothermal coupling is negligible in this design.
- IC package material uncertainties: The package modelling is based on available
  datasheets that lack precise material characteristics. Thus, during modelling, typical
  thermal conductivities have been assumed. This assumption is expected to lead to
  realistic results and to have only a minor impact.
4.3 Conclusions
The proposed multiphysics estimation flow for 2D SoCs, taking into account the power
dissipation and place & route information, has not been available before THERMINATOR.
The results of this multiphysics estimation flow for 2D SoCs have been presented in numerous
deliverables ([15]-[19]) and publications [22][23]. The measurable objective MO7.3.9 of
comparison of thermal estimations to estimations done by well-known low-level tools has
been demonstrated in Figure 30.
The speed-up of the proposed flow in comparison to a low-level analysis is significant. Once
the package has been characterized, the thermal simulation estimate can be obtained multiple
times a second and the electro-thermal coupling iteration reaches a steady state very quickly.
The speed-up factor of the pure thermal simulation is at least about 1500 in comparison to a
low-level FDM simulation (Table 3). In addition, the proposed NBTI models have been
evaluated to show a speed-up of 600 in comparison to explicitly solving the reaction-diffusion
model. Thus, in total, the flow enables the exploration of multiple design tradeoffs regarding
the synthesis parameters and the floor plan as well as long-term reliability analyses.
Use case 5 has shown a peak temperature of 16K above ambient while the mean temperature
increase is 7.4K. Assuming this as a steady state for a duration of two years, the NBTI
degradation models show a maximum increased threshold voltage of PMOS devices of
17.2mV. This degradation highly correlates to the temperature distribution and has its peak at
the point of highest temperatures. The models have been evaluated to have relative errors
below 10% in comparison to an explicit calculation in [22].
The measurable objective MO7.3.9 of quantification of the improvements reached through the
thermal and aging aware optimizations is demonstrated by a 9.3% lower maximum threshold
voltage shift (Figure 36). Furthermore, use case 4 has shown the compliance between steady-
state simulation and silicon measurements (MO7.3.14). The measured temperature trace shows
only small variations of about 15% around the predicted value.
5 Verification of simulator engines (BME together with POLITO)
5.1 Introduction
5.1.1 The simulator engines developed at BME
5.1.1.1 Logi-thermal simulation
Logi-thermal simulation is a novel method that is capable of determining the thermal
behaviour of digital systems at the gate level.
Due to the uneven event density on the chip surface, digital blocks normally experience
time-variable temperature gradients. Because the delays of digital gates depend on
temperature, a purely digital gate-level simulation may deviate strongly from the results
obtained when the surface temperature profile of the chip is considered in calculating
the actual delays of the individual gates. Self-consistency between the thermal behaviour and
the digital behaviour of the chip has to be maintained: logic and thermal operation have to be
traced together. A simulation that ensures this is a logi-thermal simulation. The major
application of such a simulation is to make sure that thermal effects are considered during
timing analysis (thermal-aware signal integrity check), in other words, to make sure that the
digital circuit will function properly under all allowed thermal conditions.
The Celltherm engine developed in Work package 3 (T3.1) uses commercial engines and
glues them together to enable them to perform a logi-thermal simulation. It uses a Fourier
based method for thermal compact model generation. Using these compact models during the
relaxation process the logic simulator iterates with a Fourier based analog thermal solver
which presents the results of the thermal domain.
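The self-consistency requirement can be illustrated with a toy relaxation loop for a ring oscillator: the gate delay depends on temperature, the switching power depends on the resulting frequency, and the temperature depends on the power. This is a minimal sketch and not the CellTherm algorithm; the linear delay model, the single lumped thermal resistance and all numerical values are hypothetical.

```python
def stage_delay(temp_k, d0=100e-12, t_ref=300.0, k=2e-3):
    """Hypothetical gate delay [s] that increases linearly with temperature."""
    return d0 * (1.0 + k * (temp_k - t_ref))


def ring_frequency(temp_k, stages=11):
    """Oscillation frequency of a ring of identical inverting stages."""
    return 1.0 / (2.0 * stages * stage_delay(temp_k))


def dissipated_power(freq, stages=11, energy_per_transition=1e-12):
    """Each stage makes two transitions per oscillation period."""
    return 2.0 * stages * energy_per_transition * freq


def logi_thermal_fixed_point(t_amb=300.0, r_th=50.0, tol=1e-9, max_iter=100):
    """Alternate logic (frequency, power) and thermal updates until the temperature settles."""
    temp = t_amb
    for _ in range(max_iter):
        freq = ring_frequency(temp)                            # logic domain
        new_temp = t_amb + r_th * dissipated_power(freq)       # thermal domain (lumped)
        if abs(new_temp - temp) < tol:
            return new_temp, freq
        temp = new_temp
    raise RuntimeError("relaxation did not converge")


temperature, frequency = logi_thermal_fixed_point()
print(temperature, frequency)
```

In a real engine the lumped thermal resistance is replaced by the compact thermal model of the layout and the delay model by per-cell timing data, but the alternating structure of the iteration stays the same.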
5.1.1.2 Electro-thermal simulation
The electro-thermal simulator engine developed in Work package 4 (T4.1) is a Spice-
compatible electrical simulator with a thermal extension. Every circuit element's model is
extended with a thermal node. These nodes connect the electrical and thermal parts of the
circuit. Thermal behaviour is modelled by electric equivalent RC circuits: compact models
created using package models and the layout.
The engine is based on the direct method thus the iterative solution takes place simultaneously
for the thermal and electrical sub-networks. This makes it possible to consider very fast
changes and to take feedbacks between the two domains into account. Another advantage is
that it permits AC simulation as well.
5.1.2 Measurable objectives
Logi-thermal simulator engines use simplified dissipation models for logic gates and cells to
speed up simulation. The accuracy of the models was verified by transistor-level Spice
simulations. The objective MO7.3.10 was to achieve a matching within 5%.
On the test design from POLITO (done with STMicroelectronics STD cells) we have
successfully performed logi-thermal simulations and determined power density as well as the
evolved hot-spots on the surface.
The electro-thermal engine performs an electric simulation on electric and thermal models, so
it was verified against a commercial Spice simulator. In this case measurement results of two
integrated circuits were also available, and the simulations and the measurements showed
good agreement (< 10%).
5.2 Detailed description of the verification
5.2.1 Verification of the logi-thermal simulator engine
The CellTherm simulations have been verified with several approaches. From the SPICE
simulation of the transistor-level netlist of the design, the dissipations and delays of the cells
have been extracted. Dissipation and timing data could also have been extracted from a Liberty
database of the process. A Liberty file for our process node (TSMC 0.35 µm) was not available,
so we needed to extract the parameters from SPICE simulations manually (Liberty databases
claim an accuracy within 2% of SPICE simulations).
The SPICE simulation time was 64 μs. This was enough to reliably sample the power, delay
and frequency of the circuit. The average power dissipations were compared to the dissipations
measured by CellTherm. As a next step, the equivalent thermal Foster RC networks that
represent the structure and layout were transformed to a SPICE compatible netlist. The
average cell powers calculated in the previous step were fed into the Foster RC network,
which yielded the temperature functions of the cells. Finally, these functions simulated with
SPICE were compared to the CellTherm temperature curves. The difference between the
SPICE and CellTherm temperature curves was less than 0.16%. The calculated difference
function can be seen in Figure 39.
Figure 39. Difference between CellTherm and SPICE results
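The temperature function obtained from a Foster RC network driven by a constant cell power has a closed form, a sum of exponential terms with one term per RC stage. The sketch below shows this step response; the (R, C) pairs and the cell power are hypothetical placeholders, since the actual values result from the compact model generation.

```python
import math


def foster_temperature_rise(power_w, stages, t):
    """Step response of a Foster network: P * sum of R_i * (1 - exp(-t / (R_i * C_i)))."""
    return power_w * sum(r * (1.0 - math.exp(-t / (r * c))) for r, c in stages)


# Hypothetical (R [K/W], C [J/K]) pairs of a three-stage Foster network.
STAGES = [(5.0, 1e-4), (20.0, 5e-3), (40.0, 0.1)]
CELL_POWER_W = 1e-3  # hypothetical average cell power

for t in (1e-4, 1e-2, 1.0, 100.0):
    print(t, foster_temperature_rise(CELL_POWER_W, STAGES, t))
```

For long times the response approaches the power multiplied by the sum of all stage resistances, which is the steady-state temperature rise of the cell.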
Another validation has been done to further verify the correct operation of CellTherm. A
simulation has been run using CellTherm on the demonstration circuit until the steady-state
temperature was reached. The previously mentioned SPICE netlist simulation was then
executed at the steady-state temperature obtained with CellTherm. The output frequency of the
ring oscillator was measured in both the CellTherm and SPICE results. The SPICE simulation
resulted in fosc = 1.453 GHz. CellTherm measured fosc = 1.524 GHz. The difference is below
4.82%. This difference could be further reduced by providing delay values for every type
of transition (0→1, 1→0, 0→x, 1→x, etc.) in the SDF file.
5.2.2 Comparison of the two logi-thermal simulator engines
The two logi-thermal engines (CellTherm and Logitherm) have been verified with a
10 mm × 10 mm standard cell digital circuit with a 4-bit D-flip-flop chain and a ring oscillator
circuit. This design is fictional and the cell sizes are intentionally enlarged in order to
demonstrate the effect of temperature variations and evolving hot-spots on cell propagation
delay. The power dissipations for logic transitions in the cells are also fictional values, large
enough to demonstrate the mentioned effects clearly.
The same physical layout definition and delay descriptor SDF file were used in both engines.
In Figure 40 the schematic layout of the design is shown. In the upper part of the chip the four
D-flip-flops form the exciting circuit. The dissipated powers in the DFFR cells are
intentionally chosen to be 1000 times larger (1 mW) than the inverters' dissipated power per
logic transition (1 µW). The lower part of the layout contains the ring oscillator formed by 10
inverters and a kick-in NAND gate.
Figure 40 The schematic layout of the design
The physical layout of the circuit is shown in Figure 41.
Figure 41 The physical layout of the circuit.
Figure 42 shows the comparison of the logic simulation results. The waveforms of the two
engines are plotted together to show the accordance.
Figure 42 The waveforms of the logic simulation
Figure 43 shows that the calculated power dissipation of the logic elements also matches in the
two engines.
Figure 43 Power dissipation diagram of an inverter in the chain.
Finally, Figure 44 shows the transient simulation of an inverter's temperature calculated by
the engines. The temperature as a function of time can be seen on the left hand side of the
figure while the relative error of the results is shown on the right. The initial large error is due
to differences in the numeric algorithms used by the two engines. The error decreases below
5% with the time-constant of the system.
Figure 44 Transient simulation of a logic cell's temperature
Based on the simulation results presented above it can be stated that the two engines produce
the same results within an error of 5% (MO7.3.10).
5.2.3 Real-world evaluation designs
5.2.3.1 Ring oscillator containing 1000 inverter cells
One of the test circuits was a digital ring oscillator containing 1000 inverter cells and one
kick-in NAND cell. The technology node was TSMC 0.35µm. The structural Verilog
description of the circuit was generated by a script and then place & route was performed in
the Mentor Graphics Pyxis environment. The resulting layout is shown in Figure 45. The
resulting die size is 2.13x2.19 mm without pad ring.
Figure 45. Layout of the ring oscillator (1000 inverters)
Logi-thermal simulations were run with CellTherm until a steady-state temperature rise of
0.193 °C above the initial temperature was reached. The layer structure was silicon (500
µm) on top of glue (50 µm) on top of Kovar (nickel-cobalt ferrous alloy) (500 µm) with
adiabatic boundary conditions at the sides and the top. Steady state temperature was reached
after 3.86 seconds. Power dissipation and thermal distribution maps are shown in Figure 46
and Figure 47.
Figure 46. Power dissipation (1000 inverters)
Figure 47. Temperature map (1000 inverters)
5.2.3.2 Test circuit from POLITO
CellTherm was also used to simulate a test design from Politecnico di Torino (POLITO) on
an STMicroelectronics 65nm technology node containing DSPs, LFSRs, multipliers and XOR
units. The layout of the synthesized, placed and routed design is shown in Figure 48. Synthesis was
done using Cadence RTL Compiler (RC), placement and routing was done by Cadence
Velocity (formerly Encounter). The resulting die size is 0.12x0.12 mm without pad ring.
Figure 48. Layout of POLITO circuit
The temperature difference in steady state was 0.188 °C, reached after 4.56 seconds. Power and
temperature maps are shown in Figure 49 and Figure 50. The layer structure was silicon
(500 µm) on top of glue (50 µm) on top of Kovar (nickel-cobalt ferrous alloy) (500 µm) with
adiabatic boundary conditions at the sides and the top.
Figure 49. Power distribution in POLITO circuit
Figure 50. Temperature distribution in POLITO circuit
In both designs the resulting temperature distribution was quite homogeneous, which is
caused by the evenly distributed cell network and the good heat conduction of silicon. In
a design with a more scattered layout with different partitions of digital blocks (cache
memory, ALU, etc.) the effect of uneven heat distribution might be better observed. Also, in a
mixed-signal design with an analog and a digital part, the cross-influence of the analog and
digital parts could be simulated. In CellTherm, this could be achieved by substituting the
analog part with a black box for which only the dissipation and the placement on the layout
need to be known.
5.2.4 Verification of the electro-thermal engine
The benchmark circuit was the well-known µA741 operational amplifier (Figure 51). This
classical benchmark allows studying the effect of the thermal feedback from the output stage
to the two transistors of the differential pair of the input stage, and how this effect depends on
the actual physical realization (layout, packaging style). We studied two layout variants (from
two different vendors) as illustrated in Figure 52.
Figure 51 : Schematic of the µA741 operational amplifier
The transistors marked with yellow circles are simulated using their electro-thermal model.
We were able to acquire this integrated circuit with two different layouts. This fact gave us
the ability to verify our electro-thermal solver at different thermal scenarios.
Figure 52 : The two different layouts for the µA741, panels (A) and (B)
Figure 53 Transfer characteristics with different loads, panels (A) and (B)
The simulated open-loop transfer characteristics can be seen with different loads (Figure 53).
These results demonstrate well that the structure of the physical layout (see Figure 52) has a
major effect on the electrical behaviour through thermal coupling.
Figure 54 shows the AC simulation of the same circuit in a feedback configuration together
with measurement results.
Figure 54 Behaviour in the frequency domain with different loads, panels (A) and (B)
Several observations can be made from these figures.
- The simulation and measurement results are in excellent agreement. This verifies our
electro-thermal simulator.
- Interestingly, in certain feedback configurations we found frequencies in the lower
frequency range at which one of the samples behaved as an ideal inductor while the other
behaved as an ideal capacitor. The only difference between the two was their
physical layout.
Configurable Level 0 Test Case
Figure 55 shows the main elements of our test case. It comprises a configurable RTL level
design and a configurable test bench which contains different scenarios to test the design. The
configuration file defines all important parameters related to the design. This file is used in
the RTL source code, in the test bench and also in the automatic floorplan generation script.
As described in the process flow, after the configuration has been applied, the RTL design is
synthesized with a standard cell library file. We then perform post-synthesis simulation on the
obtained gate-level design to measure the switching activity of the design when imposing
different test scenarios on it. The switching activity data is then used for power consumption
estimation of the design.
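How switching activity translates into a power estimate can be sketched with the standard relation for switching power, in which every toggle of a net dissipates on average 0.5 × C × Vdd². The example below is a simplified illustration and not the estimation tool used in the flow: the net names, toggle counts, capacitances, supply voltage and simulation time are hypothetical, and cell-internal and leakage power are ignored.

```python
def dynamic_power(toggle_counts, net_capacitance_f, vdd, sim_time_s):
    """Average switching power: each toggle dissipates 0.5 * C * Vdd^2 on average."""
    energy = sum(
        0.5 * net_capacitance_f[net] * vdd ** 2 * count
        for net, count in toggle_counts.items()
    )
    return energy / sim_time_s


# Hypothetical toggle counts from a 1 us post-synthesis simulation and load capacitances.
TOGGLES = {"add_out[0]": 120, "mul_out[0]": 40, "clk": 400}
CAPS_F = {"add_out[0]": 2e-15, "mul_out[0]": 3e-15, "clk": 10e-15}

print(dynamic_power(TOGGLES, CAPS_F, vdd=1.1, sim_time_s=1e-6))
```

Running the same calculation per floorplan section, rather than for the whole design, gives the block-level power values that a subsequent temperature analysis needs.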
The floorplan file indicates the location of each section of the design on the chip. This file is
automatically generated by one of our scripts. The file should mainly be used during place &
route. However, in order to achieve more accurate results we can also synthesize the design in
topographical mode and thus utilize the floorplan during synthesis.
We can perform all of our design evaluation (timing, power and temperature analysis) on the
gate-level design produced by the synthesis engine. The place&route step is therefore
optional and is only needed when more accurate results are required.
As introduced earlier, we have developed a computational unit as an extension of the previous
test cases. It is better suited to exploring thermal implications in a realistic scenario,
such as the effects of hot-spots and thermal gradients within a larger-area design.
The configurable design performs basic floating point operations, which are common in
virtually every computational system. The unit is fully parametric. The first set of
parameters defines the width of the operands used in computations. The second specifies how
many parallel instances of the computational unit are included in the design.
Figure 56 shows the hierarchy of the design and Figure 57 shows its inputs and outputs.
Figure 55 Test case design process flow.
Each component of the design, named dspunit in the figure, is composed of a set of sub-modules
executing different floating point operations: two-operand addition (Addf), two-operand
multiplication (Mulf), two-operand division (Divf), exponential calculation (Expf) and
natural logarithm calculation (Lnf). These operations are typical of FPUs.
Table 4 lists the important design parameters and their descriptions.
Table 4 Configuration Design Parameters – Behavioural.
| PARAMETER | Description |
|---|---|
| DATA_WIDTH | Total width of input operands to the unit. Each input operand is a floating point number. |
| EXPONENT_WIDTH | Width of the exponent part of an input floating point operand. |
| OPCODE_WIDTH | Width of the OpCode input port. Can be changed based on the number of operations that the computational unit performs. |
| STATUS_DATA_WIDTH | Width of the status output port. |
| NUMBER_OF_UNITS | Total number of parallel dspunit instances in the design. |
Table 5 contains a detailed description of the design ports.
Table 5 Input/Output ports description
| PORT | Direction | Width | Description |
|---|---|---|---|
| OpCode | Input | NUMBER_OF_UNITS * OPCODE_WIDTH | Indicates the operation which should be performed on the input operands. We use one-hot encoding for this port. Each OpCode activates one of the (Addf, Divf, ...) units. |
| In1 | Input | NUMBER_OF_UNITS * DATA_WIDTH | Input floating point operand to the design. |
| In2 | Input | NUMBER_OF_UNITS * DATA_WIDTH | Input floating point operand to the design. |
| Out | Output | NUMBER_OF_UNITS * DATA_WIDTH | Output result of the computation as a floating point value. |
| StatusOut | Output | NUMBER_OF_UNITS * STATUS_DATA_WIDTH | The status of the last floating point operation done in the design. |
| Clk | Input | 1 | Input clock port to the design. |
| ResetL | Input | 1 | Synchronous, active-low reset. |
Figure 56 Hierarchy of one computational block.
Figure 57 Input/output ports of the design
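The relation between the behavioural parameters of Table 4 and the port widths of Table 5 can be illustrated with a short Python sketch. It is not part of the project scripts: the class name, the function name and the default values (in particular OPCODE_WIDTH = 5 and STATUS_DATA_WIDTH = 4) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignConfig:
    """Behavioural parameters of Table 4 (default values are illustrative)."""
    data_width: int = 32
    exponent_width: int = 8
    opcode_width: int = 5        # assumed: one bit per operation, one-hot
    status_data_width: int = 4   # assumed: not specified in the text
    number_of_units: int = 4


def port_widths(cfg: DesignConfig) -> dict:
    """Return the bit width of every top-level port listed in Table 5."""
    n = cfg.number_of_units
    return {
        "OpCode": n * cfg.opcode_width,
        "In1": n * cfg.data_width,
        "In2": n * cfg.data_width,
        "Out": n * cfg.data_width,
        "StatusOut": n * cfg.status_data_width,
        "Clk": 1,
        "ResetL": 1,
    }
```

With these assumed defaults, four 32-bit units give 128-bit In1, In2 and Out buses; the data ports scale linearly with NUMBER_OF_UNITS while Clk and ResetL stay one bit wide.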
Computations are pipelined: the unit can accept a new OpCode and new input values in every
clock cycle. Each of the floating point operations (Addf, Divf, ...) completes in a single
clock cycle.
Table 6 shows the synthesis results for the design with a single computational unit and
different values of the DATA_WIDTH parameter. Synthesis uses a typical standard cell library
at a typical corner.
We compare the complexity of the presented test case with a basic design, an 8-bit counter.
For each selected DATA_WIDTH value we show the number of leaf cells required for the design
and the critical path delay, both relative to the 8-bit counter design.
Table 6 Design Complexity. The number of leaf cells and the critical path delay are given
relative to the 8-bit counter design
| DATA_WIDTH | EXPONENT_WIDTH | Leaf Cells | Critical Path Delay |
|---|---|---|---|
| 16 | 4 | 353x | 21.6x |
| 32 | 8 | 736x | 26.6x |
| 48 | 16 | 1535x | 34.4x |
As introduced earlier, any number of parallel computational units can be instantiated. By
changing this number, the effect of temperature gradients and hotspots can be evaluated over
an area comparable to that of real digital IP designs.
The floorplan of the design, for placement and routing or for topographical synthesis, is
generated automatically by a Python script. The script receives the parameters of the design
and produces a TCL file which contains the die size and area constraints for each of the
computational units as well as for the sub-modules inside each unit.
Table 7 describes the parameters of the Python script.
Table 7 Configuration Design Parameters – Floorplan
| Parameter | Description |
|---|---|
| GENERATE_SUB_FLOORPLAN | If true, the script produces area constraints for each of the sub-modules (Addf, Divf, ...) of each computational unit. |
| NUMBER_OF_UNITS | Total number of parallel computational units in the design. |
| UNIT_WIDTH | Desired width of each computational unit area in the floorplan. |
| UNIT_HEIGHT | Desired height of each computational unit area in the floorplan. |
| INTER_UNIT_DISTANCE | Size of the empty space placed between adjacent computational units in the floorplan. |
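The following Python sketch illustrates how such a generator might turn the Table 7 parameters into a TCL file. It is an illustration under stated assumptions, not the project script: the near-square grid arrangement, the function names and the TCL variable names (die_size, unit_area) are all hypothetical, and only plain TCL `set` statements are emitted rather than commands of any particular place&route tool. GENERATE_SUB_FLOORPLAN is not covered.

```python
import math


def unit_rectangles(number_of_units, unit_width, unit_height, inter_unit_distance):
    """Arrange the units on a near-square grid (an assumed layout policy).

    Returns (die_width, die_height, rects), where each rectangle is
    (x0, y0, x1, y1) in the same length unit as the inputs.
    """
    cols = math.ceil(math.sqrt(number_of_units))
    rows = math.ceil(number_of_units / cols)
    gap = inter_unit_distance
    rects = []
    for i in range(number_of_units):
        row, col = divmod(i, cols)
        x0 = gap + col * (unit_width + gap)
        y0 = gap + row * (unit_height + gap)
        rects.append((x0, y0, x0 + unit_width, y0 + unit_height))
    die_width = gap + cols * (unit_width + gap)
    die_height = gap + rows * (unit_height + gap)
    return die_width, die_height, rects


def floorplan_tcl(number_of_units, unit_width, unit_height, inter_unit_distance):
    """Render the layout as TCL text; variable names are placeholders."""
    die_w, die_h, rects = unit_rectangles(
        number_of_units, unit_width, unit_height, inter_unit_distance)
    lines = [f"set die_size {{{die_w} {die_h}}}"]
    for i, (x0, y0, x1, y1) in enumerate(rects):
        lines.append(f"set unit_area({i}) {{{x0} {y0} {x1} {y1}}}")
    return "\n".join(lines) + "\n"
```

A real flow would replace the `set` lines with the constraint commands of the chosen synthesis or place&route tool; the geometry computation is the part that depends on the Table 7 parameters.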
Figure 58 – Figure 60 show the floorplan of the design and sample place&route results. In
this example the floorplan specifies strict boundaries for each of the computational units,
but allows the place&route tool to place the internal sub-modules freely to optimize the
timing of the design.
Figure 58: Floorplan of one unit of the test case.
Figure 59: Amoeba view of the placed design.
Figure 59: Floorplan of the design containing 4 computational units.
Figure 60: Sample place&route result for the design containing 4 computational units.
The automated test bench generator developed for the design produces a random stream of
floating point values as the inputs of each computational unit. It then specifies the OpCode
value for each unit, which indicates what task each computational unit should perform. The
test bench is easily configurable and can produce a variety of scenarios.
Table 8 shows the currently available OpCode values and their operations.
Table 8 OpCode Description
| OpCode Value | Description |
|---|---|
| 0x00 | No operation. Data on the inputs of the computational unit will not be read at all. No activity will happen. |
| 0x01 | Floating point multiplication. |
| 0x02 | Floating point addition. |
| 0x04 | Floating point exponential calculation. |
| 0x08 | Floating point division. |
| 0x10 | Floating point natural logarithm calculation. |
The logic of each computational unit is structured so that, when it receives a new OpCode,
switching activity occurs only in the registers and gates of the sub-module responsible for
that OpCode. The other sub-modules do not change state. Figure 61 shows a snapshot of the
Synopsys DVE tool during analysis of the gate-level simulation results of the design.
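The one-hot OpCode selection of Table 8 can be summarised with a small behavioural reference model in Python. This is only a sketch: it works on Python floats rather than the parametric floating point format of the design, and the assumption that Expf and Lnf operate on the first operand is ours, not stated in the text.

```python
import math

# One-hot OpCode values from Table 8, mapped to the sub-module they activate.
OPCODES = {
    0x01: "Mulf",
    0x02: "Addf",
    0x04: "Expf",
    0x08: "Divf",
    0x10: "Lnf",
}


def active_submodule(opcode):
    """Return the name of the activated sub-module, or None for 0x00 (no-op)."""
    if opcode == 0x00:
        return None
    if opcode not in OPCODES:
        raise ValueError(f"OpCode {opcode:#04x} is not a valid one-hot value")
    return OPCODES[opcode]


def compute(opcode, in1, in2):
    """Reference result for one operation; only one sub-module is exercised."""
    name = active_submodule(opcode)
    if name is None:
        return None                # no operation: inputs are not read
    if name == "Mulf":
        return in1 * in2
    if name == "Addf":
        return in1 + in2
    if name == "Divf":
        return in1 / in2
    if name == "Expf":
        return math.exp(in1)       # assumed: unary operations use In1
    return math.log(in1)           # Lnf, same assumption
```

Because exactly one bit of the OpCode is set, a value such as 0x03 is rejected instead of silently activating two sub-modules at once.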
The output of post-synthesis simulation is compared against the output of functional
simulation to ensure the correctness of the results. The VCD file produced by post-synthesis
simulation is used to derive switching activity statistics of the circuit for each specific
time interval. These statistics are then used to compute the average power consumption of the
design for the target test scenario.
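The step from value changes to per-interval switching activity can be sketched in Python as follows. The sketch assumes the VCD file has already been parsed into (time, signal, value) tuples; the function names are hypothetical. The power formula is the textbook approximation of 0.5·C·V² per toggle, which is an assumption used for illustration and not necessarily the estimation method of the tool flow used in the project.

```python
from collections import defaultdict


def toggles_per_interval(changes, interval):
    """Count value changes per signal in consecutive windows of `interval`.

    `changes` is an iterable of (time, signal, value) tuples sorted by time,
    as could be obtained from a parsed VCD file. The first value seen for a
    signal only initialises it and is not counted as a toggle.
    """
    last_value = {}
    counts = defaultdict(lambda: defaultdict(int))
    for time, signal, value in changes:
        if signal in last_value and last_value[signal] != value:
            counts[int(time // interval)][signal] += 1
        last_value[signal] = value
    return {window: dict(per_signal) for window, per_signal in counts.items()}


def average_dynamic_power(n_toggles, cap_farad, vdd_volt, interval_s):
    """Textbook estimate: each toggle dissipates 0.5 * C * Vdd^2 (assumption)."""
    return 0.5 * cap_farad * vdd_volt ** 2 * n_toggles / interval_s
```

For example, two toggles on 1 fF nodes at 1 V within a 10 ns window correspond to roughly 0.1 µW under this approximation.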
Figure 61: Sample post-synthesis simulation waveforms of one computational unit when
applying different OpCode values to the design.
5.3 Conclusion
We have verified the logi-thermal simulators against SPICE simulations. Standard cell
dissipation and timing data were extracted from SPICE simulations using the equivalent
netlists of the cells. With the data extracted at transistor level, the logi-thermal
simulators were able to calculate energy dissipation according to the switching activity of
the circuit. Delay values were also extracted from SPICE netlists and were used to determine
temperature-dependent timing in the test circuit. We extracted the average dissipation of the
circuit with SPICE for the first 64 µs. This time limit was chosen because by then the circuit
had been in every possible state and the average dissipation had settled to a steady-state
value. The average power dissipated by each cell in the circuit was the input to the thermal
equivalent Foster RC network representing the die and the package. The Foster RC network's
response to the input average powers was also simulated with SPICE, which yielded the
surface temperature of the cells. We compared these SPICE temperature results with the
logi-thermal engines' results and found a deviation below 0.16%.
Figure 62. Difference between CellTherm and SPICE results
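For reference, the response of a Foster RC network to a constant power step has a closed form, which the following Python sketch evaluates. The function names and the example stage values are assumptions for illustration; the actual network parameters of the die and package model are not given in this section.

```python
import math


def foster_step_response(power_w, stages, t_s):
    """Temperature rise in K at time t_s for a power step of power_w watts.

    `stages` is an iterable of (R_th in K/W, C_th in J/K) pairs; in a Foster
    network each parallel RC stage contributes R * (1 - exp(-t / (R * C))).
    """
    return power_w * sum(
        r * (1.0 - math.exp(-t_s / (r * c))) for r, c in stages)


def foster_steady_state(power_w, stages):
    """Steady-state temperature rise: power times the sum of all resistances."""
    return power_w * sum(r for r, _ in stages)
```

With two assumed stages of (10 K/W, 1 mJ/K) and (5 K/W, 10 mJ/K) and 0.2 W of average power, the temperature rise settles at 0.2 W · 15 K/W = 3 K above ambient.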
To validate the logi-thermal simulators with another method, we simulated a test circuit with
the logi-thermal simulators until steady-state temperatures were reached. This steady-state
temperature was saved for later use in a SPICE simulation. The test circuit was a ring
oscillator whose oscillation frequency depends on the device temperature. At the steady-state
temperature the oscillator frequency was fosc,logi = 1.524 GHz in the logi-thermal
simulation. We then simulated the same circuit in SPICE (using the transistor-level netlist)
at the same steady-state temperature obtained from the logi-thermal simulator. The resulting
frequency in the SPICE simulation was fosc,SPICE = 1.453 GHz. The error is 4.82%. With this
method we validated that the temperature-dependent delay calculation in the logi-thermal
simulations meets the expected 5% accuracy.
The two logi-thermal engines developed in the project were also compared, and the results
show that their outputs match within an error of 5% (MO7.3.10).
The electro-thermal simulator was validated with measurements as well; the matching was
within 10%.
6 Thermal effects in identification applications (NXP-D)
6.1 Introduction
NXP-D has evaluated and validated the simulation by means of the test chip designed in WP4.
The ambient influence on the temperature distribution at chip level was measured and
simulated (see D6.1.2).
The design comprises several structures to force and measure thermal effects, see Figure 63.
The silicon measurements were based on the specification described in deliverable D4.1.3.
This is the first time that the thermal behaviour of the CMOS 40nm process has been verified.
Deliverable D4.2.1 gives an overview of the specified parameters with respect to the thermal
influence on self-heating of analogue circuitry. This has been measured and evaluated on
silicon.
Figure 63: Testchip Layout
Based on these measurements, the Synopsys technology CAD (TCAD) modelling and simulation
was verified.
6.2 Technical results
To evaluate the impact of the package on the thermal characteristics of the devices, two
package types were evaluated: QFP100 and LCC84.
The NXP Therminator test chips were investigated in the same way in both packages. The first
assessment measures the self-heating of the test chip itself. For this purpose, one device in
each package type (QFP100 and LCC84) was left open as a reference measurement. The open
devices are the reference for evaluating the influence of the package on the self-heating of
the silicon. The results for the open packages are discussed in WP4 deliverable D4.2.1.
The measurements of the bipolar test structure on the Therminator test chip showed that the
measurements match the simulations performed with the Fraunhofer tools. At power transistor
settings up to 0.25 W, self-heating can be regarded as a local effect.
The reference measurement of the diode voltage difference over temperature fits a linear
equation: T = 6620.431635 °C/V · x − 276.688 °C, where x is the diode voltage difference in
volts. The temperature dependency of the diode voltage is shown in Figure 64. The constant
term is nearly equal to 0 K (≈ −273°C); the difference of about 3°C is related to a
measurement offset of the validation test bench environment. The gradient of the curve is
6620.431635 °C/V, which corresponds to 6.62 °C/mV.
Figure 64: Measurement Result of Bipolar Difference over Temperature
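The measured linear relation can be applied directly to convert a diode voltage difference into a temperature, as in the following Python sketch. The coefficients are the fitted values quoted above; the function and constant names are hypothetical.

```python
SLOPE_C_PER_V = 6620.431635   # fitted gradient in °C per volt
OFFSET_C = -276.688           # fitted constant term in °C


def temperature_from_delta_v(delta_v):
    """Temperature in °C for a diode voltage difference given in volts."""
    return SLOPE_C_PER_V * delta_v + OFFSET_C


def delta_v_from_temperature(temp_c):
    """Inverse relation: diode voltage difference in volts for a temperature in °C."""
    return (temp_c - OFFSET_C) / SLOPE_C_PER_V
```

A voltage difference of 50 mV, for instance, maps to about 54.3°C on this curve.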
This model is completely independent of process parameters, thus there is virtually no spread
from device to device. The following evaluations are based on this measured relation between
the diode difference voltage and the measured temperature.
Figure 65 shows the real measurement in comparison to the theory. Note that the offset
already discussed for the equation above is also visible in the chart.
Figure 65: Measured bipolar trend curve in comparison to theoretically calculated curve
The measured curve is a parallel shift of the theoretically calculated curve. The red curve
is used in the following measurements as the reference to calculate temperatures from the
voltage differences of the bipolar transistor pairs.
The temperature distribution on silicon is shown in Figure 66.
The heater is shown conceptually in the middle of the chart. At the different heater power
settings, the temperature distribution is strongly influenced by the self-heating of the
silicon itself. The reference measurement with the heater switched off is shown by the blue
curve: the measured temperature is stable at 39°C over all bipolar pairs.
When the heater is switched on, the temperature at the bipolar pair nearest to the heater
increases by about 10°C to 13°C. The measured distribution of the self-heating effect shows
a temperature decrease of 4°C over a distance of 47µm.
The self-heating effect can thus be calculated locally on-chip. The influence of self-heating
at distances of more than 100µm can be neglected. The self-heating effect on silicon is also
independent of direction, as it is evenly distributed over the area.
Figure 66: Measured self heating effects on distribution over silicon
Figure 67 compares the behaviour of the Therminator test chips in two configurations: open
LCC84 package (green curve) and encapsulated LCC84 package (purple curve).
Figure 67: Measurement results LCC84 open (green) and encapsulated LCC84 (purple)
The encapsulated device dissipates heat better than the open device, which means that the
LCC84 package reduces the self-heating. The chart also indicates that the difference in
self-heating is roughly 1°C.
The same measurement with a QFP100 package yields about the same result, see Figure 68.
Figure 68: Measurement results QFP100 open (blue) and encapsulated QFP100 (red)
Figure 69 illustrates the modelling flow of the thermal behaviour of the NXP testchip.
Figure 69: NXP testchip with transistor heater and various sensor elements for
temperature sensing purpose (left part). 3D TCAD model of the heater and diode sensor
elements (right part).
The verification of the simulation model against measurement is summarized in Figure 70. At
lower heating powers the measurement data seem to converge to a fixed value. A parasitic
resistor due to a non-ideal layout causes a ground shift, which explains this behaviour. At
higher heating powers the effect of the heater starts to dominate and good agreement between
simulation and measurement can be observed.
Figure 70: Temperature at different sensor locations (refer to Figure 10) as a function of
heater power. The filled circles indicate measurements negatively offset by 6 degrees
and the open diamonds are simulation results.
6.3 Conclusions
The measurements of the NXP test chip and the results from the Synopsys TCAD modelling and
simulation tools match sufficiently (MO7.3.11). The characterization of the diode voltage
over temperature is well in line with the theoretical expectation. The impact of the
encapsulation on the thermal behaviour with respect to self-heating in silicon has been
demonstrated.
7 Evaluation of simulation-based verification, optimization and RSM model generation
methodologies (MUN, together with NXP-D, and ST)
7.1 Introduction
MUN worked in the THERMINATOR project to integrate and improve its tools and methodologies
for thermal-aware design within the partners' design environments, in close cooperation with
industrial project partners such as ST [24][26][27], NXP [24] and IMC, with institutes such
as FHG and others [25], and with EDA partners such as SNPS [28]. The aim of this
collaboration was to improve the design flows of the industrial project partners so that they
can speed up their design process and obtain better results. As part of the project,
simulation-based methodologies to analyze and reduce the impact of thermal fluctuations on
the behaviour, yield or reliability of analog/RF blocks have been developed and applied to
industrial test-cases. The simulation-based verification, optimization and RSM model
generation methodologies have been evaluated by MunEDA in cooperation with ST and NXP using
different test-cases of the industry partners. Exploiting the test cases defined in Task 1.3
and the device models developed in WP2, the predictability of the impact of operating
conditions such as thermal variation on the behaviour of the circuits under test has been
successfully assessed, demonstrated and documented.
7.2 Technical results
MUN solutions and tools have been applied to several test-cases of the industrial project
partners in different process technologies such as 40nm and 28nm. Two test-cases by ST and
two test-cases by NXP have been documented in the project, in which MUN technology was used
to analyse and improve the underlying circuit sizing problems. The test-cases were:
- a double-ring oscillator consisting of a main PLL and dither PLL (ST) [26]
- a 2.133GHz Level Shifter in 28nm (ST) [27]
- a Sensor Ring Oscillator with power device transistors based on diode pairs in 40nm
(NXP) [24]
- a POR Power-on-Reset in 40nm (NXP) [24]
All test-cases and the underlying design techniques, including the use of MUN methodologies,
are described in detail in deliverable 4.2.2.
The project activities achieved very good technical results:
- Project partners such as ST and NXP were able to shorten their design and sizing time and
obtained better and more reliable design results.
- For several of these test-cases this has also been silicon proven.
- Correlations were measured between simulation and silicon results, leading to further
enhancements in the evaluated circuit design methodologies, especially for thermal
modelling and temperature-influenced effects on the circuits.
- Enhanced statistical analysis methods, such as Monte-Carlo analysis under worst-case
temperature and corner conditions as well as deterministic Worst-Case-Distance methods,
have been applied to the investigated circuits.
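As a generic illustration of Monte-Carlo yield analysis, the following Python sketch estimates yield as the fraction of random process samples whose performance meets the specification. It is not the algorithm of the MunEDA tools: the function name, the Gaussian parameter model and the callback interface are assumptions made for this sketch only.

```python
import random


def monte_carlo_yield(performance, spec_ok, n_samples=10000,
                      n_params=3, sigma=1.0, seed=0):
    """Estimate yield as the fraction of samples that meet the specification.

    performance: maps a list of process-parameter deviations to a value
                 (stands in for a circuit simulation at a fixed corner).
    spec_ok:     returns True when that value fulfils the specification.
    Parameters are drawn as independent zero-mean Gaussians (an assumption).
    """
    rng = random.Random(seed)   # fixed seed keeps the estimate reproducible
    passed = 0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n_params)]
        if spec_ok(performance(sample)):
            passed += 1
    return passed / n_samples
```

In practice each call to `performance` would be a circuit simulation at a worst-case temperature or corner, which is why the number of samples drives the simulation cost of such an analysis.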
The results are also reported as measurable objectives for T7.3 in the relevant WP7 reports,
as follows:
- Integration of MunEDA WiCkeD tools within NXP-D design environment (MO7.3.11)
- Integration of MunEDA WiCkeD tools within ST design environment [24][26][27]
- Integration of MunEDA WiCkeD tools with SNPS custom design environment tools
[28]
- Yield improvement by up to 90% (MO7.3.12, as part of THERMINATOR project
objective #4: demonstration of the applicability and effectiveness of the new design
solutions through manufacturing of test-chips featuring leading-edge silicon
technology, as available from some of the project partners).
The above figures show the yield optimization: on the left the initial design, which does
not fulfil the specification, and the result after using the YOP tool to optimize the yield
of all given performances against process variations. Manual sizing could not find an
appropriate solution, as the underlying design problem is quite complex and can only be
tackled manually with a huge and very costly simulation effort.
- Quantified reduction in design-time from 2 weeks to 3 hours (MO7.3.13, as part of
THERMINATOR project objective #5: demonstration of the usability and
effectiveness of the new design methodologies and tools by their application to
industry-strength design cases made available by some of the project partners)
The above picture shows the device sensitivities within the complex sizing problem of
the level shifter in 28nm STMicroelectronics technology [27]. The systematic
automatic approach has the advantage that the optimization algorithms are much more
efficient at finding the best result while taking all constraints into account, compared to
manual sizing, which has to follow a step-by-step, trial-and-error strategy that can take a
very long time.
7.3 Conclusions
The THERMINATOR project has been very fruitful for MUN, especially thanks to the excellent
collaboration with the project partners to enhance the methods and tools for use in
industrial design environments. It has been possible to carry out a comprehensive, chip-level
assessment of the impact of thermal variations on the behaviour of heterogeneous electronic
systems. For this, it was crucial to include the contribution of analog and RF blocks in the
analysis. MUN provided simulation-based methodologies to analyze and optimize analog/RF
circuits at transistor level. Based on the examination results of the project partners ST and
NXP, MUN has extended these methodologies in close cooperation with ST and NXP and also
with EDA partner SNPS. The industry partners have integrated these methodologies into their
design environments and applied the circuit analysis methodologies provided by MUN to carry
out sensitivity analysis of their automotive and identification system designs, especially of
the analog, power and RF components of their ICs.
MUN will exploit the achieved results as described in the THERMINATOR dissemination
and exploitation plan.
8 Conclusions
T7.3 is the final task of WP7, in which the EDA tools developed in WP3, 4, and 6 are
evaluated. These design tools and flows have been developed for different application fields,
from digital to RF. Each of these application domains has of course specific requirements for
their tools, because the size and the functionality of the circuits are different. The evaluation
of the EDA prototype tools has concentrated on two aspects. On one hand, the accuracy, ease
of use in terms of speed, and integration within existing design flows has been addressed. In
particular, this type of evaluation has been done for design flows based on new concepts
and/or tools for complex systems. Examples include the logi-thermal simulation tools of
BME, tools for identification applications of NXP-D, and the thermal models from IMEC for
complex three-dimensionally integrated chips. On the other hand, the effectiveness of the
developed tools has been demonstrated via improvement of the designs. Examples include
control of thermally induced delays and reduction in clock skew by POLITO, smaller threshold
voltage shifts by thermal- and aging-aware optimization by OFFIS, and yield improvement
and design time reduction by MUN. The demonstration has been focused on examples
provided by ST and NXP-D, i.e. the major industrial partners in T7.3. This way of working
has resulted in direct benefits for the European semiconductor industry with respect to the
competition. The advantages for EDA vendors are that they have been able to test their tools
on industrial-strength design cases, and that they have a better understanding of the challenges
that the semiconductor industry is facing. Finally, the results of T7.3 have been published at
many conferences, demonstrating the novelty of the work described in this deliverable.
9 Measurable objectives
The Therminator project objectives are:
1. New modelling and simulation capabilities to support accurate circuit thermal analysis
and simulation
2. Innovative thermal-aware design techniques, methodologies and prototype tools for
controlling, compensating and managing thermally-induced effects on parameters
such as timing, (dynamic and leakage) power, reliability and yield
3. Demonstration of the accuracy and ease of integration within existing design flows of
the new models by validation against measured data obtained on ad-hoc silicon
structures
4. Demonstration of the applicability and effectiveness of the new design solutions
through manufacturing of test-chips featuring leading-edge silicon technology, as
available from some of the project partners
5. Demonstration of the usability and effectiveness of the new design methodologies and
tools by their application to industry-strength design cases made available by some of
the project partners
In order to quantify the output of the evaluation work presented in this deliverable,
measurable objectives have been defined. In the table below, all the measurable objectives of
T7.3 are summarized. In this table, it is also shown to which test case of WP1, and to which
Therminator’s project objective the measurable objectives are related. Since T7.3 focuses on
evaluation of EDA-Tools and test-chips, the majority of the measurable objectives are related
to Therminator’s project objectives 3-5.
Area: Evaluation of thermal-aware design prototype tools
| Measurable objective | Innovation | Metric / Quantification | Test case | Project objective |
|---|---|---|---|---|
| MO7.3.1 | IMEC: Evaluate one integrated tool allowing early system floor planning and exploration of many system and physical options and their impact on thermal behaviour | Accuracy versus measurement data, error less than 5°C [IMEC] | 2 layer DRAM-on-Logic chip stack inside FC-BGA package, Test-case 6 | 4 |
| MO7.3.2 | Thermal-aware design prototype tools ease of use | Set-up time: 2 days for very complex real-world examples; few hours for typical high-level designs [IMEC] | 2 layer DRAM-on-Logic chip stack inside FC-BGA package, Test-case 6 | 3 |
| MO7.3.3 | Speed of the complete design flow for thermal-aware design prototype tools | Flow speed: couple of hours for layout, few minutes for thermal [IMEC] | 2 layer DRAM-on-Logic chip stack inside FC-BGA package, Test-case 6 | 4 |
| MO7.3.4 | Thermal-aware design prototype tools accuracy | Accuracy of prototype tools 15% compared to final layout generation | 2 layer DRAM-on-Logic chip stack inside FC-BGA package, Test-case 6 | 3 |
| MO7.3.5 | Development of a temperature-insensitive multi-Vth synthesis methodology | Circuit's temperature insensitivity [POLITO] | Test-case 4 | 2 |
| MO7.3.6 | Multi-Vth assignment algorithms for temperature insensitive synthesis flow | Comparison with circuits obtained by traditional synthesis methodologies; temperature induced delay variation kept below the variation reported in the simulation results in the range 25-125°C (less than 5%) [POLITO] | Test-case 4 | 3 |
| MO7.3.7 | Clock skew minimization thanks to mechanisms compensating for thermally-induced delays | Clock skew comparison prior and post optimization; 15% of clock skew reduction in the range 25-125°C [POLITO] | Test-case 4 | 2 |
| MO7.3.8 | Support aging simulation with Synopsys tools | Modifications of aging model required for aging simulation with Synopsys tools (yes/no) [SNPS-AM] | Test-case 1, 2 | 3 |
| MO7.3.9 | Evaluation of a high-level thermal and aging aware estimation and optimization flow | Comparison of the thermal estimations against the estimations done by well known low-level tools (yes/no); quantification of the improvements reached through the thermal and aging aware optimizations (yes/no) [OFFIS] | Test-cases 4, 5 | 3 |
| MO7.3.10 | Logithermal simulations on a sample circuit of ST using two different simulation methods | Maximal temperature values, maximum temperature locations, and timing shifts of both simulation methods to agree within 5% [BME] | Sample circuit made in collaboration with POLITO with ST STD CELLS library* | 1 |
| MO7.3.11 | Create analog test chip and evaluate measurement results [NXP-D] | Compare measurement data from analog test chip with simulation results (yes/no) [MUN] | Test Chip NXP | 5 |
| MO7.3.12 | Demonstration of the applicability and effectiveness of the new design solutions through manufacturing of test-chips featuring leading-edge silicon technology, as available from some of the project partners | Integration of MunEDA WiCkeD tools within NXP design environment; yield improvement by 90% | POR power on reset (40nm CMOS Technology) [NXP, MunEDA] | 4 |
| MO7.3.13 | Demonstration of the usability and effectiveness of the new design methodologies and tools by their application to industry-strength design cases made available by some of the project partners | Integration of MunEDA WiCkeD tools within ST design environment; quantified reduction in design-time from 2 weeks to 3 hours | 2.133GHz Level Shifter (28nm CMOS Technology) [STM, MunEDA] | 5 |
| MO7.3.14 | Create test chip with temperature sensors to evaluate 2D SOCS, and evaluate measurement results [LETI] | Provide floor plan, and data needed for simulations of test chip in March 2012 [LETI]; compare measurements data against simulations (yes/no) [OFFIS] | Test chip LETI, Test-case 5 | 3 |
*This test case was used as a common circuit in the collaboration between POLITO and BME. It
was originally also planned that BME would evaluate the simulator on test case 4 provided by
ST. All the necessary actions were taken: an NDA and the setup of a dedicated environment at
ST Catania to host BME researchers. Unfortunately some procedures took longer than expected,
and the earliest opportunity to have BME researchers at ST premises would have been January
2013, too late for the BME organisation to request reimbursement of the trip.
The substitute for test case 4 is still of industrial strength: because of its complexity it
cannot be classified as an academic test case, and it has the added value that it has
fostered the collaboration between BME and POLITO. POLITO has also validated its tools on
test case 4, and the results are coherent with those obtained on the sample test case; this
makes us confident in the BME logi-thermal simulator as well.
10 Publications and presentations
A. Burenkov, J. Lorenz, “Self-heating effects in nano-scaled MOSFETs and thermal aware
compact models”, THERMINIC, 17th International Workshop on Thermal investigations of
ICs and Systems, Paris, 27-29 Sept. 2011, EDA Publishing, pp. 17-18.
FHG, “Tutorial on TCAD Simulations of Nano-CMOS Including Self-Heating”, held at
IMC in September 2011 and at IMEC in March 2012
F. Beneventi, A. Bartolini, L. Benini, “Static Thermal Model Learning for High-Performance
Multicore Servers”, Computer Communications and Networks (ICCCN), 2011 Proceedings of
20th International Conference on, Issue Date: July 31 2011-Aug. 4 2011, On page(s): 1 – 6,
Location: Lahaina, HI, USA, ISSN: 1095-2055 Print ISBN: 978-1-4577-0637-0, 2011 IEEE
A. Bartolini, M. Cacciari, A. Tilli, L. Benini, “A distributed and self-calibrating model-
predictive controller for energy and thermal management of high-performance multicores”,
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, Grenoble,
France 14-18 March 2011, On page(s): 1 – 6, ISSN : 1530-1591, Print ISBN: 978-1-61284-
208-0. IEEE Press 2011
A. Bartolini, M. Sadri, F. Beneventi, M. Cacciari, A. Tilli, L. Benini, “SCC Thermal Sensor
Characterization and Calibration”, 3rd Many-core Applications Research Community
(MARC) Symposium, Ettlingen, Germany, Issue Date : 5-6 June 2011, On page(s): 7-12, KIT
Scientific Publishing 2011, ISBN 978-3-86644-717-2
A. Bartolini, M. Sadri, F. Beneventi, M. Cacciari, A. Tilli, L. Benini, “A System Level
Approach to Multi-core Thermal Sensors Calibration”, Integrated Circuit and System Design.
Power and Timing Modeling, Optimization, and Simulation, Editor: Ayala J., García-Cámara
B., Prieto M., Ruggiero M., Sicard G., Book Series Title: Lecture Notes in Computer Science,
Page(s): 22- 31, Volume: 6951, Copyright: 2011, Publisher: Springer Berlin / Heidelberg,
ISBN: 978-3-642-24153-6
M. Sadri, A. Bartolini, L. Benini, “Single-Chip Cloud Computer thermal model”, Thermal
Investigations of ICs and Systems (THERMINIC), 2011 17th International Workshop on,
Paris, France 27-29 Sept. 2011, On page(s): 1 – 6, Print ISBN: 978-1-4577-0778-0, IEEE
Press 2011
A. Sassone, A. Calimera, A. Macii, E. Macii, M. Poncino, R. Goldman, V. Melikyan, E.
Babayan, S. Rinaudo, “Investigating the Effects of Inverted Temperature Dependence (ITD)
on Clock Distribution Networks”, Proceedings of Design, Automation & Test in Europe
(DATE’12) conference, Dresden, Germany, 2012.-P.165-167
Wei Liu, V. Tenace, A. Calimera, A. Macii, E. Macii, M. Poncino, “NBTI Effects on Tree-
Like Clock Distribution Networks”, GLSVLSI-12: accepted for publication
A. Sassone, W. Liu, A. Calimera, A. Macii, E. Macii, M. Poncino, “Modeling of thermally
induced skew variations in clock distribution network”, THERMINIC-11: IEEE Thermal
Investigations of ICs and Systems, 2011.
L. M. de Lima Silva, A. Calimera, A. Macii, E. Macii, M. Poncino, "Power Efficient
Variability Compensation Through Clustered Tunable Power-Gating", IEEE Journal on
Emerging and Selected Topics in Circuits and Systems, vol.1, no.3, Sept. 2011
M. Caldera, A. Calimera, A. Macii, E. Macii, M. Poncino, “Minimizing temperature
sensitivity of dual-Vt CMOS circuits using Simulated-Annealing on ISING-like models”,
THERMINIC-10: IEEE Thermal Investigations of ICs and Systems, 2010.
A. Calimera, A. Macii, E. Macii, S. Rinaudo, M. Poncino, “THERMINATOR: Modeling,
control and management of thermal effects in electronic circuits of the future”, THERMINIC-
10: IEEE Thermal Investigations of ICs and Systems, 2010.
A. Calimera, R. Bahar, E. Macii, M. Poncino, “Temperature-Insensitive Dual-Vth Synthesis
for Nanometer CMOS Technologies Under Inverse Temperature Dependence”, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 11, 2010
A. Timár, M. Rencz, “Studying the influence of chip temperatures on timing integrity”, 12th
IEEE Latin-American Test Workshop. Brazil, 27-30. 03. 2011.
A. Timár, M. Rencz, “Studying the Influence of Chip Temperatures on Timing Integrity Using
Improved Power Modeling”, JOURNAL OF LOW POWER ELECTRONICS 7: pp. 1-10.
(2011)
A. Timár, Gy. Bognár, M. Rencz, “Improved power modeling in logi-thermal simulation”,
17th International Workshop on Thermal investigations of ICs and Systems. Paris,
France, 27-29. 09. 2011.
Gergely Nagy, András Poppe, “A Novel Simulation Environment Enabling Multilevel Power
Estimation of Digital Systems”, Proceedings of the 17th International Workshop on THERMal
INvestigation of ICs and Systems
G. Gangemi, “FP7-Funding Projects THERMINATOR, SMAC, MANON Overview”, MUGM
MunEDA User Group Meeting 2012, October 2012, Munich, Germany
Z. Abbas, M. Olivieri, A. Ripp, G. Strube, M. Yakupov, “Yield optimization for low power
current controlled current conveyor”, SBCCI 2012, September 2012, Brasília, Brazil
Colaci, G. Boarin, A. Roggero, L. Civardi, C. Roma, A. Ripp, M. Pronath, G. Strube:
“Systematic Analysis & Optimization of Analog/Mixed-Signal Circuits Balancing Accuracy
and Design Time”, SBCCI 2011 Brazil, September 2011, Sao Paulo, Brazil
N. Seller, “Optimization of a 2.133GHz level shifter in 28nm”, MUGM MunEDA User Group
Meeting 2011, Munich, Germany
U. Trautner, M. Pronath, “Synopsys Custom and Analog Mixed-Signal Overview & MunEDA
WiCkeD Integration”, MUGM MunEDA User Group Meeting 2010, Munich, Germany
S. Caporale, R. Rovatti, G. Setti, "Representation of PWM signals through time warping",
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on,
pp. 3589-3592, 25-30 March 2012.
Reef Eilers, Malte Metzdorf, Sven Rosinger, Domenik Helms, Wolfgang Nebel, “Phase space
based NBTI model”, Proc. of International Workshop on Power and Timing Modeling,
Optimization and Simulation (PATMOS), 2012
Sven Rosinger, Malte Metzdorf, Domenik Helms, Wolfgang Nebel, “Behavioral-Level
Thermal- and Aging-Estimation Flow”, Proc. of 12th Latin-American Test Workshop (LATW),
p. 1-6, 2011
V. Melikyan, A. Gevorgyan, A. Baghdasaryan, H. Melikyan, “Thermal Via’s Placement
Zones Identifying Using Voronoi Diagrams”, Proceedings of the 32nd International Scientific
Conference Electronics and Nanotechnology (ELNANO 2012), Kiev, Ukraine, 2012.-P.77-79
V. Melikyan, Babayan E., Harutyunyan A., Melikyan N., Zargaryan G., “Method of
Reducing Thermal Dependence of Timing Delays of Digital Integrated Circuits”, Proceedings
of 5th All-Russian scientific-technical conference “Problems of Developing Advanced Micro-
and Nanoelectronic Systems -2012” (MES-2012), Moscow, Russia, 2012.-P.409-412
V.Sh. Melikyan, A.A. Durgaryan, A.H. Balabanyan, E.H. Babayan, M. Stanojlovic, A.G.
Harutyunyan, “Process-voltage-temperature Variation Detection and Cancellation Using On-
Chip Phase-Locked Loop”, Proceedings of the 56th Electronics, Telecommunications,
Computers, Automatic Control and Nuclear Engineering (ETRAN) Conference, Zlatibor,
Serbia, 2012.-P.EL1.2-1-4
P. Bibilo, A. Solovev, V. Melikyan, A. Harutyunyan, E. Babayan, “Estimation of Power
Consumption of Digital CMOS Circuits Based on Logic Simulation of their Structural
Descriptions”, Proceedings of Engineering Academy of Armenia, Yerevan, Armenia, 2012. –
P.600-610
V. Melikyan, E. Babayan, A. Harutyunyan, “Pattern-Based Approach to Current Density
Verification”, Proceedings of the 4th Small Systems Simulation Symposium 2012, Nis,
Serbia, 2012.-P.58-61
V. Melikyan, A. Balabanyan, E. Babayan, A. Durgaryan, “Decreasing of Frequency
Variation in High-Speed Ring Oscillator using Bandgap Reference”, Proceedings of the 32nd
International Scientific Conference Electronics and Nanotechnology (ELNANO 2012), Kiev,
Ukraine, 2012.-P.79-81
R. Goldman, K. Bartleson, T. Wood, V. Melikyan, E. Babayan, “Synopsys’ Low Power
Design Educational Platform”, Proceedings of the 9th European Workshop on
Microelectronics Education (EWME 2012), Grenoble, France, 2012.-P.23-26
V. Melikyan, E. Babayan, A. Harutyunyan, “Pattern-Based Approach to Current Density
Verification”, Electronics, Faculty of Electrical Engineering, University of Banja Luka,
Volume 16, Number 1, Serbia, 2012.-P.77-82
V. Melikyan, A. Harutyunyan, “Modeling of IC Interconnects and Power Rails”, Chartarapet,
Yerevan, 2012 (in Armenian)
V. Melikyan, A. Durgaryan, A. Khachatryan, H. Manukyan, E. Musayelyan, “Self-
compensating Low Noise Low Power PLL Design”, Proceedings of IEEE East-West Design &
Test Symposium (EWDTS’12), Kharkov, Ukraine, 2012.-P.29-33
V.Sh. Melikyan, S.V. Gavrilov, V.K. Aharonyan, N.K. Aslanyan, A.S. Hovhannisyan, “On-
die CMOS Termination Resistor for USB Transmitter”, RAs National Academy of Science
and SEUA, Yerevan, RA, Vol. 65, N 3, Yerevan, 2012.-P. 295-304
P. Magnone, C. Fiegna, G. Greco, G. Bazzano, E. Sangiorgi, S. Rinaudo, “Modeling of
Thermal Network in Silicon Power MOSFETs”, Ultimate Integration on Silicon (ULIS), 14-16
March 2011, Cork, Ireland.
P. Magnone, C. Fiegna, G. Greco, G. Bazzano, S. Rinaudo, E. Sangiorgi, “Numerical
Simulation and Modeling of Thermal Transient in Silicon Power Devices”, Ultimate
Integration on Silicon (ULIS), pp. 153-156, 6-7 March 2012, Grenoble (France).
P. Magnone, C. Fiegna, G. Greco, G. Bazzano, S. Rinaudo, E. Sangiorgi, “Numerical
Simulation and Modeling of Thermal Transient in Silicon Power Devices”, ELSEVIER Solid-
State Electronics, in press.
H. Oprins, V. Cherman, B. Vandevelde, M. Stucchi, G. Van der Plas, P. Marchal, and E.
Beyne, “Steady state and transient thermal analysis of hot spots in 3D stacked ICs using
dedicated test chips”, 27th Annual IEEE Thermal Measurement, Modeling and Management
Symposium (SEMI-Therm), March 20-24, 2011, 131-137.
H. Oprins, V. Cherman, B. Vandevelde, C. Torregiani, M. Stucchi, G. Van der Plas, P.
Marchal, and E. Beyne, “Characterization of the Thermal Impact of Cu-Cu bonds achieved
using TSVs on hot spot dissipation in 3D stacked ICs”, Proceedings of ECTC, May 30- June 1,
2011, 861-868.
H. Oprins, V. Cherman, “Numerical and experimental characterization of hot spot dissipation
in 3D stacks”, Electronics Cooling Magazine, Vol. 18(2), 2012, pp. 18-23.
D. Milosevic, H. Oprins, J. Ryckaert, P. Marchal, G. Van der Plas, “DRAM-on-logic Stack –
Calibrated Thermal and Mechanical Models Integrated into A Design Flow”, IEEE Custom
Integrated Circuits Conference (CICC), September 18-21 2011, San Jose, California, invited.
H. Oprins, V. Cherman, B. Vandevelde, G. Van der Plas, P. Marchal, and E. Beyne,
“Numerical and experimental characterization of the thermal behavior of a packaged DRAM-
on-logic stack”, 62nd Electronic Components and Technology Conference - ECTC, 2012,
pp. 1081-1088.
Gergely Nagy, László Pohl, András Timár, András Poppe, “Yield enhancement by logi-
thermal simulation based testing”, Proceedings of the 18th International Workshop on
THERMal INvestigation of ICs and Systems (THERMINIC'12). Budapest, Hungary,
2012.09.25-2012.09.27. pp. 196-199. Paper 42.
Gergely Nagy, András Timár, Albin Szalai, Márta Rencz, András Poppe, “New simulation
approaches supporting temperature-aware design of digital ICs”, Proceedings of the 28th
IEEE Semiconductor Thermal Measurement and Management Symposium (SEMI-THERM'12).
San Jose, USA, 2012.03.18-2012.03.22. pp. 313-318. (ISBN: 978-1-4673-1109-0)
A. Timár, M. Rencz, “Temperature dependent timing in standard cell designs”, Proceedings of
the 18th International Workshop on THERMal INvestigation of ICs and Systems
(THERMINIC'12). Budapest, Hungary, 2012.09.25-2012.09.27. pp. 179-183.
A. Timár, M. Rencz, “Real-time heating and power characterization of cells in standard cell
designs”, MICROELECTRONICS JOURNAL (2012), IF: [0.919*]
A. Timár, M. Rencz, “Acquiring real-time heating of cells in standard cell designs”,
Proceedings of the 13th IEEE Latin-American Test Workshop (LATW'12). Quito, Ecuador,
2012.04.10-2012.04.13. pp. 121-125.
Gergely Nagy, András Poppe, “Simulation Framework for Multilevel Power Estimation and
Timing Analysis of Digital Systems Allowing the Consideration of Thermal Effects”,
Proceedings of the 13th IEEE Latin-American Test Workshop (LATW'12). Quito, Ecuador,
2012.04.10-2012.04.13. pp. 1-5.