Training 211


  • 8/10/2019 Training 211


    Institut für Integrierte Systeme / Integrated Systems Laboratory

    Department of Information Technology and Electrical Engineering

    VLSI II: Entwurf von hochintegrierten Schaltungen

    227-0147-00

    Training 2

    Energy Efficiency and Power Distribution

    Prof. Dr. H. Kaeslin
    Dr. N. Felber

    SVN Rev.: 1025
    Last Changed: 2013-11-05

    Reminder:

    With the execution of this training you declare that you understand and accept the regulations about using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at

    http://dz.ee.ethz.ch/regulations/index.en.html .


    1 What you will learn

    In previous trainings, you have learned how to carry out a digital circuit design that meets given timing and area constraints. This exercise will extend your knowledge to power considerations. More specifically, we will show you:

    - How to determine node activity figures of adequate accuracy.
    - How to estimate a circuit's power dissipation from node activities.
    - How to locate excessive voltage losses in power and ground networks.
    - How to detect excessive current densities in power and ground networks.
    - How to improve power and ground distribution networks where necessary.
    - A few ideas for improving a circuit's overall energy efficiency (optional).

    You will be assisted by Mentor Graphics ModelSim (for circuit simulation) and by Cadence SoC Encounter (for place & route, preparation of power and ground nets, IR drop analysis, and current density estimations).

    2 Introduction

    2.1 Theoretical background

    As explained in section 9.1 of our textbook (footnote 1), four phenomena dissipate energy in static CMOS circuits:

    Phenomenon                                      Results in dissipation                         Nature
    Charging and discharging of capacitive loads    while node voltages are in transit             dynamic
    Crossover currents                              while node voltages are in transit             dynamic
    Driving of resistive loads (if any)             at all times, even after circuit has settled   static
    Leakage currents                                at all times, even after circuit has settled   static

    We will not be concerned with static power in this exercise as we limit ourselves to pure CMOS circuits with no resistive loads and because leakage is almost negligible due to the conservative fabrication process being studied. For the needs of EDA tools, the dynamic dissipation can be attributed to library cells as follows.

    Internal power $P_{int}$ is the power dissipated inside a cell for the charging and discharging of internal capacitances and due to crossover currents.

    Switching power $P_{ext}$ is the power dissipated inside a cell for charging and discharging the load capacitance connected to the cell's output. That external load consists of the input capacitances of all cells being driven plus the parasitic capacitances of the wires (aka interconnect).

    The total power dissipation $P_{tot}$ related to a cell can now be expressed as

    $$P_{tot} = P_{stat} + P_{dyn} \qquad P_{dyn} = P_{int} + P_{ext} \tag{1}$$

    Calculating $P_{ext}$ is straightforward:

    $$P_{ext} = f_{cp} \, \frac{\alpha}{2} \, C_{ext} \, U_{dd}^2 \tag{2}$$

    1 Hubert Kaeslin, Digital Integrated Circuit Design, from VLSI Architectures to CMOS Fabrication, Cambridge University Press, 2008.


    where $\alpha$ denotes the switching activity of the cell's output node, $C_{ext}$ the load capacitance attached, and $U_{dd}$ the supply voltage. $f_{cp}$ stands for the computation rate, i.e. the inverse of the computation period (footnote 2). $P_{int}$ gets calculated in much the same way, yet coming up with accurate activity and capacitance figures requires detailed information about the inner circuitry and layout of each cell.

    A power estimator essentially is a piece of software that sums up the various contributions over an entire circuit. Provided the same clock and voltage get used everywhere, this amounts to

    $$P_{ckt} = \sum_{m=1}^{M} P_{int\,m} + \sum_{n=1}^{N} P_{ext\,n} \approx f_{cp} \, U_{dd}^2 \left( \sum_{m=1}^{M} \frac{\alpha_m}{2} C_{int\,m} + \sum_{n=1}^{N} \frac{\alpha_n}{2} C_{ext\,n} \right) \tag{3}$$

    Index $m = 1 \ldots M$ refers to the cells instantiated in the circuit and $n = 1 \ldots N$ to the nets of interconnect running in between. For each cell, an internal activity figure $\alpha_m$ is estimated from the node activities at the input(s). Note that $C_{int\,m}$ is not meant to correspond to any capacitance physically present in the circuit. Rather, it is just a numerical parameter adjusted for each cell during library characterization such as to model its internal dissipation (footnote 3).

    Equation (3) tells us a few important things about power dissipation and power estimation:

    - Realistic switching activity figures are crucial; they can be obtained from gate-level simulations.
    - Realistic capacitance figures are important; they are best extracted from layout data.
    - Dynamic power grows with $U_{dd}$ squared. The power vs. speed dilemma is discussed in the textbook.
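    The summation in equation (3) can be sketched in a few lines of Python. This is only an illustration of the formula, not of how any particular power estimator is implemented; the function name `circuit_power` and the toy cell and net figures are made up for this example and are not taken from the exercise tables.

```python
def circuit_power(f_cp_hz, u_dd_volt, cells, nets):
    """Evaluate equation (3) in watts.

    cells: list of (alpha_m, C_int_m in farad), one entry per cell instance
    nets:  list of (alpha_n, C_ext_n in farad), one entry per net
    """
    weighted_cap = sum(a / 2.0 * c for a, c in cells)
    weighted_cap += sum(a / 2.0 * c for a, c in nets)
    return f_cp_hz * u_dd_volt ** 2 * weighted_cap


# Hypothetical toy circuit: two cells and three nets (illustrative values only).
cells = [(0.1, 20e-15), (0.3, 40e-15)]
nets = [(0.1, 50e-15), (0.3, 10e-15), (0.5, 30e-15)]

p_1v0 = circuit_power(100e6, 1.0, cells, nets)
p_2v0 = circuit_power(100e6, 2.0, cells, nets)
print(f"P_ckt at 1.0 V: {p_1v0 * 1e6:.2f} uW")
print(f"Ratio when doubling U_dd: {p_2v0 / p_1v0:.1f}")
```

    Doubling the supply voltage in the sketch quadruples the result, which is the quadratic dependence on $U_{dd}$ noted above.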

    2.2 Manual activity and power calculations for warm up

    To get a feeling for the process, let us estimate the power consumption of the toy example of Figure 1, a simple arithmetic processing unit that accepts two unsigned numbers of 4 bits each (InputAxDI and InputBxDI) and that delivers either their sum or their product at the output (OutputxDO) as an 8 bit word.

    Figure 1: A small arithmetic unit used for hand calculations.

    A signal AddxSI decides which operation result gets assigned to the output according to the following rule (in pseudo-VHDL):

    if AddxSI = 1 then OutputxDO

    2 For standard single-edge-triggered one-phase clocking, computation period and clock cycle are the same, $f_{cp} = f_{clk}$. Double-edge-triggered circuits, in contrast, offer two computation periods per clock cycle so that $f_{cp} = 2 f_{clk}$.

    3 Incidentally observe that any attempt to capture the internal dissipation of a cell with a single quantity is not exactly accurate, as the energy dissipated when one input toggles may also depend on what is happening at other inputs at the same time. And in the case of a bistable, the current state is likely to matter too. While industrial standard cell models typically cover all possible situations, we shall not be concerned with such details here.

    Student Task 1:

    1. Output waveform: Collecting all 8 bits into one signature, draw the waveform and numeric values of OutputxDO in Figure 2.

    2. Switching activities: Assuming single-edge-triggered one-phase clocking, complete the node activity column in Table 1.

    3. Power spent for switching of nets: You now have all the facts required to calculate the switching powers associated with the various nets according to (2). Fill in the numbers in the last column.

    4. Power dissipated within circuit blocks: Now consider Table 2. What is the main sink of power among the blocks listed there and how much does it dissipate?

    5. Consolidated dissipation: Compiling all contributions from Table 1 and Table 2, how much power does the circuit dissipate internally, that is, with no load attached?

    6. Overall dissipation: Suppose each output drives a load of 1 pF. What is the total power consumption now?
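    Counting node activities by hand, as in step 2 above, amounts to counting how often each bit toggles per computation period. The following Python sketch shows that bookkeeping under a simplifying assumption: the bus is sampled once per computation period, so glitches within a period are ignored. The helper name `bit_activities` and the 4-bit waveform are hypothetical and are not the data of Figure 2.

```python
def bit_activities(samples, width):
    """Return the activity of each bit (index 0 = LSB).

    samples holds the initial bus value followed by one value per
    computation period, so len(samples) - 1 periods are observed.
    Activity = number of toggles of a bit / number of periods.
    """
    periods = len(samples) - 1
    toggles = [0] * width
    for prev, cur in zip(samples, samples[1:]):
        changed = prev ^ cur            # bits that differ between periods
        for bit in range(width):
            toggles[bit] += (changed >> bit) & 1
    return [t / periods for t in toggles]


# Hypothetical 4-bit waveform: initial value plus four computation periods.
wave = [0b0000, 0b0101, 0b0100, 0b1100, 0b0000]
acts = bit_activities(wave, 4)
print(acts)  # [0.5, 0.0, 0.5, 0.5]
```

    Each per-bit activity can then be used as $\alpha$ in equation (2) together with the capacitance of the corresponding net.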

    3 The test vehicle used for computerized calculations

    3.1 Architectural overview

    Figure 3 illustrates the circuit serving as a test case for this exercise. The circuit is entirely digital and dominated by two finite impulse response (FIR) filters of identical structure that differ in their coefficients. Each filter is fully parallel. At the output, an adder combines the high-pass and low-pass responses. A 2-bit selection input signal can be used to output only the high-pass component or only the low-pass component. Additional flags (ModexSI / TestModexTI) enable/disable the filters completely.

    Figure 3: High-level diagram of test vehicle used in this exercise (simplified). Signals shown: DATAINxDI (16 bit), DATAOUTxDO (16 bit), OUTSELECTxSI, MODExSI, TESTMODExTI.

    In the exercise, node activity figures will be determined by way of gate-level simulations. For comparison, let us now make a quick back-of-the-envelope calculation from data available without detailed simulation. The test vehicle is believed to have the characteristics below.


    Clocking discipline                             single-edge-triggered one-phase
    Clock frequency $f_{clk}$ [MHz]                 50
    Supply voltage $U_{dd}$ [V]                     1.8
    Number of interconnect nets $N$                 5 500
    Avg. load capacitance $C_{ext\,n}$ [fF]         30.0
    Avg. switching activity $\alpha_n$ [1]          0.2
    Number of cell instances $M$                    3 900
    Avg. equiv. capacitance $C_{int\,m}$ [fF]       25.0
    Avg. internal activity $\alpha_m$ [1]           same as $\alpha_n$

    Student Task 2: Plug these numbers into (3) and put down the result here: ....

    3.2 Install test vehicle and start cockpit

    We provide you with a finished test vehicle with final routing completed. To install it, do the following:

    Student Task 3:

    1. Open a Unix shell window.

    2. Install the test vehicle:

    sh> /home/vlsi2/t2/install_t2_partA

    3. Start the cockpit:

    sh> cd training2_partA
    sh> icdesign umcL180 &

    The design views now available include

    1. Source code (available at sourcecode/..)
    2. Cadence SoC Encounter database
    3. Final netlist
    4. .sdf file for back annotation

    3.3 Generating stimuli

    For running meaningful power simulations we will need the right input stimuli. We provide a set of stimuli in the simvectors directory (input.stim). During this training, you will need to modify the stimuli files to estimate power in different operating modes. As seen in Figure 3, the signal OutSelectxSI is used to control which filter block is added to the output. Furthermore, there are a ModexSI and a TestModexTI signal that control how the internal registers are enabled. These signals can be used to configure the test vehicle in a variety of modes. To change the operating mode, you need to adapt the number in the first line of the stimuli file simvectors/input.stim, since it encodes the operating mode of the design as an integer value. See the following table for the operating modes we will use in this exercise.

    The subsequent integer values in the stimuli file correspond to the input data. Next let us give some technical comments on the process of automated power estimation.


                   TestModexTI  ModexSI  OutSelectxSI(1)  OutSelectxSI(0)  int value
    Enable all:    1            1        1                1                15
    Disable HP:    1            1        1                0                14
    Clock Gate:    0            0        1                0                2
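    The integer values in the table are consistent with packing the four control bits into one number, with TestModexTI as the most significant bit. The Python sketch below reproduces the three values and shows one way to swap the first line of a stimuli file. The function names are hypothetical, and the assumption that the file holds one integer per line is ours; check the actual input.stim before relying on it.

```python
def mode_value(test_mode, mode, out_select):
    """Pack TestModexTI, ModexSI and the 2-bit OutSelectxSI into one integer."""
    return (test_mode << 3) | (mode << 2) | (out_select & 0b11)


def set_mode(stim_text, value):
    """Return the stimuli text with its first line replaced by `value`.

    Assumes (hypothetically) one integer per line, mode first, data after.
    """
    lines = stim_text.splitlines()
    lines[0] = str(value)
    return "\n".join(lines) + "\n"


MODES = {
    "Enable all": mode_value(1, 1, 0b11),
    "Disable HP": mode_value(1, 1, 0b10),
    "Clock Gate": mode_value(0, 0, 0b10),
}
print(MODES)  # {'Enable all': 15, 'Disable HP': 14, 'Clock Gate': 2}

# Made-up stimuli content, only to demonstrate the first-line replacement.
example = "15\n3\n7\n"
print(set_mode(example, MODES["Clock Gate"]), end="")
```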

    4 Power Estimation Flow

    We are going to use the same CAD/CAE tools you are familiar with from previous exercises and/or from your semester project in VLSI design. During earlier design phases, ModelSim had served to functionally verify RTL source code. The focus now shifts to collecting the respective toggle counts of electrical nodes present in a circuit netlist as a prerequisite for power calculations.

    In search of accuracy, we are going to do a post-layout simulation that includes the various layout parasitics that came into existence once placement and routing were completed. For this purpose, the netlist previously written out by Cadence SoC Encounter in Verilog format is compiled using vlog instead of vcom (have a look at the file modelsim/compile_gate.csh to see how the Verilog netlist is compiled). Since ModelSim is able to perform mixed-language simulations, we can use a VHDL testbench (almost the same as the testbench for RTL simulation, with only minor adaptations) to carry out this particular post-layout simulation.

    The next point that merits your attention is the selection of the stimuli. As power dissipation is data-dependent, it is important to make a proper choice of the stimuli vectors to get meaningful results. The node activities used for power estimation must be statistically representative of the target application, which implies that the stimuli will not necessarily be the same as those employed during functional verification.

    What follows is a brief overview of the file types involved in annotating a netlist.

    SDF back annotation: The SDF (Standard Delay Format) file contains the information about the interconnect and cell delays in a design. It can be exported from Cadence SoC Encounter to transmit these delay data to a simulator (and/or to a static timing analyzer). This file is required for any type of post-layout simulation, irrespective of whether you are interested in calculating power consumption or in gate-level functional verification.

    VCD back annotation: The VCD (Value Change Dump) file logs all signal changes (i.e. the events in VHDL terminology) that occur during a simulation run. The information is essentially the same as in the ModelSim wave window but in textual form. File size thus grows not only with design complexity but also with the length of a simulation run. A VCD is required for power analysis with Cadence SoC Encounter. For obvious reasons, it is always possible to extract the average activity for each circuit node from a VCD file but not the other way round.
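    To make the last point concrete, here is a minimal Python sketch that counts value changes per signal in a VCD text. It is a simplified reader written for this illustration, not the parser used by any tool: it assumes one declaration or value change per line, skips the initial values inside `$dumpvars`, and counts a change of a vector as one event rather than counting individual bits. The sample VCD and all names in it are made up.

```python
from collections import Counter


def count_value_changes(vcd_text, start_time=0):
    """Count value changes per signal at or after start_time (VCD time units)."""
    names, scope, changes = {}, [], Counter()
    now, in_header, in_dump = 0, True, False
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if in_header:
            if tok[0] == "$scope":
                scope.append(tok[2])
            elif tok[0] == "$upscope":
                scope.pop()
            elif tok[0] == "$var":
                # $var <type> <width> <id code> <name> ... $end
                names[tok[3]] = "/".join(scope + [tok[4]])
            elif tok[0] == "$enddefinitions":
                in_header = False
        elif tok[0].startswith("#"):
            now = int(tok[0][1:])          # timestamp line
        elif tok[0] == "$dumpvars":
            in_dump = True                 # initial values, not toggles
        elif tok[0] == "$end":
            in_dump = False
        elif tok[0].startswith("$") or in_dump or now < start_time:
            continue
        else:
            # vector/real change "b0101 %" or scalar change "1!"
            ident = tok[1] if tok[0][0] in "bBrR" else tok[0][1:]
            changes[names.get(ident, ident)] += 1
    return changes


SAMPLE_VCD = '''
$timescale 1ns $end
$scope module tb $end
$scope module DUT $end
$var wire 1 ! ClkxCI $end
$var wire 4 % DataxD [3:0] $end
$upscope $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
b0000 %
$end
#5
1!
#10
0!
b0101 %
#15
1!
#20
0!
b0100 %
'''

counts = count_value_changes(SAMPLE_VCD)
print(dict(counts))
```

    Dividing such a count by the number of computation periods in the observed window gives an average activity figure, which is the information a power estimator needs from the dump.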

    As a welcome observation, we note that no parasitics exchange file (such as SPEF or RSPF) is required to transport estimated capacitance values from the place & route tool to the power calculation tool, as both functions are assumed by Cadence SoC Encounter in the current design flow.

    Side note: Our experience suggests that, while internal dissipation is well characterized in our technology (umc L180), leakage power is often overestimated by far.


    5 SoC Encounter Power Analysis

    In this section we will perform a power analysis of our final chip using different sets of toggle activities. Cadence SoC Encounter is able to perform a power analysis based on statistical estimates of the switching activity. For more accuracy it can also process value change dump (VCD) files generated as a result of post-layout simulations. Throughout the whole power analysis exercise you will have to update the following table continuously.

    We will first start Cadence SoC Encounter and load the saved test vehicle.

    Student Task 4:

    Start Cadence SoC Encounter.

    In the Cadence SoC Encounter GUI, select the menu Design → Restore Design → SoCE... and choose chip_filter.enc from the save directory. Among the views on the top right-hand side, select the last one, the Physical View.

    Power Analysis Method          Total Power [mW]    Dominating Instances Power [mW]
    Global Activity
    Input Activity
    VCD-Based Activity
      Enable all
      Enable all (zero inputs)
      Disable HP
      Clock Gate

    5.1 Statistical Power Analysis

    As you know, dynamic power consumption directly depends on the switching activity. Cadence SoC Encounter provides some simple approaches that estimate the switching activity of the circuit without running costly simulations. These methods are useful to quickly get a first measure of the chip's power consumption.

    Global activity

    Cadence SoC Encounter can automatically assign a default toggle-activity value to all internal nodes. Throughout the power analysis, each internal node of your chip is then assumed to toggle with this probability during each clock cycle.

    Student Task 5: In order to start this analysis, select Power → Power Analysis → Run Power Analysis... (note a). In this form, select the folder reports/power as the results directory (see Figure 4). For the moment leave the clock frequency at 100 MHz. Then step into the Activity tab and write 0.2 as global activity (this means that every node will change its state with a probability of 0.2 per clock cycle). This is a good initial value. At this point, you are able to start your first statistical power analysis. Press the OK button (or Apply).

    a If the menu Run Power Analysis... is not available, first select Set Power Analysis Mode... and press OK with the default settings. Now the previous menu should be accessible.

    The power analysis will then start and write lines similar to the following to the Cadence SoC Encounter shell window:

    CPE found ground net: GND
    CPE found power net: VCC voltage: 1.8V
    INFO (POWER-1606): Found clock ClkxCI with frequency 50MHz from SDC file.
    CK: assigning clock ClkxCI to net ClkxCI
    Propagating signal activity...
    Starting Levelizing
    2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT)
    2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT): 5%
    ..

    Among the messages in the console you will find some information about the clock. Notice that the clock frequency extracted from the SDC file (50 MHz) does not match the frequency specified in the GUI. The tool will use the SDC version, so the entry in the GUI will be ignored. It is important that you always check the clock frequency on the console.

    Student Task 6: Adjust the clock frequency (dominant frequency value) in the GUI so that it matches the SDC value, and rerun the analysis.

    There will be a warning message on the console about the TIE cells not having a power model. Since the tie cells do not have any switching activity (they tie the output to either logic-1 or logic-0), this is not really a problem.

    At the end of the analysis Cadence SoC Encounter will write a summary on the console. The result will also be written to the chip_filter.rpt file in the reports/power directory. Have a look at it and try to identify the main results of the power dissipation of your chip. How much power does the chip dissipate? What are the values that contribute most to the total power?

    Student Task 7: Talk to an assistant and discuss where most of the power is being dissipated. Calculate the total power dissipated by these instances. Update the results table at the beginning of section 5. Use the additional column to enter the power dissipated by the above-mentioned instances.

    Once we run the analysis again, this report file will be overwritten. For this exercise we would like to preserve the file so that we can compare the results later on. Step into the encounter directory of this exercise and copy or move the file under a different name, for example:

    sh> cd ../encounter
    sh> mv reports/power/chip_filter.rpt \
          reports/power/chip_filter_ga.rpt


    Figure 4: Run Power Analysis menu in Cadence SoC Encounter .

    Input Activity

    Setting all internal nodes to a fixed activity is a gross oversimplification. Not all gates will switch with the same probability (e.g. a 3-input AND gate switches its output much less often than, say, a 2-input XOR gate). Instead of assigning a default switching value to every internal node of the chip, it is also possible to define only the activity of the input pins. Cadence SoC Encounter is then able to propagate this activity inside the chip.
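    The AND versus XOR remark can be checked with a small probabilistic model. The Python sketch below is not the propagation algorithm used by Cadence SoC Encounter; it is a textbook-style approximation that assumes every input is 1 with probability 0.5, independently of the other inputs and of its own previous value, and that ignores glitches. The function name is hypothetical.

```python
from itertools import product


def output_toggle_probability(gate, n_inputs, p_one=0.5):
    """Probability that the gate output differs between two consecutive cycles.

    Under the independence assumptions stated above, this equals
    2 * p * (1 - p), where p is the probability that the output is 1.
    """
    p_out_one = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        weight = 1.0
        for b in bits:
            weight *= p_one if b else 1.0 - p_one
        p_out_one += weight * gate(bits)
    return 2.0 * p_out_one * (1.0 - p_out_one)


and3 = output_toggle_probability(lambda b: int(all(b)), 3)
xor2 = output_toggle_probability(lambda b: b[0] ^ b[1], 2)
print(f"3-input AND: {and3:.5f}")  # 0.21875
print(f"2-input XOR: {xor2:.5f}")  # 0.50000
```

    In this model the XOR output toggles more than twice as often as the AND output, which illustrates why a single global activity figure is a coarse approximation.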

    Student Task 8:

    To execute this new power analysis go back into the Run Power Analysis... menu and deselect the global activity option in the Activity tab. Return to the Basic tab and put the value 0.2 in the input activity field. As before, set the frequency to 50 MHz. Leave the flop activity and the clock gate activity fields empty (note a).

    Run the analysis and check the new report. What is the total power dissipation of the chip now? Can you explain the difference from the previous value? Which of the two results is more reliable?

    Update the table you started earlier with the current results.

    As before, rename the generated report file:

    sh> mv reports/power/chip_filter.rpt \
          reports/power/chip_filter_ia.rpt

    a The former specifies the activity of outputs of sequential logic, while the latter specifies the average number of times that a clock-gating cell switches in a clock cycle.


    5.2 Stimuli-based Power Analysis

    Using a circuit simulator to determine node activity figures

    Instead of trying to estimate the switching activity (with different levels of accuracy), we can use the Mentor Graphics ModelSim simulator to run the complete simulation and determine the exact switching activity. We can tell Mentor Graphics ModelSim to write out a Value Change Dump (VCD) file from the post-layout netlist, which records for every node when it has switched and to what value.

    Student Task 9: Step into the modelsim directory of this exercise:

    sh> cd ../modelsim

    Compile the placed & routed netlist of the final design. Also compile the testbench and related files. All these compilations can be performed by executing a single shell script (note a):

    sh> ./compile_gate.csh

    Now start the simulator with a prepared run script:

    sh> ./run_gate.csh

    a A good idea is to take a look at it! You should know what you are executing.

    To view the input and output of the filter, there is a .do file that will show the relevant signals in the Wave window. On the console you could type:

    vsim> do wave.do

    Student Task 10:

    Now we are ready to generate the dump file. We will first simulate the circuit for 100 ns so that the circuit is properly initialized (we do not want to include the activity during the initialization phase). Then we have to tell ModelSim where to store the VCD file. The last thing is to specify the names of the nodes that we would like to monitor, i.e., the scope. The following three commands are used for this purpose:

    vsim> run 100ns
    vsim> vcd file ./vcd/chip_filter.vcd
    vsim> vcd add -r /chip_filter_tb/DUT/*

    At this stage we can run the gate-level simulation until the end (20,142 ns). Moreover, the VCD output needs to be flushed at the end of the simulation run to make Mentor Graphics ModelSim write the VCD file.

    vsim> run -all
    vsim> vcd flush


    For a real design, the simulation could take a very long time and, more importantly, could produce very large VCD files (gigabytes!). For your own designs consider writing the VCD files to the /scratch directory.

    This simulation, however, should not take that long. As you can see from the wave window, the inputs are rather random and should produce a lot of activity.

    Stimuli-based Activity

    At this point, we have a VCD file that contains the toggle activity of the nodes in the design based on a simulation with actual stimuli. We will now give it to Cadence SoC Encounter to perform a stimuli-based power analysis:

    Student Task 11:

    As before, select the menu Power → Power Analysis → Run Power Analysis... .

    In the main tab, select VCD File to perform a simulation-based power analysis. Note that if you don't check this option, SoC Encounter uses the values given in the other fields. Take the generated VCD file and enter as Scope the top-level module chip_filter_tb/DUT. Note that there is no leading slash / in the scope. You could also specify a start and stop time for the power simulation. Here, specify a start time of 100 ns and a stop time of 20,000 ns (numbers are taken from the simulation). Leave the block field empty and press Add. Do not forget to press Add!

    The results directory should be reports/power. See Figure 5 to get an overview of the window setup. Press OK.

    Figure 5: Run Power Analysis menu in Cadence SoC Encounter with VCD file.


    Once the power analysis starts, it will write messages to the Cadence SoC Encounter shell that look similar to those from the previous runs. But we have to study them carefully. When the clock period specified in the SDC file and the clock period within the VCD file do not match, you will get a message that says (for example):

    WARNING (POWER-1784): Existing clock frequency 217.391MHz
    is being overwritten with 200.034MHz on clock rooted on
    net ClkxCI from VCD file.

    In this case the VCD clock frequency will be taken. In our exercise, we do not have this problem.

    Furthermore, there will be a message similar to the following one:

    With this vcd command, 4426896 value changes and 1.99e-05 second
    simulation time were counted for power consumption calculation.

    The lines above summarize how Cadence SoC Encounter has interpreted the VCD file. It is very important to make sure that the time (expressed in seconds) is equal to what we have simulated (and have intended). In our case, the time should be 20,000 ns - 100 ns = 19,900 ns, which matches the above message. Make sure that you have the correct time.

    Filename (activity) : ../modelsim/vcd/chip_filter.vcd
    Found in design : 24858/26118
    Coverage for file : 5473/5473 = 100%

    The lines above tell us what Cadence SoC Encounter has extracted from the VCD file. It is very easy to make mistakes and use the wrong VCD file. The second line shows the total number of switching activities, and the third line shows what percentage of the internal nodes were annotated.

    If you see that the message looks like the following:

    Found in design : 0/0
    Coverage for file : 0/5473 = 0%

    you have a problem (most probably it is the wrong file, or the wrong scope has been specified; check in particular how the leading slash is handled, see Student Task 11). Cadence SoC Encounter will still perform the analysis regardless of the success of the annotation. Since nothing was back-annotated, the results will just be wrong.
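    Because the tool carries on even when the annotation fails, it can help to check the console messages mechanically. The Python sketch below parses the two messages quoted above from a saved copy of the console output. The regular expressions are written against the message wording shown in this document only; other tool versions may phrase them differently, and the function name is hypothetical.

```python
import re


def check_vcd_annotation(log_text, start_ns, stop_ns):
    """Return (window_ok, coverage_percent) parsed from the console messages."""
    sim = re.search(r"([0-9.eE+-]+)\s*second\s*simulation time", log_text)
    cov = re.search(r"Coverage for file\s*:\s*(\d+)/(\d+)", log_text)
    if sim is None or cov is None:
        raise ValueError("expected messages not found in log")
    # The counted time should equal the requested analysis window.
    expected_s = (stop_ns - start_ns) * 1e-9
    window_ok = abs(float(sim.group(1)) - expected_s) <= 1e-3 * expected_s
    annotated, total = int(cov.group(1)), int(cov.group(2))
    coverage = 100.0 * annotated / total if total else 0.0
    return window_ok, coverage


LOG = """With this vcd command, 4426896 value changes and 1.99e-05 second
simulation time were counted for power consumption calculation.
Filename (activity) : ../modelsim/vcd/chip_filter.vcd
Found in design : 24858/26118
Coverage for file : 5473/5473 = 100%
"""

print(check_vcd_annotation(LOG, start_ns=100, stop_ns=20000))  # (True, 100.0)
```

    A coverage of 0% or a window that does not match the intended 19,900 ns is a sign that the reported power figures should not be trusted.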

    Student Task 12:

    Take a look at the report chip_filter.rpt in the output directory that you have selected. How much power does the chip dissipate now?

    Update your results table with the latest result. Do not forget to update the power in the second (mystery) column.

    Compare the results with the older analyses. Does your result make sense?


    5.3 Effect of Switching Activity

    For the last part we have used a simulation of random input data. The stimuli file was given for the exercise, and we just used these values. The question that we should now investigate is how much the stimuli file could affect the overall power consumption.

    Student Task 13: To do this, we apply the stimuli producing the least activity in the design: an all-zero vector.

    Generate a stimuli file with an all-zero input and record a new VCD file. (You will have to figure out how.)

    Update the estimated power in our table.

    Present the results to an assistant.

    5.4 Architectural Changes to Save Power

    Architectural decisions can have a significant effect on the power consumption of the circuit. The test circuit we use in this exercise has been designed to have several different operation modes that correspond to differing architectural choices. A summary of the options can be found in Section 3.3.

    The stimulus file in the previous section used both the high-pass and the low-pass filter component at the same time (option Enable all). The first thing that we will do is to disable the high-pass filter (option Disable HP) and check the resulting power analysis.

    Student Task 14:

    Modify the stimulus file simvectors/input.stim so that the option Disable HP is selected. You should only change the first number in the stimulus file. (Make sure you are not using the stimuli file with zero activity!)

    Perform a power analysis using the VCD file generated from the new stimulus file.

    Report your numbers in the table. How do they compare to previous results?

    After examining the power reports and consulting the simplified block diagram in Figure 3, you should notice that there is a way to reduce the power consumption without losing functionality.

    Student Task 15:

    Describe a couple of approaches that could reduce the power consumption of the circuit. Discuss your solutions with an assistant.

    We will implement a solution that uses the clock gating technique to disable the unused filter bank. The test circuit already has the control signals for this solution (see Section 3.3). We will use the option Clock Gate. This option will a) only enable one block, and b) use clock gating to stop the clock propagation in the block that is not enabled.


    Student Task 16:

    Modify the stimulus file simvectors/input.stim so that the option Clock Gate is selected. You should only change the first number in the stimulus file.

    Perform a power analysis using the VCD file generated from the new stimulus file.

    Report your numbers in the table. How do they compare to previous results?

    The change in the input file uses ModexSI to disable the filter blocks in connection with the OutSelectxSI signal. The TestModexTI signal controls the clock gating circuitry: 1 - clock gate inactive, 0 - clock gate active.

    Normally, architectural changes like the one we have just described cannot be performed simply by changing the input stimuli (this was done in this exercise to save time). Such architectural changes would require changes to be made to the circuit description, re-synthesis of the circuit, and a fresh back-end design process. Once the back-end process is complete we would extract the SDF file and the netlist, use Mentor Graphics ModelSim to generate a new VCD file, import this file back into Cadence SoC Encounter, and perform the power analysis.

    Explain the numbers in your final table to your assistant.

    Next week, we will study the effects of IR drop and investigate the effects of different power distribution strategies.


    6 Ground bounce, supply droop and Electromigration

    In this part of the training we want to determine an adequate power routing strategy for our design. We can determine the width, layers, and the number of stripes and power rings by evaluating how much the power distribution is affected.

    To perform this analysis, we will use the Rail Analysis of Cadence SoC Encounter. The rail analyser can show the current density, ground bounce and supply droop across the power lines in a chip. This allows us to evaluate whether or not the current power distribution is adequate for the design. In Cadence SoC Encounter the ground bounce and supply droop are called IR drop.

    While designing the power nets, it is important to keep in mind two different problems:

    - IR drop: Since the metal exhibits a natural resistance (R), current (I) flowing through such a connection will create a voltage drop. This in turn will reduce the supply voltage of any cell, which is to the detriment of its performance (e.g. increased propagation time). Additionally, excessive supply drop and ground bounce may violate noise margins, leading to a malfunction of the chip (footnote 4). Depending on process, voltage, and temperature (PVT) variations, it immediately influences the correct behavior of the chip.

    - Electromigration (footnote 5): Thermally agitated metal ions are washed away by flowing electrons, thereby reducing the cross section of the metal. As a final result, an interruption of a power line can occur, which destroys the chip. This phenomenon depends on the current density J.

    IR drop is a problem that has an immediate effect on the chip's operation, while electromigration is a slow process, which may show its negative impact after months or even years during which the IC has been working correctly. The positive side effect of designing the supply wires sufficiently wide with respect to electromigration is that fusing due to high current densities is prevented. That is, constraints for preventing electromigration are much tighter than those for preventing fusing.
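    A rough feeling for both effects can be had from a first-order wire model. The Python sketch below treats a supply stripe as a uniform resistor (sheet resistance times number of squares) carrying a constant current, and reports the current per unit wire width, which is a common way of expressing electromigration limits. All numbers (sheet resistance, length, current) are hypothetical and are not taken from the umc L180 databook; the rail analysis tool uses a far more detailed model.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance of a straight wire: sheet resistance times number of squares."""
    return sheet_res_ohm_per_sq * (length_um / width_um)


def ir_drop_volt(current_a, sheet_res_ohm_per_sq, length_um, width_um):
    """Voltage drop U = I * R along the wire."""
    return current_a * wire_resistance(sheet_res_ohm_per_sq, length_um, width_um)


def current_per_width_ma_um(current_a, width_um):
    """Current per unit wire width in mA/um (proportional to J for fixed thickness)."""
    return current_a * 1e3 / width_um


# Hypothetical stripe: 0.05 ohm/square, 1000 um long, carrying 10 mA.
for width in (10.0, 20.0):
    drop = ir_drop_volt(0.010, 0.05, 1000.0, width)
    dens = current_per_width_ma_um(0.010, width)
    print(f"w = {width:4.1f} um: IR drop = {drop * 1e3:.1f} mV, I/w = {dens:.2f} mA/um")
```

    In this simple model, doubling the stripe width halves both the voltage drop and the current per unit width, which is why widening or adding stripes addresses both problems at once.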

    Fortunately, Cadence SoC Encounter features efficient rail analysis tools that graphically show the IR drop along the supply lines and the current density therein. Basically, there are two versions of the Rail Analysis available:

    - Early Rail Analysis: A simplified analysis that can be used after floorplanning.

    - Rail Analysis: A more accurate analysis that can also take into account the power distribution within macros such as I/O pads and memory macros.

    In this exercise we will use the more precise one, i.e. the Rail Analysis.

    7 The Test Vehicle

    The design used throughout this part of the training will already be very familiar to you: it is the same design you used in Exercise 3. To give you an overview of it once again, Figure 6 illustrates the main components.

    4 Check the VLSI book in chapter 10.3.
    5 Check the VLSI book in chapter 11.6.1 for a more detailed discussion.


    [Figure 6, block diagram; only the labels survive. Hierarchy levels: top, mbcjr_chip. Blocks: PADS, InputMemory, MBCJRUnit, alphaUnit, alphaConn, alphaMem, betaUnit, betaConn, dummyBetaUnit, dummyBetaConn, gammaAdder, in2Gamma, mem1, mem2, mem3, LLRUnit, MBJCRFsm (FSM), i_res.. Internal signals: DataxDI, OutRam1xD, OutRam2xD, OutRam3xD, AlphaxD, BetaxD, GammaxD, BetaGammaxD. Chip ports: ClkxCI, ResetxRBI, DataxDI, LLRxDO, TestModexTI, BistEnxTI, BistAlphaOkxTO, BistGammaOkxTO, BistAlphaDonexTO, BistGammaDonexTO, LLRSelectxSI, ModexSI.]

    Figure 6: The test vehicle being used.

    7.1 Installation and Preparation Work

    The test vehicle can be installed as follows:

    Student Task 17:

    1. Open a Unix shell window.
    2. Install the test vehicle:

    sh > /home/vlsi2/t2/install_t2_partB

    3. Start the cockpit:

    sh > cd training2_partB
    sh > icdesign umcL180 &

    Afterwards, load the already prepared design:

    Student Task 18:

    1. Start Cadence SoC Encounter.
    2. Navigate to Design → Restore Design → SoCE... and choose mbcjr_chip.enc from the save directory.
    3. Change to the Physical View of the design.


    Figure 7: Set Rail Analysis Mode GUI in Cadence SoC Encounter .

    8.2 IR Drop Threshold

    To perform IR drop analysis, we need to fix a threshold that indicates the worst acceptable voltage level in the design. The threshold voltage can be extracted from the databook (located in the DOCS directory):

    Student Task 21: Look for the operating conditions in the standard cell databook and report the following values:

    Operating voltage:

    Minimal voltage:

    At first sight, a good threshold value might be the minimal voltage of the standard cells. However, we need to take into account that the IR drop analysis is done for VCC and ground separately. The total supply loss seen by a cell is therefore the sum of the VCC drop and the ground bounce, so the threshold for a single net must leave room for the drop on the other net.
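The budgeting argument above can be sketched in a few lines of Python. This is only an illustration: the voltages 1.80 V and 1.62 V are assumed placeholders (the real values must be taken from the databook), and the equal split of the budget between VCC and ground is an assumption as well, not a rule stated in this training.

```python
# Hypothetical operating conditions; read the real values from the databook.
V_NOMINAL = 1.80  # assumed nominal supply voltage in volts
V_MIN = 1.62      # assumed minimal cell supply voltage in volts


def rail_threshold(v_nominal: float, v_min: float, vcc_share: float = 0.5) -> float:
    """Return the lowest acceptable voltage on the VCC rail.

    The total budget (v_nominal - v_min) is shared between the VCC droop
    and the ground bounce; vcc_share is the fraction granted to VCC.
    """
    if not 0.0 <= vcc_share <= 1.0:
        raise ValueError("vcc_share must lie between 0 and 1")
    total_budget = v_nominal - v_min
    return v_nominal - vcc_share * total_budget


threshold = rail_threshold(V_NOMINAL, V_MIN)
print(f"VCC threshold: {threshold:.3f} V")
```

With an equal split, only half of the total budget may be lost on the VCC net, so the VCC threshold lies halfway between the nominal and the minimal voltage.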

    Student Task 22: Taking into account the considerations from before, determine an appropriate threshold level for the IR drop on the power nets:


    8.3 Rail Analysis Run

    Student Task 23:

    To run the analysis, select from the menu Power → Rail Analysis → Run Rail Analysis... .

    Set VCC as the Power Net(s), set the Voltage(s) and the appropriate threshold.

    In the Power Data menu choose the Current Files switch and then select the instance current file that was generated in the previous step, i.e. static_VCC.ptiavg for the net VCC (located in the reports/power directory).

    Cadence SoC Encounter does not know by itself where the supply enters the chip. You specify this with the Power Pad definition. The easiest way is to use a Pad File. To create this file, choose Pad File and click on the Create button.

    In the Edit Pad Location window set the net name under Auto Fetch Pad Location to VCC and press Auto Fetch. The Pad Location List is updated with all the VCC supplies. Now you can save this list under the name mbcjr_chip_VCC.pp in the save folder (use the VS file format). Close the window by pressing Cancel.

    Back in the Run Rail Analysis... window, load the Pad Location List that you have just created by selecting it within the Files: option. As the Net Name: use VCC and press the Add button.

    After providing the results directory reports/rail, the GUI should look similar to Figure 8. Press the Apply button.

    If the rail analysis succeeded, the Cadence SoC Encounter shell should display output similar to the following:

    * Exiting vstorm2 normally.

    vstorm2 exited successfully.
    Check Reports/main.html generated inside state directory.

    8.4 View Rail Analysis Result

    Once the rail analysis is completed, you have to open a new window, named Power & Rail Results, to see the results.

    Student Task 24:

    Go to the menu Power → Report → Power & Rail Results... .

    This will bring up a new window. In the Basic tab, first select the Browse button and choose the previously generated rail analysis results, which should be located in the reports/rail directory. The results will be called something like VCC_25C_avg_1. Press the Load State button to load the results.


    Figure 8: Rail Analysis GUI in Cadence SoC Encounter .

    Note that the last number of the result files directory (1 in the example above) gets incremented each time you run a new rail analysis. Thus, when you want to view the results of a new rail analysis, you need to load the state from the new result directory. The tool allows you to visualize different features like the IR Drop or the Current Density directly on your chip.

    IR Drop

    As a first step, we will analyze the IR Drop map of the chip.

    Student Task 25: Under Rail Analysis Plot Type select IR - IR Drop. Make sure that the option Auto Apply in the Action field is checked. Otherwise you will have to press the Apply button in order to show the results. Compare your settings with those from Figure 9. This will give you a color-coded map of the IR drop of your chip. The highest drop will be colored dark red. You can dim the rest of the circuit with the F9 key to see the IR drop more clearly.

    By default, the tool determines the color ranges automatically. You can change this in the Auto Filter field if you want (e.g. by pressing the Auto button).

    Resistor Current

    In the Power & Rail Results window select RC - Resistor Current to show the plot of the current flowing across the wires. Again, you can check Auto Apply or press Apply.


    Figure 9: Power & Rail Results GUI in Cadence SoC Encounter .


    Resistor Current Density

    The resistor current density plot (RJ - Resistor Current Density option) shows the ratio J/J_max for every wire of the chip. More precisely, J is the actual current density in the wire and J_max is the maximal allowed current density of the respective metal. A ratio greater than 1 means that the current density limit of the segment is violated. This is an important aspect, since for critical values of J/J_max your chip could suffer from the problems described at the beginning of Sec. 6.
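As a rough illustration of the J/J_max ratio, the following Python sketch checks two hypothetical wire segments. All numbers are invented for the example, and expressing the current density per unit of wire width (mA/µm) is an assumption made here for simplicity; the limits actually applied by the tool come from the technology files.

```python
def current_density_ratio(current_ma: float, width_um: float,
                          j_max_ma_per_um: float) -> float:
    """Return J/J_max for one wire segment.

    J is taken as current per unit wire width (mA/um), which assumes a
    fixed metal thickness for the layer in question.
    """
    if width_um <= 0 or j_max_ma_per_um <= 0:
        raise ValueError("width and J_max must be positive")
    j = current_ma / width_um
    return j / j_max_ma_per_um


J_MAX = 1.0  # hypothetical limit in mA/um, not taken from any design kit

# Hypothetical segments: name -> (current in mA, width in um)
segments = {
    "ring_segment": (10.0, 20.0),
    "pad_connection": (10.0, 0.5),
}

for name, (current_ma, width_um) in segments.items():
    ratio = current_density_ratio(current_ma, width_um, J_MAX)
    status = "VIOLATION" if ratio > 1.0 else "ok"
    print(f"{name}: J/J_max = {ratio:.2f} ({status})")
```

The same current that is harmless in a wide ring segment violates the limit by a large factor in a narrow pad connection, which is why thin supply connections deserve particular attention.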

    Student Task 26:

    Examine the default design, talk to an assistant and discuss some possible solutions for distributing the power better.

    Where is the worst IR drop located?

    Where is the worst resistor current density located? Why?

    9 Power Distribution Techniques

    Throughout this section we will apply different techniques that allow us to better distribute the available power within our design. In order to see how the particular techniques affect the power distribution of our design, Table 3 should be updated continuously with the results obtained.

    Note that, in order to make you aware of the different problems in power distribution, we use a design that is very bad in the beginning, so that you can see the improvement brought by each step. In a typical chip design flow, however, most of the steps are not necessary.

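    Table 3: Power Distribution Techniques - Results Table.

    Step                               | Voltage / IR Drop [V] | Nets below Threshold [%]
    -----------------------------------+-----------------------+-------------------------
    Default design:                    |                       |
    Connected pads:                    |                       |
    Connected macro:                   |                       |
    Widened power rings:               |                       |
    Doubled power rings:               |                       |
    Power rings @ Metal __ / Metal __: |                       |
    Added power stripes:               |                       |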

    Student Task 27: Have a look at the results of the rail analysis of the default design, which you obtained in Section 8, and fill out the first row of Table 3. The first empty column of the table should contain the maximum IR Drop within the design, whereas the second column should contain the share of nets that violate the IR Drop threshold (in %).
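To make the two table columns concrete, here is a minimal Python sketch that derives both figures from a list of supply voltages. The voltage list, the nominal voltage of 1.80 V and the threshold of 1.71 V are all hypothetical; in the training the numbers are read from the Power & Rail Results window rather than computed by hand.

```python
def summarize_ir_drop(node_voltages, v_nominal, threshold):
    """Return (maximum IR drop in V, share of nodes below threshold in %)."""
    if not node_voltages:
        raise ValueError("node_voltages must not be empty")
    worst_voltage = min(node_voltages)
    max_drop = v_nominal - worst_voltage
    below = sum(1 for v in node_voltages if v < threshold)
    return max_drop, 100.0 * below / len(node_voltages)


# Hypothetical supply voltages seen at five points of the VCC net.
voltages = [1.795, 1.780, 1.742, 1.698, 1.655]

max_drop, pct_below = summarize_ir_drop(voltages, v_nominal=1.80, threshold=1.71)
print(f"Maximum IR drop: {max_drop:.3f} V")
print(f"Below threshold: {pct_below:.1f} %")
```

Tracking both numbers is useful because a single bad spot dominates the maximum drop, while the percentage shows how widespread the problem is.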


    9.1 Supply Pads Connectivity

    One of the major reasons why our test vehicle has such a bad power distribution is the tiny connections between the supply pads and the actual core of the design. One way to solve this problem would be to widen those connections manually. Another way is to use the built-in routing option of Cadence SoC Encounter, which makes sure that the connections are as wide as possible for the supply pads used. This can be done in the following way:

    Student Task 28:

    Go to the menu Route → Special Route... . Within the Basic tab have a look at the Route field and deselect all options except the one for Pad Pins. Your settings should look similar to those in Figure 10. Close the dialog using the OK button and, as soon as routing has finished, have a look at the newly created connections at the supply pads.

    Run another rail analysis as described in Section 8.3, have a look at the results (see Section 8.4) and complete the appropriate row in Table 3.

    Figure 10: Special Route GUI in Cadence SoC Encounter to improve Pad Connectivity.

    9.2 Macro Blocks Connectivity

    You should have already recognized that another major problem within our design is the connectivity of the macro block. Fixing this issue works much like the previous fix:


    Student Task 29:

    Go to the menu Route → Special Route... . Within the Basic tab have a look at the Route field and deselect all options except the one for Block Pins. Close the dialog using the OK button and, as soon as routing has finished, check the newly created connections at the macro block.

    Run another rail analysis and complete the results table.

    9.3 Adjustment of the Power Rings

    The current width of the power rings is definitely at a minimum (they are almost as narrow as the cell library allows them to be). In order to get some information about the different available metal layers and their electrical characteristics, you will now examine one of the technology-specific files provided by the design kit:

    Student Task 30: Navigate to the directory encounter/tech/lef/ and open the file header6_V55.lef using less. Browse through the file and complete the following table:

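            | Minimum Wire Width [µm] | Maximum Wire Width (a) [µm] | Resistance [Ω/□] | Thickness [µm]
    --------+-------------------------+-----------------------------+------------------+---------------
    Metal 1 |                         |                             |                  |
    Metal 3 |                         |                             |                  |
    Metal 6 |                         |                             |                  |

    a Watch out for the maximum wire width before slotting occurs.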
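The table values matter because the resistance of a wire follows from the sheet resistance of its metal layer as R = R_sheet · L / W, and the IR drop is then V = I · R. The following Python sketch works through this for one wire. The sheet resistance, length, widths and current are made-up illustration values, not data from the design kit.

```python
def wire_resistance(sheet_res_ohm_sq: float, length_um: float, width_um: float) -> float:
    """Resistance of a rectangular wire: sheet resistance times number of squares."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    return sheet_res_ohm_sq * length_um / width_um


def ir_drop(current_a: float, resistance_ohm: float) -> float:
    """Voltage drop across a resistance carrying the given current (Ohm's law)."""
    return current_a * resistance_ohm


SHEET_RES = 0.08   # hypothetical sheet resistance in ohm per square
LENGTH_UM = 1000.0 # hypothetical wire length in um
CURRENT_A = 0.010  # hypothetical current of 10 mA

for width_um in (2.0, 20.0):
    r = wire_resistance(SHEET_RES, LENGTH_UM, width_um)
    drop = ir_drop(CURRENT_A, r)
    print(f"width {width_um:5.1f} um: R = {r:5.1f} ohm, IR drop = {drop * 1000:6.1f} mV")
```

Widening the wire by a factor of ten reduces both its resistance and its IR drop by the same factor, and a layer with a lower sheet resistance has the same effect at unchanged width.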

    Now you should be able to set the width of the power rings accordingly:

    Student Task 31:

    Use the ruler to determine the width of the power rings. How wide are they currently?

    What would be a more suitable width for the power rings?

    Ask an assistant whether your assumptions are suitable or not. Correct them if necessary. Afterwards open the menu Power → Power Planning → Add Rings... and insert the settings illustrated in Figure 11.

    Run another rail analysis and complete the results table.

    Widening the power rings definitely improved the power distribution of our design. Nevertheless, not all of the nets reach the previously defined threshold. Hence, we have to take further steps in order to obtain a lower IR Drop. One possibility is to double the number of power rings:

    Student Task 32: Open the menu Power → Power Planning → Add Rings... and apply the same settings as in the previous step, except for the Net(s). Here insert GND VCC GND VCC, which results in doubled power rings. After hitting the OK button, run another rail analysis and write down the results in Table 3.


    Figure 11: Add Rings GUI in Cadence SoC Encounter .


    As you should see from your results, the addition of a second power ring does not improve the power distribution much. Therefore you can delete the second power ring we have just created by simply removing the appropriate wires within the design. What the previous step shows is that oversized power networks do not always give a better power distribution. Instead, they only consume die area, which can certainly be put to better use.

    In the previous section you have gathered some electrical information about the different metal layers. Maybe you can already imagine that the choice of the correct metal layer also plays a major role when designing the power distribution network. Hence, let us now try to change the metal layers of our power ring in order to reduce the IR Drop.

    Student Task 33:

    First, remove the existing power ring within the floorplan (select and delete).

    Open the menu Power → Power Planning → Add Rings... and keep the previously entered settings (check that you do not insert the unnecessary second power ring this time), except that you choose a more suitable metal layer.

    Press the OK button and run another rail analysis. Have a look at the results of the rail analysis and complete the corresponding row in your results table. Which metal layer did you choose, and does the change improve the power distribution?

    9.4 Power Stripes

    Some of the voltage levels of the nets within our design are still below the initially set threshold. As we can see from the latest rail analysis results, the highest IR Drop is located right in the middle of our design. Therefore we will try to correct these violations by inserting power stripes.

    As with the insertion of power rings, there are many different parameters which you can tune during the insertion of power stripes. Some of them are listed in the following:

    Orientation: Power stripes can, of course, be inserted either horizontally or vertically. Because the supply wires for the standard cells are horizontally aligned, vertical power stripes are more suitable to improve the power distribution.

    Width: As with power rings, the width of the power stripes can be defined.

    Quantity: Depending on the present design, you may have to adjust the number of power stripes being inserted.

    Power Grids: Further power distribution techniques like a power grid (i.e. vertical as well as horizontal stripes) are possible.⁶

    For our design we will only insert a single power stripe:

    Student Task 34:

    Open the menu Power → Power Planning → Add Stripes... and navigate to the Basic tab. The stripes should be designed for the Net(s) GND VCC. Choose an appropriate metal layer and a Vertical direction. The Width of the stripes should be 20 µm and they should have a Spacing of 1.5 µm.

    6 Figure 10.9 of Section 10.4 within our textbook Digital Integrated Circuit Design, from VLSI Architectures to CMOS Fabrication shows some sample layouts.

    In the Set Pattern field select the Number of Sets and insert just a single set.

    The stripes should be inserted at a predefined location (a). Within the First/Last Stripe section select Start from: left and for Relative from core or selected area insert 430 µm. Compare your settings with those from Figure 12 and press the OK button.

    Run your final rail analysis and check the results. Complete the results table. Hopefully, you no longer have any violating nets.

    a As already mentioned earlier, re-running the whole backend design flow for each power distribution improvement would have been too time-consuming for a single afternoon. Therefore the nice guys from the DZ have already prepared a suitable location for the power stripes.

    Figure 12: Add Power Stripes GUI in Cadence SoC Encounter .


    9.5 Concluding remarks

    Although we primarily tried to reduce the worst-case IR drop and to stay above the specified threshold voltage, you should in general also check that the IR drop distribution is consistent with your expectations. For instance, you would expect the IR drop to increase the farther away you go from the power supply connections.

    Also note that we only do a rail analysis for the VCC net and thus omit the ground network in this training.

    10 It's Your Turn

    Now that you are more or less an expert⁷ in power analysis and power distribution techniques and know how to work around the problems that appear, you can show what you have learnt on a new sample design.

    Student Task 35: In order to close the current design, use the Cadence SoC Encounter shell and type in:

    enc > freeDesign

    This will close the previous sample design. Open the new design as usual by navigating to Design → Restore Design → SoCE... and choose mbcjr_chip_II.enc from the save directory. Change to the Physical View of the design.

    Before you can start with the power distribution analysis in Cadence SoC Encounter, you need to create a VCD file to get node activity information using the technique you learned in the first part of this training. In the following, we recapitulate the flow:

    Student Task 36:

    Compilation of the netlist: As a starting point, use the Verilog netlist exported from Cadence SoC Encounter, which is located at /encounter/out/mbcjr_chip_II.v. This netlist, together with the testbench- and simulation-specific VHDL files, has to be compiled. The required VHDL files are listed in the following:

    1. /sourcecode/VHDLTools.vhd
    2. /sourcecode/LTEPkg.vhd
    3. /sourcecode/mbcjr_simulstuff.vhd
    4. /sourcecode/mbcjr_chip_TB_pack.vhd
    5. /sourcecode/mbcjr_chip_TB.vhd

    You may want to have a look at the gate-level compile script we used during the first part of this training.

    Simulation of the netlist: If the netlist and the VHDL files have been compiled successfully, you can start with the actual simulation of the netlist. The gate-level simulation script from the first part of the training will help you to design a suitable run script for your current design. The SDF file you will need for the simulation is located at /encounter/out/mbcjr_chip_II.sdf.fixed.gz. Because the present design has a RAM macro block in it, you have to specify the fsa0a_c_memaker Verilog library before you can run the simulation (in addition to the core- and I/O-specific Verilog libraries).

    7 Although you should be familiar with all of the tasks required for this part of the training, do not hesitate to ask an assistant if you get stuck somewhere. The EDA tools can be a little bit confusing at the beginning. Nevertheless, this part of the training should help you to get a better overview of how power analysis works by going through all of the different steps on your own, this time without a guided tour provided by the assistants.

    In order to get a VCD file that contains the node information during the actual running phase of the design, we recommend generating the VCD file only between 1s and 3s. On the one hand, this has the advantage that you do not record the toggle activity during the initialization phase; on the other hand, it limits the size of the resulting VCD file because of the simulation end time.

    Power Simulation: Now that you have the node activity file, you can switch back to Cadence SoC Encounter and create the power-specific files required for the subsequent rail analysis by running a VCD-based power simulation. Do not forget to first run the power simulation setup at Power → Power Analysis → Set Power Analysis Mode... . How much power does the design consume?

    Check the output in the Cadence SoC Encounter shell in order to be sure that the coverage of the VCD file is OK and hence your power value is correct. After running the power analysis, the files static_GND.ptiavg and static_VCC.ptiavg should be available in the directory /encounter/reports/power/.

    Now you are ready to start with your first attempts to improve the power distribution of the new design. Do not forget to do the setup of the rail analysis as described in Section 8.1 before you start with the actual analysis.

    Student Task 37:

    Your first task will be to perform a rail analysis of the initial design and complete the first row of Table 4. Then, improve the power distribution network step by step using the techniques you have seen in the guided example in the previous section.

    Complete the results table below by describing the power distribution technique you have applied in the first empty column and the resulting maximum IR Drop in the second empty column. The goal should be to achieve a minimum supply voltage level of 1.788 V.

    Remark (Hints): In the following, we provide some hints and comments that should help you to improve the power distribution:

    1. A well-formed power distribution network cannot be identified by only considering the worst-case IR Drop. Rather, try to build your network in such a way that almost all components (standard cells, macro blocks, etc.) are provided with the same supply voltage. This means that you should not simply stop your efforts as soon as no net violates the initially set threshold anymore, but try to achieve a balanced power distribution.

    2. As you have seen, the special route option in Cadence SoC Encounter can be used to route specific nets, such as VCC and GND. However, keep in mind that Cadence SoC Encounter considers only those nets which are not yet connected, and moreover considers only those wires which have not been placed yet (i.e. if two wires already placed on two different metal layers run across each other, Cadence SoC Encounter will not check whether they should be connected during a special route process).

    3. Some of the problems in the design might be much easier to detect by using further analysis methods of the rail analysis, which we have not mentioned in this training. Feel free to try the other analysis methods besides IR Drop and Current Density.

    Table 4: New Design Power Distribution Techniques - Results Table.

    Step | Power Distribution Improvement | Voltage / IR Drop [V]
    -----+--------------------------------+----------------------
    0    | None (Initial design):         |
    1    |                                |
    2    |                                |
    3    |                                |
    4    |                                |

    Congratulations, that's it!

    Present the results to your assistant and discuss any open questions.