VLSI design challenges and EDA in the forthcoming decade...1 VLSI design challenges and EDA in the...

95
1 VLSI design challenges and EDA in the forthcoming decade Prof. Takayasu Sakurai Center for Collaborative Research, and Institute of Industrial Science, University of Tokyo E-mail:[email protected] 1 DA show ‘01

Transcript of VLSI design challenges and EDA in the forthcoming decade...1 VLSI design challenges and EDA in the...

  • 1

    VLSI design challenges and EDA in the forthcoming decade

    Prof. Takayasu SakuraiCenter for Collaborative Research, and

    Institute of Industrial Science,University of Tokyo

    E-mail:[email protected]

    1

    DA show ‘01

  • T.Sakurai

    Technology trend

    Des

    ign

    rule

    , gat

    e le

    ngth

    Gat

    e ox

    ide

    thic

    knes

    s [n

    m]

    Year1998 2002 2006 2010 2014

    Design rule

    Gate length

    Gate oxide thickness

    DRAM bit per chip

    0.1

    1

    10

    100

    1000

    DR

    AM

    bit

    per c

    hip

    [Gbi

    t]1000

    100

    10

    1

  • T.Sakurai

    Limit of Miniturization

    Conventional I-V curve at 0.04µm (Even down to 0.014µm)    

    0.04µm MOSFET

    0.0 0.4 0.8 1.2 1.6 2.00.00

    0.21

    0.42

    0.63 Vg = 2.0 V

    Vg = 1.6 V

    Vg = 1.2 V

    Vg = 0.8 V

    Gate Length = 40 nm

    Dra

    in C

    urre

    nt[m

    A/µ

    m]

    Drain Voltage [V]

    0.04µm

    M. Ono, M. Saito, T. Yoshitomi, C. Fiegna, T. Ohguro, and H. Iwai, "Sub-50nm gate Length N-MOSFETs with 10 nm Phosphorus Source and Drain Junctions", IEDM Technical Digest, pp. 119 -122, 1993.H. Kawaura, T. Sakamoto, Y. Ochiai, J. Fujita, and T. Baba, "Fabrication and Characterization of 14-nm-Gate-Length EJ-MOSFETs", Extended Abstracts of SSDM, pp.572-573, 1997.

  • T.Sakurai

    Scaling Law

    Drain Source

    Gate

    0.2micron

    Drain SourceGate

    0.2micronSize 1/2

    Size x1/2Voltage x1/2Electric Field x1Speed x3Cost x1/4

    Power density x1.6RC delay/Tr. delay x3.2Current density x1.6Voltage noise x3.2Design complexity x4

    Favorable effects Unfavorable effects

  • T.Sakurai

    Three crises in VLSI designs

    Power crisis

    Interconnection crisis

    Complexity crisis

  • T.Sakurai

    Ever Increasing VLSI Power

    959085800.01

    0.1

    1

    10

    100

    Year

    Pow

    er (W

    )x4 / 3years

    Processors published in ISSCC

  • T.Sakurai

    VDD, power and current trend

    Year

    Volta

    ge [V

    ]

    Pow

    er p

    er c

    hip

    [W]

    V DD

    curr

    ent [

    A]

    1998 2002 2006 2010 20140

    0.5

    1

    1.5

    2

    2.5

    0 0

    200 500

    Current

    Power

    Voltage

    International Technology Roadmap for Semiconductors 1999 update sponsored by the Semiconductor Industry Association in cooperation with European Electronic Component Association (EECA) , Electronic Industries Association of Japan (EIAJ), Korea Semiconductor Industry Association (KSIA), and Taiwan Semiconductor Industry Association (TSIA)

  • T.Sakurai

    Ultra Low-Voltage Operation

    inverter (T=300K)

    2nand(T=300K)

    inverter (T=77K)

    Vin (V)

    Vout

    (V)

    50mV

    25mV

    100mV360mV

    140mV

    J.Burr&J.Shott,"A 200mV Self-Testing Encoder/Decoder using Stanford Ultra-Low-Power CMOS",ISSCC94, pp.84-85.

    ((((Stanford Univ.)Stanford Univ.)Stanford Univ.)Stanford Univ.)

  • T.Sakurai

    Power & Delay Dependence on VDD & VTH

    Power : P = pt •fCLK •CL •VDD + I0 •10 •VDD 2V thS

    (αααα=1.3)

    k • CL • VDD(VDD - Vth)

    ααααDelay = k•Q

    I=

    12

    34

    -0.400.40.8

    00.2

    0.4

    0.6

    0.8

    1x 10-4

    Vth (V)

    VDD(V)

    Pow

    er (W

    )

    A

    B

    12

    34

    -0.400.40.8

    0

    1

    2

    3

    4

    5x 10

    -10

    Del

    ay (s

    )

    Vth (V)

    VDD(V)

    A B

  • T.Sakurai

    Controlling VDD and VTH for low powerLow power →→→→ Low VDD →→→→ Low speed →→→→ Low VTH →→→→ High leakage →→→→ VDD-VTH control

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

    *) MTCMOS: Multi-Threshold CMOS*) VTCMOS: Variable Threshold CMOS• Multiple : spatial assignment• Variable : temporal assignment

    Software-hardware cooperationTechnology-circuit cooperation

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

    • MTCMOS: Multi-Threshold CMOS

  • T.Sakurai

    Multi-Threshold CMOS Circuit

    CMOS logic

    Low-VTHD Q

    Hi-VTH Hi-VTHHi-VTH

    Low-VTH

    St'bySt'by

    MTCMOS logic MTCMOS latch

    In active mode, low-VTH (~0.2V) achieve high speed. In standby mode when St'by signal is high, high-VTH(~0.6V) MOSFET in series to normal logic circuits cut off leakage current. Doesn’t work under 0.8V.

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

  • T.Sakurai

    Subthreshold current

    NMOS L = 0.3µµµµm

    VGS3V

    VBS

    D

    BG

    ID

    0 1 2

    0.4V

    VBS=0V

    -1V-2V

    -3V

    -410

    -210

    -610

    -810

    -1010

    Gate Source Voltage VGS [V]

    Dra

    in C

    urre

    nt I D

    [A]

    VTH

    )2.0~15.0(2

    110

    0

    ≈−≈−−=γγ

    φγ

    BST

    BSFTTH

    VVVVV

    )decade/V.~s(I s

    VV

    D

    THGS

    1010

    W = 10µµµµm

  • T.Sakurai

    Standby Power Reduction (SPR) CircuitISSCC'95 pp.318-31

    VDD (2V)

    VSS (0V)

    VPBB (-2V)

    VNWELL (2 or 4V)VNBB (4V)

    Level ShifterVoltage Switch

    V1 CW

    M3 M4M5

    CW

    M2M1

    V2V3

    V4 VPWELL (0 or -2V)

    St'by

    St'by

    are added to ensure reliability

    • In standby mode and in IDDQ test, substrate bias is applied to increase VTH, which reduces leakage.

    • In active mode, substrate bias is not applied to lower VTH, which ensures high speed.

  • T.Sakurai

    Previous circuit schemes

    [1] S.Mutoh et al. IEEE, JSSC, 1995. [2] T.Kuroda et al. IEEE, JSSC, 1996.

    MTCMOS [1] VTCMOS [2]• Tunneling leakagecannot be cut-off.

    • Area penalty increaseswhen VDD < 1V.

    • Junction leakage increases due toband-to-band tunneling.

    • Tunneling leakage isnot suppressed.

    High Vth(Stand-by)

    Low Vth (Active)VDD

    Vnn

    Vpp

    GND

    VDD

    GNDV

    GNDHigh Vth

    Low Vth

    (~0.6V)

    (~0.2V) (~0.2V)

    (~0.6V)

  • T.Sakurai

    Transistors go leaky

    1999 2002 2005 2008 201110-13

    10-1210-1110-1010-910-810-710-610-5

    Requirement(Stand-by Power = 1mW)

    (Vbs= -1V)

    JunctionLeakage

    0.1V

    0.2V

    0.3V

    0.4V

    Vth = 0.4V

    SubthresholdLeakage

    Gate TunnelingLeakage

    (180nm) (50nm)(70nm)(100nm)(130nm)

    I off

    (A/u

    m)

    Year (Technology Node)

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTCMOS VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

  • T.Sakurai

    Boosted-Gate MOS (BGMOS)

    Leak cut-off Switch (LS)- high VTH- thick TOX

    VDD

    Virtual VSS

    CMOS circuits- low VTH- ultra thin TOX

    T.Inukai, M.Takamiya, K.Nose, H.Kawaguchi, T.Hiramoto and T. Sakurai, "Boosted Gate MOS (BGMOS): Device/Circuit Cooperation Scheme to Achieve Leakage-Free Giga-Scale Integration," CICC'00, p.409, May 2000.

    0VVDDVBOOST

    Device / circuit collaborative approach

  • T.Sakurai

    Leak switch optimization

    1 2 3 4 5 6 7 8

    0

    1

    2

    3

    Vth = 0.5V (const.)Tech. Node: 50nm

    Tox(nm)

    Dra

    in C

    ondu

    ctan

    ce (m

    S/µ µµµm

    )1 2 3 4

    Vboost (V)

    50

    100

    150

    200

    L g(n

    m)

  • T.Sakurai

    GSI’s in deep-submicron era

    High voltageHigh VTHThick TOX

    Low voltage Low VTHThin TOX

    Logic CircuitsF/F

    SRAM (leak cut-off by high VTH)

    Leak cut-offPower Switch

    Analog(low VDDReducesS/N ratio)

    I/O (for compatibility)

    VDD

    VBOOST

    VSS

  • T.Sakurai

    Power switch gate width in BGMOS

    Degrade circuit speed unpredictably

    VDD

    Logic

    low-VTH

    Power switchhigh-VTH

    StbyVirtual ground

    VDD

    R CX

    VDD

    Kao, DAC'97, pp.409-414.

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

  • T.Sakurai

    Dual-VTH concept

    Low-VTH circuit(High leakage)

    High-VTH circuit(Low leakage)

    Critical paths

    Non-critical paths

  • T.Sakurai

    Calculated result of power reduction(Theoretical, VDD=0.5V)

    0 0.1 0.2 0.3 0.40

    0.5

    1

    VDD=0.5VVTH(low)=-0.1V∆∆∆∆VTH=±±±±0.04V

    VTH(high)-VTH(low)

    Nor

    mal

    ized

    pow

    er

    td

    P(td)

  • T.Sakurai

    Synopsis simulation resultSimple processor example

    0 0.1 0.2 0.3 0.40

    0.5

    1

    VTH(high)-VTH(low)

    Nor

    mal

    ized

    pow

    er

    VDD=0.5V

    VTH(low)=0.1V

    VTH(low)=0V

    VTH(low)=-0.1V

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

  • T.Sakurai

    Clustered Voltage Scaling for Multiple VDD’s

    Lower VDD portion is shown as shaded

    CVS StructureConventional Design

    Critical Path

    Level-Shifting F/F

    Critical Path

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    FF

    M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.

    Once VL is applied to a logic gate, VL is applied to subsequent logic gates until F/F’s to eliminate DC current paths. F/F’s restore VH.

  • T.Sakurai

    Path-delay Distribution in Dual-VS

    MEF (1527 cells)

    MCB (1366 cells)

    VLD (3812 cells)

    DMA (1493 cells)

    DCT (5466 cells)

    RISC (5645 cells)

    before

    after

    t

    MEC (2912 cells)

    p(t)

    IDCT (6227 cells)

    VLC (3462 cells)

  • T.Sakurai

    Clustered Voltage Scaling Technique

    M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.

    Pow

    er d

    issi

    patio

    n (m

    W)

    10090

    80

    70

    60

    50

    40

    30

    20

    10

    03.3V

    Conventional2.5VVS &

    VTCMOS

    2.5V & 1.75VDual-VS & VTCMOS

    Logic

    F/FClock

    Memory

    -43%

    -43%

    -43%

    -43%

    -30%

    -37%

    -51%

    ±0%

    Measured

    Measured

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

  • T.Sakurai

    Energy consumption isproportional tothe square of VDD.

    Energy consumption isproportional tothe square of VDD.

    VDD should be loweredto the minimum levelwhich ensuresthe real-time operation.

    VDD should be loweredto the minimum levelwhich ensuresthe real-time operation.

    Normalized workload0.0 0.2 0.4 0.6 0.8 1.0

    Nor

    mal

    ized

    pow

    er

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    Variable VddFixed Vdd

    If you don’t need to hussle,VDD should be as low as possible

  • T.Sakurai

    Application slicing and software feedback loop in Voltage Hopping

    S.Lee and T.Sakurai, “Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications,”ASPDAC'00, A5.2, pp.381~pp.386, Jan. 2000.S.Lee and T.Sakurai, “Run-time Voltage Hopping for Low-power Real-time Systems,” DAC'00, June 2000.

    3

    1 2 3 N...4

    TR3

    TR1TR2

    TR4

    TSF = Time constraint of sync frame

    1 2TC2TC3 TTAR3

    fVAR = f1 = fCLK TTD

    TTD

    fVAR = f2 = fCLK/2

    fVAR = f3 = fCLK/3

    fVAR = f4 = fCLK/4

    TTD

    TTD

    TL3,f1 = TTD + TW3TL3,f2 = TTD + TW3 x 2

    TL3,f4 = TTD + TW3 x 4TL3,f3 = TW3 x 3

    Clock frequency for the previous timeslot was f3.

    Processor coreProcessor core

    Voltagefrequencycontroller

    Voltagefrequencycontroller

    Clock & VDDClock & VDDControl info

  • T.Sakurai

    Run-time Voltage Hoppingreduces power to less than 1/10

    MPEG-4 video encoding

    Transition Delay TTD (ms)0.0 0.2 0.4 0.6 0.8 1.0

    Nor

    mal

    ized

    Pow

    er P

    /PFI

    X

    0.00

    0.02

    0.04

    0.06

    0.08

    0.10

    0.12

    0.14

    0.16

    RVH: 2 levels (f,f/2)RVH : 3 levels (f,f/2,f/3)RVH : 4 levels (f,f/2,f/3,f/4)RVH : infinite levelspost-simulation analysis

    SH-4

    Clock

    VDDSH-4 Modified

    Clock

    VDD

  • T.Sakurai

    Measured power characteristics

    Total power = 0.8 x 0.08 + 0.16 x 0.86 + 0.07 x 0.06 = 0.2WTotal power = 0.8 x 0.08 + 0.16 x 0.86 + 0.07 x 0.06 = 0.2W

    VDD hopping can cut down power consumption to 1/4VDD hopping can cut down power consumption to 1/4

    0.8W

    0

    0.2

    0.4

    0.6

    0.8

    1

    Supply voltage: VDD [V]

    Pow

    er: P

    [W]

    0 1 2

    ƒ=100MHz

    ƒ=200MHz

    0.16W

    Downto 1/5

    Time for sleep: 6% 0.07W

    Time for VDDmin: 86%

    Time for VDDmax: 8%

  • T.Sakurai

    Power Conscious OS & Application Slicing

    0 10 3020 40 50 60

    Proposed scheduling: cooperation of OS and applications (power consumption=0.24)

    0 10 3020 40 50 60

    Speed control within application slices (power consumption=0.47)

    0 10 3020 40 50 60

    Conventional rate-monotonic scheduling (power consumption=1)

    0 10 3020 40 50 60

    Speed control with power-conscious OS (power consumption=0.85)

    Y.S.Shin, H.Kawaguchi, T.Sakurai, "Cooperative Voltage Scaling (CVS) between OS and Applications for Low-Power Real-Time Systems," CICC'01, pp.553-556, May 2001.

  • T.Sakurai

    Power switches

    Supply voltage to SH-4

    (B) SH-4 embedded board

    (A) System

    Hardware

  • T.Sakurai

    Controlling VDD and VTH for low power

    Active Stand-byMultiple VTH Dual-VTH MTCMOSVariable VTH VTH hopping VTCMOSMultiple VDD Dual-VDD Boosted gate MOSVariable VDD VDD hopping

    K. Nose, M.Hirabayashi, H.Kawaguchi, S.Lee and T.Sakurai, “VTH-hopping Scheme for 82% Power Saving in Low-voltage Processors,” to be published, CICC 2001.

  • T.Sakurai

    VTH-hopping

    -0.1 0 0.1 0.20

    1

    2

    4

    6

    8

    1

    0

    2

    Dynamic power

    Delay

    Leakage power

    Nor

    mal

    ized

    del

    ay :

    t d(t d

    (VTH

    =0)=

    1)

    Nor

    mal

    ized

    pow

    er :

    P (P

    (VTH

    =0)=

    1)

    Threshold voltage : VTH [V]

    VDD=0.5V0.35µµµµm process

    Fixed VTHDynamic VTH control

    Normalized workloadN

    orm

    aliz

    ed P

    ower

    0 0.5 10

    0.5

    1

    VTHlow=0VVDD=0.5V0.35µµµµm process

  • T.Sakurai

    Schematic of VTH-hopping

    VTHcontroller

    Freq. control

    CONT

    fCLK or fCLK/2

    VBSP2

    Target processor

    VTH-hopping(VTHlow&VTHhigh)

    Power control block

    VBSP1

    VBSN2VBSN1

    VTHlow_Enable

    VTHhigh_Enable

    VDD

    GND

    VBSN

    VBSP

  • T.Sakurai

    Microphotograph of RISC processor

    0.6µµµµm process

    Overhead of VTH-hopping= 14%

    RISC core = 2.1mm x 2.0mm

    VBS selector = 0.2mm x 0.6mm

  • T.Sakurai

    Power comparison

    Fixed VTHFixed freq.

    Dual VTH0

    0.5

    1

    Leakage powerDynamic power

    VDD=0.5VVTHlow=0V94% f/2 operation

    Nor

    mal

    ized

    pow

    er

    VTH-hopping VTH-hopping+ Dual VTH

  • T.Sakurai

    Principle for Power Reduction

    ● Lowering switching probability (αααα)● Low transition coding● Gated clock , Conditional F/F● Power estimation / optimization CAD

    ● Lowering load capacitance (CL)● Embedded memory● Low capacitance circuit● Low-power cell library (gate sizing)

    ● Lowering supply voltage (VS • VDD=VDD2)● Variable-VDD, Variable-VTH, Multi-VDD, Multi-VTH , DC/DC ● Low voltage memory● Low swing clock / bus

    ● Lowering operating frequency● Better algorithm

    leak.juncleakgate10IfVVCP s/V0CLKDDSL TH ++•+••••α≈−

  • T.Sakurai

    Bus Shuffling

    ● Bus shuffling● Virtually no overhead● Pattern information is necessary: applicable to special-

    purpose systems● Layout modification or compiler backend support

    ● Problem definition● Given a set of patterns or statistics of patterns● Find order of bus lines so that power consumption due

    to area and coupling capacitances is minimized

    Y. Shin and T. Sakurai, “Coupling-driven bus design for low-power application-specific systems,”Proc. Design Automation Conf. (DAC), pp.750-753, June 2001.

  • T.Sakurai

    Bus Shuffling

    ● Example

    0001111001110001001100000100111110000111

    0001111001110001001100000100111110000111

    0001111001110001001100000100111110000111

    0001111001110001001100000100111110000111

    0001111001110001001100000100111110000111

    0001111001110001001100000100111110000111

    0000111111100001001000011100111010011010

    time

    b7 ……. b0

    shielding lines line with highest transition prob.

    line with highest positive switchingcorrelation with b6

    line with highest positive switchingcorrelation with b6 or b0

    partial order of this cluster: b0b6b5

  • T.Sakurai

    Experiments● Result of heuristic

    ● 7 data address sets● % power saving compared to un-shuffled

    buses

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    100

    wavelet linear laplace compress lowpass fft ac3

    % s

    avin

    g

    eta=1eta=2eta=3eta=4eta=5eta=infinity

    ηηηη=coupling cap.

    area cap.

  • T.Sakurai

    Software Level Low-Power WorkLow-power SW

    Applications Operating system (OS)

    Source-level design Compilation

    -LP appl design-Power-aware appl

    -Instruction scheduling-Register allocation-Memory mapping

    Shutdown Slowdown

    -Time-out-Predictive-Policy opt

    -Reactive system-Real-time system

    Techniques with architectural changes

    -Cache design-Memory hierarchy design-Code compression-Application-specific processor design-Encoding

  • T.Sakurai

    Less wasteful design wins low-power race

    Example of MPEG2 decodingProcessor (software)~~~~25W

    DSP~~~~4W

    Dedicated sytem LSI (SW/HW)~~~~0.7W

    Low

    -pow

    er

    Hig

    h fle

    xibi

    lity

  • T.Sakurai

    Summary – low-powerFor reducing standby power, insert a power switch in series to logic circuit (BGMOS).→→→→ Choice of power switch gate width

    For reducing active power, dual-VTH scheme and software control of VDD and VTH are promising.→→→→ Tools to support system-level low-power design with S/H co-design capability

    Future giga-scale integration will use multiple VDD, VTH and Tox.→→→→ Tools to support new tech.

  • T.Sakurai

    Drain induced barrier lowering (DIBL)

    ● Drain-Source間電圧が下がるとバンドギャップが増加し,間電圧が下がるとバンドギャップが増加し,間電圧が下がるとバンドギャップが増加し,間電圧が下がるとバンドギャップが増加し,リーク電流が減少するリーク電流が減少するリーク電流が減少するリーク電流が減少する

    qVDDqVDD

    Low VDDHigh VDD

    Source Drain

  • T.Sakurai

    Stack effect

    ● Stack effect の最大の利点の最大の利点の最大の利点の最大の利点 = VINT

  • T.Sakurai

    Scaling of DIBL factor

    0.1030.0760.0570.042DIBL factor : λλλλ

    11.510.59.58stack effect factor

    0.91.21.51.8VDD

    0.060.090.130.18generation [µµµµm]

    S=80mV/decade

  • T.Sakurai

    Analysis of DIBL effect

    ● DIBLは閾値を変えるだけと仮定するは閾値を変えるだけと仮定するは閾値を変えるだけと仮定するは閾値を変えるだけと仮定する (ON時も時も時も時もDIBLを受けるを受けるを受けるを受ける)● VTH=0.15VDDH, その他のパラメータは前のスライドと同じその他のパラメータは前のスライドと同じその他のパラメータは前のスライドと同じその他のパラメータは前のスライドと同じ● 効果が世代にほとんどよらず約効果が世代にほとんどよらず約効果が世代にほとんどよらず約効果が世代にほとんどよらず約12%に減少に減少に減少に減少

    VDDH ↔↔↔↔ VDDL

    ‘0’

    0.06 0.09 0.13 0.180

    1

    2

    Design rule [µµµµm]

    0

    1

    0.40.2 Pr

    atio

    = P

    (VD

    DL)

    /P(V

    DD

    H)

    Volta

    ge [V

    ]

    VDDH

    VDDL

    Pratio

  • T.Sakurai

    Three crises in VLSI designs

    Power crisis

    Interconnection crisis

    Complexity crisis

  • T.Sakurai

    Interconnect determines cost & perf.

    0

    20

    40

    60

    80

    100

    Pow

    er [%

    ]

    MOSFET

    2000 ‘05 ‘10 ‘15

    Num

    ber o

    f int

    erco

    nnec

    t la

    yers

    Year

    (ITRS’99)

    0

    20

    40

    60

    80

    100

    Del

    ay [%

    ]

    1mm Cu

    0

    20

    40

    60

    80

    100

    Proc

    ess

    step

    s [%

    ]# of layers

    RC delayw/o buffers

    Interconnect

    MOSFETGatedelay

    2000 ‘05 ‘10 ‘15Year

    2000 ‘05 ‘10 ‘15Year

    2000 ‘05 ‘10 ‘15Year

    11

    10

    9

    8

    7

    6

    Interconnect

  • T.Sakurai

    DSM interconnect design issuesbecome major design closure obstacles

    Larger currentIR drop (static and dynamic)Reliability (electro-migration)

    Smaller geometry / Denser patternRC delayCrosstalk noiseDelay fluctuationSignal Integrity

    Higher speedInductanceEMI

  • T.Sakurai

    Interconnect parameters trend

    0

    1

    2

    3

    4

    1996 2000 2004 2008 2012Year

    εεεε r

    ρρρρ[ΩΩΩΩ • cm]

    Aspect ratio

    Width [x0.1μμμμm]

    Al Cu

    Semiconductor Industry Association roadmaphttp://notes.sematech.org/1997pub.htm

  • T.Sakurai

    RC delay and gate delay

    1996 2000 2004 2008 201210-12

    10-11

    10-10

    10-9

    10-8

    Year

    Clock period

    Gate delay

    3mm

    1mm

    100µm

    50µm

    Del

    ay (s

    ec)

  • T.Sakurai

    Buffer insertion

    a) Without buffers b) With buffers

    RINTCINT

    ( )INTINTOPTOPT

    MOSINTINTINTOPT

    INTINTOPT

    INT

    INTOPT

    INTINTINTINT

    TINTINTTTTINTINT

    CCppChkgatesofCap

    CRCRpppDelay

    stagesofnumberOptimizedCRCR

    ppk

    kDelay

    inverterbufferofsizeOptimizedCRRCh

    hDelay

    BufferedhCk

    Rk

    Ch

    RhCh

    Rpk

    Ck

    RpkDelay

    MOSFETminimumofresistanceeffectiveGateRMOSFETminimumofecapacitancGateC

    CRCRCRCRt

    73.0/.

    4.22

    :0

    :0

    :

    ::

    )(693.0377.0

    210

    00221

    002

    1

    0

    0

    00

    00

    21

    0

    0

    05

    ===

    ττ≈+=

    =→=∂

    =→=∂

    +++≈

    +++≈

    RTCT

  • T.Sakurai

    Power and delay optimization

    1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 30.20.40.60.81

    1.21.41.61.8

    D/D

    OPT

    , h/h

    OPT

    , k/k

    OPT

    Capacitance increase / CINT

    h: size of repeater

    k: # of stages

    D: total delay

    CINT: interconnect

    capacitance

    a) Without repeaters b) With repeaters

    RINTCINTRTCT

  • T.Sakurai

    Buffer insertion with junction cap.

    ( )

    )CCwhen(C04.1)0Cwhen(C73.0

    CPPC

    CCC

    PPCChk)increasepower(p

    00JINT

    0JINT

    INTJ2

    1INT

    0

    0J0

    2

    10J0OPTOPT

    ====

    η=+

    =+=∆

    CJ

    ( )

    ( )

    )CCwhen(L8.2

    )0Cwhen(L4.2

    CR,/1pppL2

    CCRCRCC

    CpppL2Delay

    00J0MOS0INT

    0J0MOS0INT

    000MOSJ0MOS0INTJ221

    0J000INT0INT0J0

    0221OPT

    =ττ≈

    =ττ≈

    =τηττη+=

    +

    ++=

    b) With buffersa) Without buffers

    RINTCINTRTCT

    CJ L:length k stagesSize: h

    Length: l

    0

    0JJ C

    C1+=η

    Total power is 15% smaller when CJ is negligible.

    Delay is 15% smaller when CJ is negligible.

    Delay is a geometric mean of interconnect delay and gate delay and decreases when gate is faster.

  • T.Sakurai

    Buffered interconnect delay

    1998 2002 2006 2010 201410-11

    10-10

    10-9

    10-8

    Gate delay

    1um x 1um chip long

    Clock period (global)

    Clock period (local)

    1um x 1um chip long buffered

    Year

    Del

    ay (s

    ec)

    • In Emotion Engine design for PS2, about 10k buffers are inserted, which is increasing.

  • T.Sakurai

    RC delay of global interconnections

    10-10

    10-9

    10-8

    10-7

    10-6

    Year

    Del

    ay (s

    ec)

    6µm x 2µm cross-section interconnectGlobal interconnect

    Chip18mm x 18mm

    30mm x 30mm

    1998

    Clock period

    Minimum cross-section interconnect

    2002 2006 2010 2014

  • T.Sakurai

    DSM interconnect design issues

    Larger currentIR drop (static and dynamic)Reliability (electro-migration)

    Smaller geometry / Denser patternRC delayCrosstalk noiseDelay fluctuationSignal Integrity

    Higher speedInductanceEMI

  • T.Sakurai

    Capacitive Coupling Noise

    1996 2000 2004 2008 2012Year

    0

    1

    2

    3

    4

    C20

    C12C12

    C12/C20

    peak couple noise / signal voltage

    ratio

  • T.Sakurai

    Coupling noise in RC bus

    0 1 20

    1

    Time : t / RC

    Volta

    ge :

    V / E

    1

    Vp

    Vp ≈ Cc/C1+ 2Cc/C

    0 1 2Time : t / RC

    0

    1Vo

    ltage

    : V

    / E1

    Vp

    V1

    V2

    V1

    V2

    Cc / C = 1Cc / C = 1

    E10 t=0

    RC

    CRCc

    V1

    V2

    RC

    CRCc

    V1

    V2Cc

    E10 t=0

    E10 t=0

    RC

    CRCc

    V1

    V2

    RC

    CRCc

    V1

    V2Cc

    E10 t=0

    Vp = 1+ 4Cc/C -1√√√√1+ 4Cc/C +1√√√√Vp ≈

    2Cc/C2+ 3Cc/C

    (Cc/C < 2)(Bus)

    (Three lines)H.Kawaguchi and T.Sakurai, "Delay and Noise Formulas for Capacitively Coupled Distributed RC Lines," ASPDAC, Digest of Tech. Papers, pp.35-43, Feb. 1998.

  • T.Sakurai

    Coupling among Interconnections

    0 0.5 1 1.50

    0.5

    1

    Nor

    mal

    ized

    vol

    tage

    time

    w/o buffers

    CCCC

    C CC/C=3.1

    +105%-58%

    )C/C()(..

    elogRCt

    C

    pd

    =≤+≈

    ++−+=

    ηηη

    ηηη

    2370631

    68126

    1221 2

    0.2µµµµ 0.2µµµµ0.3µµµµ0.5µµµµ

    H.Kawaguchi and T.Sakurai, "Delay and Noise Formulas for Capacitively Coupled Distributed RC Lines," 1998 ASPDAC, Digest of Tech. Papers, pp.35-43, Feb. 1998.

  • T.Sakurai

    Coupling among Interconnections

    0 0.5 1 1.50

    0.5

    1

    time

    Nor

    mal

    ized

    vol

    tage Normal buffer insertion

    Zigzag buffer insertion

    WN/WP=25µµµµ/38µµµµ0.13mm processLINT=3mmK=8 (# of buf.)

    +32%-28%

    +9%

    0

    0.5

    1

    Nor

    mal

    ized

    vol

    tage

    time0 0.5 1 1.5

    In1 In2 In3

    In1

    In2

    In3

    In1

    In2

    In3

    +4%

  • T.Sakurai

    DSM interconnect design issues

    Larger currentIR drop (static and dynamic)Reliability (electro-migration)

    Smaller geometry / Denser patternRC delayCrosstalk noiseDelay fluctuationSignal Integrity

    Higher speedInductanceEMI

  • T.Sakurai

    Interconnect Cross-Section and Noise

    Unscaled / anti-scaled• Clock• Long bus• Power supply

    Scaled interconnect• Signal

    1V 20W →→→→20A current5% noise →→→→0.05V noise →→→→~0.02V / 20A→→→→~10µm thick CuThick layer interconnect, area pad, package are co-designed.

  • T.Sakurai

    DSM interconnect design issues

    Larger currentIR drop (static and dynamic)Reliability (electro-migration)

    Smaller geometry / Denser patternRC delayCrosstalk noiseDelay fluctuationSignal Integrity

    Higher speedInductanceEMI

  • T.Sakurai

    Inductance• Now RC effects surmounts LC

    effects because R > |jωωωωL|.

    • In the future, both of R and ωωωωLincrease (but for signal lines R > |jωωωωL|).

    • Exception in low-R lines

    • Inductive effects should be considered in wide clock lines in a fast processor, power supply lines and wide-bit bus lines changed at the same time.

    • Clock lines are placed on power plane to reduce inductive effects.

    [1] D.A.Priore, "Inductance on Silicon for Sub-micron CMOS VLSI," Symp. on VLSI Circuits, 1993.

    W / H

    L : S

    elf-i

    nduc

    tanc

    e (n

    H/ c

    m)

    L = 2 ln 6H0.8W + T

    100

    10

    10.001 0.01 0.1 0.5

    0.1

    0.01

    0.001T/H=

  • T.Sakurai

    Inductive Effects

    10-2

    10-1

    100

    101

    102

    1996 2000 2004 2008 2012Year

    ω ωωωL

    / R

    Min. width (scaled)

    W=1μμμμmmmm

    W=10μμμμmmmm

    W=100μμμμmmmm

  • T.Sakurai

    Inductive Effects in Clock Lines

    Board design practice is imported in LSI.

    P.J.Restle & A Deutsch, “Designing the Best Clock Distribution Network,” VLSI circuits symp., pp.2-3, May 1998.

  • T.Sakurai

    Three crises in VLSI designs

    Power crisis

    Interconnection crisis

    Complexity crisis

  • T.Sakurai

    VLSI Design in 2010

    Designing a map of 10m wide roadsfor a world atlas

  • T.Sakurai

    Overcome complexity crisis

    MPU Core

    Cache

    ROMRAM

    MPEG CoreUSB Core

    ProprietaryLogic

    IP (A inc.)IP (B univ.)IP (C inst.)IP (D semi.)

    IP ; CPU, DSP, memories, analog, I/O, logic..HW/FW/SW

    • Re-use and sharing of design• Design in higher abstraction

  • T.Sakurai

    System LSI for Games

    ●●●● Clock freq. 300MHz

    ● ● ● ● 10M transistors

    ● ● ● ● Graphics synthesizer integrate

    40M tr. With embedded DRAM

    ●●●● Memory bandwidth 3.2GB/s

    ●●●● Floating operation 6.2GFLOPS/sec

    ●●●● 3D CG 6.6M polygon/sec

    ●●●● MPEG2 decode

  • T.Sakurai

    Issues in System-on-Chip

    • Un-distributed IP’s (i.e. CPU, DSP of a certain company)

    • Low yield due to larger die size

    • Huge initial investment for masks & development

    • IP testability, upfront IP test cost

    • Process-dependent memory IP’s

    • Difficulty in high precision analog IP’s due to noise

    • Process incompatibility with non-Si materials and/or

    MEMS

  • T.Sakurai

    Technologies integrated on a chip

    ‘98

    LogicSRAM

    Flash memoryEmbedded DRAM

    CMOS RFFPGAMEMS

    FeRAMChemical sensors

    Electro-opticalElectro-biological

    Year‘00 ‘02 ‘04 ‘06 ‘08 ‘10 ‘12

    +0+1~2+4+4~5+3~5+2+2~10+4~5+2~6+5~8+?

    ITRS’99RF : Radio FrequencyFPGA : Field Programmable Gate ArrayMEMS : Micro Electro Mechanical SystemsFeRAM : Ferroelectric RAM

  • T.Sakurai

    DRAM embedding

    DRAM Processor System LSI

    Two orders of magnitude improvement in bandwidth and power

    K.Sawada, T.Sakurai, et al, "A 72K CMOS Channelless Gate Array with Embedded 1Mbit Dynamic RAM," in Proc. CICC'88, pp.20.3.1-20.3.4, May 1988.

    BUT EXPENSIVE!

  • T.Sakurai

    Micro-machined mechanical switch

    G.Weinberger, “The New Millennium: Wireless Technologies for a Truly Mobile Society,” ISSCC, pp.20-24, Feb. 2000.

  • T.Sakurai

    Silicon MEMS microphone

    M.Pinto, “Atoms to Applets: Building Systems ICs in the 21st Century,” ISSCC, pp.26-30, Feb. 2000.

    10umWill soon exceed the performance of the best commercial microphones, yet be inexpensive and potentially integrated with on-chip electronics.

  • T.Sakurai

    System-in-Package

  • T.Sakurai

    System on a Chip

    SoC vs. SiP

    Chip

    MPU Core

    Cache

    ROMLogic

    AnalogUSB Core

    DRAM• Smaller area• Shorter interconnect• Optimized process for

    each die (Analog, DRAM, MEMS…)

    • Good electrical isolation• Through-chip via

    • Heat dissipation is an issue

    Heat spreader/Heat pipe

    System in a PackageMPU Core

    DRAM

    Logic

    AnalogCache

    USB ROM

  • T.Sakurai

    Superconnect example based on three-dimensional assembly

    PURE LOGIC

    Heat Sink

    ANALOG

    RF/ANALOG

    DRAM

    K.Ohsawa, H.Odaira, M.Ohsawa, S.Hirade, T.Iijima, S.G.Pierce, “3-D Assembly Interposer Technology for Next-Generation Integrated Systems,” ISSCC Digest of Tech. Papers, pp.272-273, Feb.2001.

  • T.Sakurai

    System-in-Packageの課題の課題の課題の課題

    • Special design tools for placement & route for co-

    design of LSI’s and assembly

    • High-density reliable substrate and metallization

    technology

    • low-cost, available known good die

    (reworkablility and module testing)

  • T.Sakurai

    Super-connect technologyD

    esig

    n ru

    le (µ

    m)

    0.01

    0.1

    1

    10

    100

    Past Present Future

    Tr. gate

    Tr. gate

    Tr. gate

    InterconnectUpper layerLower layer

    Upper layer

    Middle layer

    Lower layer

    Super-connect

    PackagePackagePackage

    Technology Vacuum

    M. Kimura,"Superconnect : 21st Century LSI Production and Design Method", Nikkei Microdevices, no.180, pp.62-79, June 2000.

  • T.Sakurai

    Super-connect

    Bandwidth

    0.1

    1

    10

    100

    Design rule Power@1GB/s

    Area

    102

    103

    104

    105

    [mm] [mW] [GB/sec]

    [mm2/bit]

    Super-connect

    Off-chip On-chip

    1

    10

    100

    1000

    Cost/line

    [AU]

    10

    1000[day]

    100

    Turn-aroundtime

    1

    10

    100

    1000

    1

    10

    100

    1000

  • T.Sakurai

    0.5V LSIでの逆温度特性での逆温度特性での逆温度特性での逆温度特性

    Photograph of 32bit FA0.3µµµµm CMOS

    0000 0.50.50.50.5 1111 1.51.51.51.5 22220000

    1111

    2222

    3333

    4444

    Norm

    aliz

    ed

    Norm

    aliz

    ed

    Norm

    aliz

    ed

    Norm

    aliz

    ed t ttt

    pd

    pdpd

    pd

    VDD [V]

    90ºC

    50ºC

    20ºC Measured

    K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai,"Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's",CICC99, pp.563-566, May 1999.

  • T.Sakurai

    Assembly & Packaging

    “There is an increased awareness in the industry that assembly and packaging is becoming a differentiator in product development.”

    International Technology Roadmap for Semiconductors, ITRS’99 p.213

  • T.Sakurai

    LSI in 2014Year Unit 1999 2014 Factor

    Design rule µm 0.18 0.035 0.2Tr. Density /cm2 6.2M 390M 30Chip size mm2 340 900 2.6Tr. Count per chip (µP) 21M 3.6G 170DRAM capacity 1G 1T 1000Local clock on a chip Hz 1.2G 17G 14Global clock on a chip Hz 1.2G 3.7G 3.1Power W 90 183 2.0Supply voltage V 1.5 0.37 0.2Current A 60 494.6 8Interconnection levels 6 10 1.7Mask count 22 28 1.3Cost / tr. (packaged) µcents 1735 22 0.01Chip to board clock Hz 500M 1.5G 3.0# of package pins 810 2700 3.3Package cost cents/pin 1.61 0.75 0.5

    International Technology Roadmap for Semiconductors 1998 update sponsored by the Semiconductor Industry Association in cooperation with European Electronic Component Association (EECA) , Electronic Industries Association of Japan (EIAJ), Korea Semiconductor Industry Association (KSIA), and Taiwan Semiconductor Industry Association (TSIA) , International Technology Roadmap for Semiconductors: 1999 edition. Austin, TX:International SEMATECH, 1999.

  • T.Sakurai

    Retinal Prosthesis

    Prosthesis Prosthesis -- DualDual IntraocularIntraocular UnitsUnits

    Courtesy: Prof. Wentai Liu (North Carolina Univ.)http://www.ece.ncsu.edu/erl/faculty/wtl_data/retina.html

  • T.Sakurai

    Summary – Interconnect & SiPNew possibilities with buffered interconnects may open up new tools opportunity.

    Silicon-in-Package needs new tools that support co-design of VLSI’s, package and assembly.

    New design closure issues

  • T.Sakurai

    Super-expoenentially increasing design challenges

    1K

    1Billion

    Num

    ber o

    f Tra

    nsis

    tors

    Functionality + Testability

    Functionality + Testability + Wire delay

    Functionality + Testability + Wire delay+ Power management

    Functionality + Testability + Wire delay+ Power management

    + Embedded software

    Functionality + Testability + Wire delay+ Power management

    + Embedded software + Signal integrityFunctionality + Testability + Wire delay+ Power management + Embedded software + Signal integrity + RF

    Functionality + Testability + Wire delay+ Power management + Embedded software + Signal integrity + RF +Hybrid chips

    Functionality + Testability + Wire delay+ Power management + Embedded software + Signal integrity + RF +Hybrid chips + PackagingFunctionality + Testability + Wire delay+ Power management + Embedded software + Signal integrity + RF +Hybrid chips + Packaging + Management of physical limits

    ITRS’99