Device and architecture co-optimization – Large search space – Need fast yet accurate power and...

Date posted: 19-Dec-2015

Device and architecture co-optimization

– Large search space

– Need a fast yet accurate power and delay estimator for FPGAs

Trace-based power and delay estimator (Ptrace)

Optimization result

– Reduce energy-delay product by 18.4% and area by 23%

– LUT size 5 provides the maximum combined power and delay yield

[Figure: wafer map of I_Leak,N, wafer 24, layout 1]

Target function driven component analysis (FCA)

– Given a target function f(X1, X2)

– Find the linear decomposition matrix W that minimizes the error in the mean, variance, and skewness of f(·) when high-order dependence is ignored

– FCA has the same complexity as PCA and ICA but is more accurate
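The idea behind FCA can be illustrated with a small two-variable toy in Python. This is only a sketch under stated assumptions, not the authors' algorithm: the decomposition is parameterized as PCA whitening followed by a rotation, the poster's nonlinear programming step is replaced by a plain grid search over the rotation angle, and "ignoring high-order dependence" is emulated by shuffling one component. The function names, the test function, and the data are all hypothetical.

```python
import numpy as np

def moments(v):
    """Return mean, variance, and skewness of a sample vector."""
    m = v.mean()
    c = v - m
    var = (c ** 2).mean()
    return np.array([m, var, (c ** 3).mean() / var ** 1.5])

def fca_rotation_search(x, f, n_angles=90, seed=0):
    """Grid-search a rotation of the whitened data that minimizes the
    moment error of f when the two components are treated as independent."""
    rng = np.random.default_rng(seed)
    mean = x.mean(axis=0)
    xc = x - mean
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    z = xc @ eigvec / np.sqrt(eigval)            # PCA-whitened components
    unwhiten = (eigvec * np.sqrt(eigval)).T      # maps z back to xc
    ref = moments(f(x))                          # moments with true dependence
    perm = rng.permutation(len(x))               # one fixed shuffle for all angles
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    errors = []
    for theta in angles:
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        p = z @ rot                              # still uncorrelated
        # Emulate "assume independence": pair P1 with a shuffled P2.
        p_ind = np.column_stack([p[:, 0], p[perm, 1]])
        x_ind = p_ind @ rot.T @ unwhiten + mean
        errors.append(np.sum((moments(f(x_ind)) - ref) ** 2))
    errors = np.array(errors)
    best = int(np.argmin(errors))
    return angles[best], errors[best], errors[0]  # errors[0] is plain PCA

# Hypothetical non-linearly dependent sources and target function.
rng = np.random.default_rng(1)
x1 = rng.standard_normal(20_000)
x2 = 0.5 * x1 ** 2 + 0.3 * rng.standard_normal(20_000)
x = np.column_stack([x1, x2])
theta, err_fca, err_pca = fca_rotation_search(
    x, lambda v: v[:, 0] * v[:, 1] + v[:, 1])
print(f"best angle {theta:.3f} rad, error {err_fca:.4g} (PCA: {err_pca:.4g})")
```

Because the angle 0 (plain PCA) is part of the search grid, the selected decomposition is never worse than PCA under this error measure.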

Approximate the max operation using a second-order polynomial

Works for all three delay models

More efficient and accurate than the Fourier series approach

– 20X faster than the Fourier series approach

– Computational complexity: O(n^3) for the quadratic delay model, O(n) for the others

Within 2% error compared to Monte Carlo (MC) simulation
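To make the polynomial approximation of max concrete, here is a one-variable toy in Python. It is a sketch under assumptions, not the poster's procedure: both inputs are linear in a single standard Gaussian source X, the quadratic is obtained by sample-based least squares, and the function name and coefficients are hypothetical.

```python
import numpy as np

def fit_max_quadratic(a, b, n_samples=200_000, seed=0):
    """Least-squares fit of max(a0 + a1*X, b0 + b1*X) by
    c0 + c1*X + c2*X**2, with X ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    target = np.maximum(a[0] + a[1] * x, b[0] + b[1] * x)
    basis = np.column_stack([np.ones_like(x), x, x * x])
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return coeffs, target, basis @ coeffs

# Two hypothetical arrival times that cross near X = 1.
coeffs, exact, approx = fit_max_quadratic(a=(10.0, 1.0), b=(10.5, 0.5))
mean_err = abs(approx.mean() - exact.mean())
std_ratio = approx.std() / exact.std()
print("coefficients:", coeffs)
print(f"mean error {mean_err:.2e}, std ratio {std_ratio:.4f}")
```

The fitted quadratic reproduces the sample mean of the max and slightly underestimates its standard deviation, which is the usual behavior of a least-squares projection.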

Max operation using Fourier series approximation

– Approximate the PDFs of the variation sources by Fourier series

– Apply moment matching to reconstruct the canonical form of the max operation

All operations are based on either closed-form formulae or lookup tables

– Computational complexity: O(nK^2)

Only works for the linear and semi-quadratic delay models

Within 5% error compared to MC simulation

With CMOS technology scaling, process variation has become a potential show-stopper if not appropriately handled. These variations introduce significant uncertainty in both circuit performance and leakage power. Statistical modeling, analysis, and optimization for VLSI circuits have thus become a frontier research topic in recent years for combating such variation effects.

As the process advances to nanometer technologies and low-energy embedded applications are explored for FPGAs, power consumption becomes a crucial design constraint for FPGAs. It is well known that architecture and device settings have a great impact on FPGA power and performance. However, how to perform statistical optimization considering both device and architecture has not been solved by previous work. In addition, reliability issues such as device aging and soft error rate (SER) may affect the performance of FPGAs. Such impact was not considered in previous work either.

Besides FPGAs, statistical modeling and analysis for ASICs are also hot research topics. There are many works on statistical timing and power modeling and analysis. However, how to efficiently perform statistical static timing analysis (SSTA) for non-linear delay models with non-Gaussian variation sources is still a hard problem.

Moreover, most statistical analyses assume independent variation sources and apply principal component analysis (PCA) or independent component analysis (ICA) to decompose dependent variation sources. However, some variation sources, such as Leff and Vth, are non-linearly dependent. In this case, a linear operation (such as PCA or ICA) cannot completely remove the dependence. How to handle non-linearly dependent variation sources is another unsolved problem.

Spatial correlation is another concern in statistical analysis. Many recent works try to model spatial correlation as a function of distance. However, some recent research observes that the spatial correlation mainly comes from the deterministic across-wafer variation and that the purely random spatial variation is not significant. Modeling across-wafer variation is also a challenging problem.

Ph.D.’09: Statistical Modeling and Optimization for VLSI Circuits

Student: Lerong Cheng (lerong@ee.ucla.edu), Advisor: Lei He, Co-advisor: Puneet Gupta

EDA Lab (http://eda.ee.ucla.edu), Electrical Engineering Department, UCLA

L. Cheng, P. Wong, F. Li, Y. Lin, and L. He, “Device and Architecture Co-Optimization for FPGA Power Reduction,” DAC, 2005.

P. Wong, L. Cheng, Y. Lin, and L. He, “FPGA Device and Architecture Evaluation Considering Process Variation,” ICCAD, 2005.

L. Cheng, J. Xiong, and L. He, “FPGA Performance Optimization via Chipwise Placement Considering Process Variations,” FPL, 2006.

L. Cheng, J. Xiong, and L. He, “Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources,” DAC, 2007.

L. Cheng, J. Xiong, and L. He, “Non-Gaussian Statistical Timing Analysis Using Second-Order Polynomial Fitting,” ASPDAC, 2008.

L. Cheng, Y. Lin, L. He, and Y. Cao, “Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability,” ISFPGA, 2008.

L. Cheng, P. Gupta, and L. He, “Accounting for Non-linear Dependence Using Function Driven Component Analysis,” ASPDAC, 2009.

Collaborators: Dr. Jinjun Xiong, Dr. Yan Lin, Dr. Fei Li, and Miss Phoebe Wong

Introduction

Block-based SSTA operations

– Add (simple)

– Max (hard): the core operation of SSTA
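As background on why add is simple and max is hard, the Python sketch below uses the classical Gaussian case. This is Clark's moment-matching approximation for jointly Gaussian arrival times, swapped in here purely for illustration; it is not the non-Gaussian method this poster describes, and the function names are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ssta_add(mu_a, var_a, mu_b, var_b, cov_ab=0.0):
    """Sum of two jointly Gaussian delays: mean and variance are exact."""
    return mu_a + mu_b, var_a + var_b + 2.0 * cov_ab

def ssta_max(mu_a, var_a, mu_b, var_b, cov_ab=0.0):
    """Mean and variance of max(A, B) from Clark's formulas.
    The true max is not Gaussian; only its first two moments are matched."""
    theta_sq = var_a + var_b - 2.0 * cov_ab
    if theta_sq <= 1e-18:
        # A and B differ by a constant, so the max is the larger of the two.
        return (mu_a, var_a) if mu_a >= mu_b else (mu_b, var_b)
    theta = math.sqrt(theta_sq)
    alpha = (mu_a - mu_b) / theta
    t, phi = norm_cdf(alpha), norm_pdf(alpha)
    m1 = mu_a * t + mu_b * (1.0 - t) + theta * phi
    m2 = ((mu_a ** 2 + var_a) * t + (mu_b ** 2 + var_b) * (1.0 - t)
          + (mu_a + mu_b) * theta * phi)
    return m1, max(m2 - m1 * m1, 0.0)

print(ssta_add(1.0, 2.0, 3.0, 4.0, 0.5))
print(ssta_max(0.0, 1.0, 0.0, 1.0))
```

Add keeps the Gaussian form exactly, whereas max needs an approximation even in this simplest setting; non-linear delay models and non-Gaussian sources make it harder still.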

Delay models

– Linear: efficient but not accurate

– Quadratic: accurate but not efficient

– Quadratic without crossing terms (semi-quadratic): efficient and somewhat accurate
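For reference, the three delay models are commonly written as follows. The notation is an assumption, since the poster does not give the formulas: D is the delay, X_1, ..., X_n are the variation sources, d_0 is the nominal delay, and a_i, b_i, b_{ij} are sensitivity coefficients.

```latex
\begin{align*}
\text{Linear:}         \quad & D = d_0 + \sum_{i=1}^{n} a_i X_i \\
\text{Quadratic:}      \quad & D = d_0 + \sum_{i=1}^{n} a_i X_i
                               + \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} X_i X_j \\
\text{Semi-quadratic:} \quad & D = d_0 + \sum_{i=1}^{n} a_i X_i
                               + \sum_{i=1}^{n} b_i X_i^{2}
\end{align*}
```

The full quadratic form carries on the order of n^2 coefficients because of the crossing terms X_i X_j with i ≠ j, while the linear and semi-quadratic forms carry on the order of n.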

Analysis of Non-Linear Dependence

Statistical Modeling and Optimization for FPGAs

References & Collaborators

Statistical Static Timing Analysis

Modeling of Across-Wafer Variation

[Figure: Ptrace flow diagram. Block labels: VPR; Psim; trace collection; switching activity; ratio of short-circuit power; critical path structure; circuit element statistics; area; circuit-level delay and power; chip-level area, delay, and power; device dependent; device independent]

Concurrent design of process and FPGA architecture

– Develop process and architecture concurrently in order to shorten the time to market

– Need to estimate FPGA power and delay from process parameters

Ptrace2

– Based on the ITRS MASTAR4 transistor model

Analysis result

– Device aging leads to 8.5% delay degradation after 10 years

– Neither device aging nor process variation has an impact on SER

FPGA chipwise placement for timing optimization

– The programmability of FPGAs offers a unique opportunity to leverage process variation and improve circuit performance

– Perform placement according to the chipwise variation maps

– Improve performance by up to 12.1%

Across-wafer variation can be approximated as a quadratic function

After subtracting the across-wafer variation, the purely random spatial correlation is not significant

From the die point of view, the within-wafer variation is spatially correlated

– This variation is not purely random

– It cannot be modeled as random correlated variation

New Variation Model

– Exactly models the across-wafer variation

– Only 4 random variables: Xc, Yc, mw, and r

– More accurate and efficient than the spatial variation model
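The observation that a quadratic across-wafer trend explains most of the apparent spatial correlation can be checked on synthetic data. The Python sketch below makes several assumptions: it fits a generic six-coefficient quadratic surface by least squares rather than the poster's four-variable model (whose exact parameterization in Xc, Yc, mw, and r is not given here), and the 12 x 9 die grid, bowl shape, and noise level are invented for illustration.

```python
import numpy as np

def fit_across_wafer(xs, ys, values):
    """Least-squares fit of v ~ c0 + c1*x + c2*y + c3*x^2 + c4*y^2 + c5*x*y.
    Returns the coefficients and the residual after removing the trend."""
    basis = np.column_stack([np.ones_like(xs), xs, ys,
                             xs ** 2, ys ** 2, xs * ys])
    coeffs, *_ = np.linalg.lstsq(basis, values, rcond=None)
    return coeffs, values - basis @ coeffs

def neighbor_corr(grid):
    """Correlation between horizontally adjacent dies."""
    return np.corrcoef(grid[:, :-1].ravel(), grid[:, 1:].ravel())[0, 1]

# Synthetic wafer: a quadratic bowl plus independent random noise.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(1, 10, dtype=float),
                     np.arange(1, 13, dtype=float))
wafer = (1.0 - 0.002 * ((gx - 5.0) ** 2 + (gy - 6.0) ** 2)
         + 0.005 * rng.standard_normal(gx.shape))

coeffs, resid = fit_across_wafer(gx.ravel(), gy.ravel(), wafer.ravel())
raw_corr = neighbor_corr(wafer)
res_corr = neighbor_corr(resid.reshape(wafer.shape))
print(f"neighbor correlation: raw {raw_corr:.2f}, after detrending {res_corr:.2f}")
```

In this synthetic setting the raw map looks strongly spatially correlated although its only random part is independent per die; once the fitted surface is subtracted, the neighbor correlation largely disappears.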

[Figure: wafer map of frequency, wafer 24, layout 1]

Location   Our model                      Spatial correlation model
           µ      σ      95%    T (s)     µ      σ      95%    T (s)
LL-C       +0.7   +1.1   +0.5   15.3      +2.4   +1.5   +5.2   154 (10.1X)
LR-C       +0.2   +1.1   -0.2   14.7      +0.0   +8.8   -1.4   155 (10.5X)
UL-C       -0.2   -0.6   +0.1   15.2      -0.2   +7.6   -0.1   153 (10.1X)
UR-C       +0.2   +0.6   +0.1   14.9      -0.7   +4.8   -1.3   152 (10.2X)

[Figure: FCA flow. Target function f(X1, X2) and samples of X1, X2; joint moments of X1, X2; error of moments of f as a function of the transfer matrix; nonlinear programming (minimizing the error of moments of f); transfer matrix W]

[Figure: error estimation flow. Target function f(X1, X2), samples of X1, X2, and transfer matrix W; joint moments of X1, X2; moments of P1, P2 and ρij of P1, P2; function of P1, P2: g(P1, P2); result with correct dependence versus result assuming ρij = 0; error]

Linear operations are used to decompose dependent variation sources

Not accurate when non-linear dependence exists

Need to estimate the error introduced by ignoring non-linear dependence

Define a high-order correlation coefficient
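A small Python experiment shows why a linear decomposition leaves non-linear dependence behind. It is a sketch under assumptions: the poster does not spell out its high-order correlation coefficient, so the code takes ρij to be the correlation between P1^i and P2^j, and the sample data and the helper name `high_order_corr` are hypothetical.

```python
import numpy as np

def high_order_corr(p1, p2, i, j):
    """Correlation between p1**i and p2**j (assumed definition of rho_ij)."""
    return np.corrcoef(p1 ** i, p2 ** j)[0, 1]

# Two sources with purely non-linear dependence.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(100_000)
x2 = 0.5 * x1 ** 2 + 0.2 * rng.standard_normal(100_000)

# PCA: rotate and scale so the components are uncorrelated with unit variance.
x = np.column_stack([x1, x2])
xc = x - x.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
p = xc @ eigvec / np.sqrt(eigval)

rho_11 = high_order_corr(p[:, 0], p[:, 1], 1, 1)
rho_21 = high_order_corr(p[:, 0], p[:, 1], 2, 1)
rho_12 = high_order_corr(p[:, 0], p[:, 1], 1, 2)
print(f"rho_11 = {rho_11:.2e}, rho_21 = {rho_21:.3f}, rho_12 = {rho_12:.3f}")
```

The first-order coefficient is driven to zero by PCA, but one of the higher-order coefficients stays large, so treating the components as independent would introduce an error.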

[Figure captions: circuit delay comparison; wafer frequency; wafer leakage; across-wafer variation appears spatially correlated from the die point of view; PDF comparison; approximate max operation as a second-order polynomial; PDF comparison; Fourier series approximation of the PDF; performance improvement under different utilization rates; performance improvement histogram]
