Device and architecture co-optimization – Large search space – Need fast yet accurate power and...
Date posted: 19-Dec-2015
Device and architecture co-optimization
– Large search space
– Need fast yet accurate power and delay estimator for FPGAs
Trace-based power and delay estimator (Ptrace)
Optimization result
– Reduce energy-delay product by 18.4% and area by 23%
– LUT size 5 provides the maximum power and delay combined yield
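A power and delay combined yield can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and is not the Ptrace model: the function name `combined_yield`, the single normalized process parameter, the delay and leakage formulas, and the limits are all hypothetical. The point is only that a chip counts toward the yield when it meets both limits at once.

```python
import math
import random


def combined_yield(n_samples, delay_limit, leak_limit, seed=0):
    """Estimate the fraction of chips meeting both the delay and leakage limits.

    Hypothetical toy model: one Gaussian process parameter (a normalized
    channel-length deviation) raises delay linearly and lowers leakage
    exponentially, so fast chips tend to be leaky.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        dl = rng.gauss(0.0, 1.0)         # normalized channel-length deviation
        delay = 1.0 + 0.05 * dl          # longer channel -> slower (assumed)
        leakage = math.exp(-0.5 * dl)    # longer channel -> less leakage (assumed)
        if delay <= delay_limit and leakage <= leak_limit:
            passed += 1
    return passed / n_samples


if __name__ == "__main__":
    print(combined_yield(100_000, delay_limit=1.05, leak_limit=2.0))
```

Because the two constraints cut opposite tails of the same parameter, the combined yield is lower than either single-constraint yield, which is why it is a natural figure of merit when comparing architectures.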
[Figure: ILeakN, wafer 24, Layout 1]
Target function driven component analysis (FCA)
– Given a target function f(X1, X2)
– Find the linear decomposition matrix W that minimizes the error in the mean, variance, and skewness of f() when high-order dependence is ignored
– FCA has the same complexity as PCA and ICA but is more accurate
Approximate max operation using second-order polynomial
Works for all three delay models
More efficient and accurate than the approach using Fourier series
– 20X faster than that using Fourier series
– Computational complexity: O(n^3) for the quadratic delay model, O(n) for the others
Within 2% error compared to MC simulation
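To give a feel for approximating the max operation with a second-order polynomial, the sketch below fits max(x, 0) by a + b*x + c*x^2 in the least-squares sense over samples of x. This is an illustrative assumption and not the fitting procedure of the cited papers: the function names `solve3` and `fit_max_quadratic` are hypothetical, and the input is taken to be a standard normal variable.

```python
import random


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_max_quadratic(samples):
    """Least-squares fit of max(x, 0) by a + b*x + c*x**2 over the samples."""
    # Normal equations: sums of x**(i+j) on the left,
    # sums of x**i * max(x, 0) on the right.
    power_sums = [sum(x ** k for x in samples) for k in range(5)]
    lhs = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum((x ** i) * max(x, 0.0) for x in samples) for i in range(3)]
    return solve3(lhs, rhs)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
    a, b, c = fit_max_quadratic(xs)
    print(f"max(x, 0) ~ {a:.3f} + {b:.3f}*x + {c:.3f}*x^2")
```

For a standard normal input the exact least-squares solution is b = 1/2 and a = c = 1/(2*sqrt(2*pi)), about 0.1995, so the sampled coefficients should land close to those values.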
Max operation using Fourier series approximation
– Approximate PDF of variation sources by Fourier series
– Apply moment matching to reconstruct the canonical form of max operation
All operations are based on either closed-form formulae or lookup tables
– Computational complexity O(nK^2)
Only works for linear and semi-quadratic delay models
Within 5% error compared to MC simulation
As CMOS technology scales, process variation has become a potential show-stopper if not appropriately handled. These variations introduce significant uncertainty for both circuit performance and leakage power. Statistical modeling, analysis, and optimization for VLSI circuits have thus become a frontier research topic in recent years in combating such variation effects.
As the process advances to nanometer technologies and low-energy embedded applications are explored for FPGAs, power consumption becomes a crucial design constraint for FPGAs. It is well known that architecture and device settings have great impact on FPGA power and performance. However, how to perform statistical optimization considering both device and architecture has not been solved by previous works. In addition, some reliability issues, such as device aging and soft error rate (SER), may affect the performance of FPGAs. Such impact was not considered in previous works either.
Besides FPGAs, statistical modeling and analysis for ASICs are also hot research topics. There are many works on statistical timing and power modeling and analysis. However, how to efficiently perform statistical static timing analysis (SSTA) for non-linear delay models with non-Gaussian variation sources is still a hard problem.
Moreover, most statistical analysis assumes independent variation sources and applies principal component analysis (PCA) or independent component analysis (ICA) to decompose dependent variation sources. However, some of the variation sources are non-linearly dependent, such as Leff and Vth. In this case, a linear operation (such as PCA or ICA) cannot completely remove the dependence. How to handle non-linearly dependent variation sources is another unsolved problem.
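A small numerical sketch shows why a linear decorrelation step is not enough. The relation y = x^2 - 1 plus noise is a stand-in chosen for illustration, not a model of Leff and Vth, and the "second-order" measure corr(x^2, y) is a hypothetical illustration rather than the high-order correlation coefficient defined in the cited work.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def dependence_demo(n=50_000, seed=0):
    """Return (linear correlation, second-order correlation) of a toy pair."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # y depends on x only through x**2, so the pair is uncorrelated yet dependent
    y = [xi * xi - 1.0 + 0.1 * rng.gauss(0.0, 1.0) for xi in x]
    linear = pearson(x, y)
    second_order = pearson([xi * xi for xi in x], y)
    return linear, second_order


if __name__ == "__main__":
    lin, second = dependence_demo()
    print(f"corr(x, y)   = {lin:+.3f}")
    print(f"corr(x^2, y) = {second:+.3f}")
```

The linear correlation is close to zero, so PCA would treat the two variables as already decorrelated, while the second-order measure is close to one. Any function of x and y that is sensitive to this dependence would be mis-estimated if the variables were assumed independent.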
Spatial correlation is another concern in statistical analysis. Many recent works try to model spatial correlation as a function of distance. However, some recent research works observe that the spatial correlation mainly comes from the deterministic across-wafer variation, and that the purely random spatial variation is not significant. Modeling across-wafer variation is also a challenging problem.
PhD'09: Statistical Modeling and Optimization for VLSI Circuits
Student: Lerong Cheng (lerong@ee.ucla.edu); Advisor: Lei He; Co-advisor: Puneet Gupta
EDA Lab (http://eda.ee.ucla.edu), Electrical Engineering Department, UCLA
L. Cheng, P. Wong, F. Li, Y. Lin, and L. He, "Device and Architecture Co-Optimization for FPGA Power Reduction," DAC 2005.
P. Wong, L. Cheng, Y. Lin, and L. He, "FPGA Device and Architecture Evaluation Considering Process Variation," ICCAD 2005.
L. Cheng, J. Xiong, and L. He, "FPGA Performance Optimization via Chipwise Placement Considering Process Variations," FPL 2006.
L. Cheng, J. Xiong, and L. He, "Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources," DAC 2007.
L. Cheng, J. Xiong, and L. He, "Non-Gaussian Statistical Timing Analysis Using Second-Order Polynomial Fitting," ASPDAC 2008.
L. Cheng, Y. Lin, L. He, and Y. Cao, "Trace-Based Framework for Concurrent Development of Process and FPGA Architecture Considering Process Variation and Reliability," ISFPGA 2008.
L. Cheng, P. Gupta, and L. He, "Accounting for Non-linear Dependence Using Function Driven Component Analysis," ASPDAC 2009.
• Collaborators: Dr. Jinjun Xiong, Dr. Yan Lin, Dr. Fei Li, and Miss Phoebe Wong
Introduction
Block-based SSTA operations
– Add (simple)
– Max (hard)
Core operation of SSTA
Delay model
– Linear: efficient but not accurate
– Quadratic: accurate but not efficient
– Quadratic without crossing terms (semi-quadratic): efficient and somewhat accurate
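For the linear delay model with Gaussian variation sources, the two block-based operations can be sketched with the widely used Clark moment-matching formulas. This is the textbook linear/Gaussian case, shown only to make "add is simple, max is hard" concrete; it is not the non-Gaussian or quadratic method of the cited papers. The class name `Canonical` and the functions `ssta_add` and `ssta_max` are hypothetical names for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Canonical:
    """Linear delay a0 + sum(a[i] * X[i]) + ar * R, with X[i], R ~ N(0, 1) independent."""
    a0: float
    a: tuple
    ar: float

    def variance(self):
        return sum(c * c for c in self.a) + self.ar ** 2


def ssta_add(d1, d2):
    """Sum of two delays: coefficients add, independent random parts add in quadrature."""
    coeffs = tuple(p + q for p, q in zip(d1.a, d2.a))
    return Canonical(d1.a0 + d2.a0, coeffs, math.hypot(d1.ar, d2.ar))


def ssta_max(d1, d2):
    """Max of two delays, re-expressed in canonical form by matching mean and variance."""
    cov = sum(p * q for p, q in zip(d1.a, d2.a))
    theta = math.sqrt(max(d1.variance() + d2.variance() - 2.0 * cov, 1e-18))
    alpha = (d1.a0 - d2.a0) / theta
    t = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))   # tightness P(d1 > d2)
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    mean = d1.a0 * t + d2.a0 * (1.0 - t) + theta * pdf
    second = ((d1.a0 ** 2 + d1.variance()) * t
              + (d2.a0 ** 2 + d2.variance()) * (1.0 - t)
              + (d1.a0 + d2.a0) * theta * pdf)
    var = second - mean * mean
    # Blend sensitivities by tightness, then put the leftover variance into R
    coeffs = tuple(t * p + (1.0 - t) * q for p, q in zip(d1.a, d2.a))
    residual = var - sum(c * c for c in coeffs)
    return Canonical(mean, coeffs, math.sqrt(max(residual, 0.0)))


if __name__ == "__main__":
    d1 = Canonical(0.0, (1.0, 0.0), 0.0)
    d2 = Canonical(0.0, (0.0, 1.0), 0.0)
    print(ssta_add(d1, d2))
    print(ssta_max(d1, d2))
```

The add stays exact in this form, while the max is only matched in its first two moments: the true maximum of two Gaussians is skewed, which is the gap that non-linear and non-Gaussian SSTA methods aim to close.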
Analysis of Non-Linear Dependence
Statistical Modeling and Optimization for FPGAs
References & Collaborators
Statistical Static Timing Analysis
Modeling of Across-Wafer Variation
[Figure: estimation flow diagram. Labels: trace collection (VPR, Psim); switching activity; ratio of short-circuit power; critical path structure; circuit element statistics; area; circuit-level delay and power; chip-level area, delay, and power; device dependent; device independent]
Concurrent design of process and FPGA architecture
– Develop process and architecture concurrently in order to shorten the time to market
– Need to estimate FPGA power and delay from process parameters
Ptrace2
– Based on ITRS MASTAR4 transistor model
Analysis result
– Device aging leads to 8.5% delay degradation after 10 years
– Neither device aging nor process variation has impact on SER
FPGA chipwise placement for timing optimization
– Programmability of FPGAs offers a unique opportunity to leverage process variation and improve circuit performance
– Perform placement according to the chipwise variation maps
– Improve performance by up to 12.1%
Across-wafer variation can be approximated as a quadratic function
After subtracting the across-wafer variation, purely random spatial correlation is not significant
From the die's point of view, the across-wafer variation appears spatially correlated
– This variation is not purely random
– Cannot be modeled as random correlated variation
New Variation Model
– Exactly models the across-wafer variation
– Only 4 random variables: Xc, Yc, mw, and r
– More accurate and efficient than the spatial correlation model
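The sketch below illustrates how a smooth across-wafer pattern can look like spatial correlation. The poster names the four random variables Xc, Yc, mw, and r but does not define them here, so the interpretation is an assumption: (Xc, Yc) is taken as the center of a quadratic bowl, mw as a wafer-level mean, and r as an independent random residual. The functions `wafer_value` and `site_correlation`, the curvature, and all distribution parameters are hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def wafer_value(x, y, xc, yc, mw, r, curvature=0.01):
    """Assumed quadratic across-wafer model: a bowl centered at (xc, yc)."""
    return mw + curvature * ((x - xc) ** 2 + (y - yc) ** 2) + r


def site_correlation(site_a, site_b, n_wafers=20_000, seed=0):
    """Correlation, across many wafers, of the parameter at two fixed sites."""
    rng = random.Random(seed)
    va, vb = [], []
    for _ in range(n_wafers):
        xc, yc = rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)  # bowl center per wafer
        mw = rng.gauss(1.0, 0.02)                           # wafer-level mean
        # r is drawn independently per site: no built-in spatial correlation
        va.append(wafer_value(*site_a, xc, yc, mw, rng.gauss(0.0, 0.02)))
        vb.append(wafer_value(*site_b, xc, yc, mw, rng.gauss(0.0, 0.02)))
    return pearson(va, vb)


if __name__ == "__main__":
    print("near sites:", site_correlation((5.0, 0.0), (6.0, 0.0)))
    print("far sites: ", site_correlation((5.0, 0.0), (-5.0, 0.0)))
```

Neighboring sites come out strongly correlated and distant sites do not, even though the only purely random term is independent from site to site. Under these assumptions the apparent distance-dependent correlation is produced entirely by the shared quadratic pattern.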
[Figure: Frequency, wafer 24, Layout 1]
Location   Our model                       Spatial correlation model
           μ      σ      95%    T (s)      μ      σ      95%    T (s)
LL-C       +0.7   +1.1   +0.5   153        +2.4   +1.5   +5.2   154 (1.01X)
LR-C       +0.2   +1.1   -0.2   147        +0.0   +8.8   -1.4   155 (1.05X)
UL-C       -0.2   -0.6   +0.1   152        -0.2   +7.6   -0.1   153 (1.01X)
UR-C       +0.2   +0.6   +0.1   149        -0.7   +4.8   -1.3   152 (1.02X)
[Figure: FCA flow. Labels: target function f(X1, X2); samples of X1, X2; joint moments of X1, X2; error of moments of f as a function of the transfer matrix; nonlinear programming (minimizing error of moments of f); transfer matrix W]
[Figure: error estimation flow. Labels: target function f(X1, X2); samples of X1, X2; transfer matrix W; joint moments of X1, X2; moments of P1, P2; ρij of P1, P2; function g(P1, P2) of P1, P2; result with correct dependence; result assuming ρij = 0; error]
Linear operation is used to decompose dependent variation sources
Not accurate in the presence of non-linear dependence
Need to estimate the error introduced by ignoring non-linear dependence
Define high-order correlation coefficient
Circuit delay Comparison
Wafer frequency
Wafer leakage
Across-wafer variation appears spatially correlated from the die's point of view
PDF comparison
Approximate max operation as second order polynomial
PDF comparison
Fourier Series approximation of PDF
Performance improvement under different utilization rates
Performance improvement histogram