Graph Sparsification Approaches to Scalable Integrated Circuit … · 2017-08-02 · 1 Graph...

Graph Sparsification Approaches to Scalable Integrated Circuit Modeling and Simulations Zhuo Feng ICSICT, Oct, 2014 Design Automation Group Acknowledgements: My PhD students Xueqian Zhao (MTU) and Lengfei Han (MTU)



Scalable SPICE-Accurate IC Simulations

Motivation
– Integrated circuit (IC) systems that involve billions of transistors and interconnect components need to be accurately modeled and analyzed

Challenges in large-scale SPICE-accurate IC simulations
– Computational cost grows rapidly with traditional direct solution methods
– Iterative solution methods need to be robust and efficient for general tasks

[Figure: Original circuit with analog and digital blocks. Analog circuit blocks: error amplifier and current amplifier (labels Vin, Mp, Vref, Rf1, Rf2, Cf, Cout, Vout, Iout, If, IC, VG). Digital circuit blocks: power delivery network (PDN) w/ embedded voltage regulators (VRs)]


Background of SPICE Simulation Algorithms

Problem formulation
– Nonlinear differential equations:

$$F(x) = f(x(t)) + \frac{d}{dt} q(x(t)) + u(t) = 0$$

– f(.) and q(.) denote the static and dynamic nonlinearities, respectively

Standard SPICE simulators rely on the Newton-Raphson (NR) method
– Step 1: Linearize the nonlinear devices (transistors, diodes, etc.)
– Step 2: Update the solution through NR iteration

Jacobian of F(x):

$$G(x^k) = \left.\frac{\partial f}{\partial x}\right|_{x^k}, \qquad C(x^k) = \left.\frac{\partial q}{\partial x}\right|_{x^k}$$
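The two NR steps can be illustrated on a toy DC problem. The sketch below is not taken from the slides: it assumes a single diode in series with a resistor and a voltage source, and all component values and the function names `solve_diode_dc` and `residual` are hypothetical.

```python
import math

def solve_diode_dc(v_src=1.0, r=1e3, i_s=1e-14, v_t=0.02585, tol=1e-12, max_iter=100):
    """Newton-Raphson solve of F(v) = (v - v_src)/r + i_s*(exp(v/v_t) - 1) = 0."""
    v = 0.6  # assumed initial guess near a typical diode drop
    for k in range(1, max_iter + 1):
        e = math.exp(v / v_t)
        f_v = (v - v_src) / r + i_s * (e - 1.0)  # KCL residual F(v^k)
        g_v = 1.0 / r + i_s * e / v_t            # Step 1: linearize, J = dF/dv at v^k
        dv = -f_v / g_v                          # Step 2: solve J * dv = -F(v^k)
        v += dv
        if abs(dv) < tol:
            return v, k
    raise RuntimeError("Newton-Raphson did not converge")

def residual(v, v_src=1.0, r=1e3, i_s=1e-14, v_t=0.02585):
    """KCL residual at the diode node, used to check the solution."""
    return (v - v_src) / r + i_s * (math.exp(v / v_t) - 1.0)

v_diode, n_iter = solve_diode_dc()
print(f"diode voltage = {v_diode:.6f} V after {n_iter} NR iterations")
```

In a full simulator the scalar division becomes a sparse linear solve with the Jacobian, which is the step the preconditioners in this talk target.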


Prior Works

Direct and iterative solvers have been used in SPICE simulations
– Direct solver: LU decomposition (KLU [1])
  – Expensive for large-scale post-layout IC problems due to the rapidly (super-linearly) growing memory and runtime cost
– Krylov-subspace iterative methods: GMRES [2]
  – Pros: black-box solver, good memory efficiency, high parallelism
  – Cons: problem-dependent convergence properties, worse runtime
– ILU and domain-decomposition based preconditioners, etc.

Our contribution: a circuit-oriented preconditioning approach
– Novel circuit-oriented preconditioners (compared to matrix-oriented ones)
– Rigorous mathematical foundation: graph sparsification research [3-4]
– Consistent performance when solving transistor-level nonlinear circuits

References:
[1] T. Davis, et al. Algorithm 907: KLU, a direct sparse solver for circuit simulation problems. ACM Trans. Math. Softw., 2010.
[2] Y. Saad, et al. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 1986.
[3] D. A. Spielman, et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. ACM STOC, 2004.
[4] M. Bern, et al. Support-graph preconditioners. SIAM J. Matrix Anal. Appl., 2006.


Graph Sparsification Techniques

Graph sparsification basics
– Find a subgraph P approximating the original graph G in some measure (pairwise distance, cut values, graph Laplacian, etc.)
– Maintain the same set of vertices such that P can be used as a proxy for G in numerical computations w/o introducing much error
– A good graph sparsifier should keep very few edges to limit the computation and storage cost

[Figure: original graph G and its sparsifier P. Figure source: I. Koutis, G. L. Miller and R. Peng. A fast solver for a class of linear systems. Commun. ACM, 2012]


Support-Graph Preconditioner

Support-graph preconditioner (SGP)
– Example: find a spanning tree from the original graph
– Compute matrix factors w/o introducing any fill-ins for the spanning tree

The condition number of $P^{-1}G$ can be greatly reduced

[Figure: original graph G, a weighted 3×3 grid on nodes 1 to 9, and its spanning-tree support graph P]

Matrix of G (d marks a diagonal entry):

      1   2   3   4   5   6   7   8   9
1     d   2   0   1   0   0   0   0   0
2     2   d   4   0   3   0   0   0   0
3     0   4   d   0   0   8   0   0   0
4     1   0   0   d   6   0   4   0   0
5     0   3   0   6   d   5   0   1   0
6     0   0   8   0   5   d   0   0   3
7     0   0   0   4   0   0   d   9   0
8     0   0   0   0   1   0   9   d   4
9     0   0   0   0   0   3   0   4   d

Matrix of P (d' marks a diagonal entry):

      1   2   3   4   5   6   7   8   9
1     d'  2   0   0   0   0   0   0   0
2     2   d'  4   0   0   0   0   0   0
3     0   4   d'  0   0   8   0   0   0
4     0   0   0   d'  6   0   4   0   0
5     0   0   0   6   d'  5   0   0   0
6     0   0   8   0   5   d'  0   0   0
7     0   0   0   4   0   0   d'  9   0
8     0   0   0   0   0   0   9   d'  4
9     0   0   0   0   0   0   0   4   d'

Eigenvalues (1st to 6th) and condition numbers:

Matrix     1st      2nd      3rd      4th      5th     6th     cond
G          26.170   23.182   17.572   11.514   9.373   6.673   135.948
P          25.239   23.540   17.579   10.909   9.865   6.822   16.752
P^{-1}G    1.431    1.204    1.062    1.000    1.000   1.000   17.442
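The spanning-tree example can be reproduced numerically. The sketch below uses the edge weights of the 3×3 grid, extracts a maximum spanning tree with Kruskal's algorithm, and compares the condition number of G with that of the pencil (G, P). The diagonal entries d and d' are not recoverable from the transcript, so the code assumes a graph Laplacian plus a small conductance to ground (`g_ground`, a made-up value); its numbers therefore need not match the eigenvalue table.

```python
import numpy as np

# Weighted edges (u, v, w) of the 3x3 grid graph G, nodes numbered 1..9.
EDGES = [(1, 2, 2), (1, 4, 1), (2, 3, 4), (2, 5, 3), (3, 6, 8), (4, 5, 6),
         (4, 7, 4), (5, 6, 5), (5, 8, 1), (6, 9, 3), (7, 8, 9), (8, 9, 4)]
N = 9

def max_spanning_tree(edges, n):
    """Kruskal's algorithm over edges sorted by decreasing weight."""
    parent = list(range(n + 1))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:               # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def grounded_laplacian(edges, n, g_ground=0.1):
    """Graph Laplacian plus an assumed small conductance to ground per node."""
    a = g_ground * np.eye(n)
    for u, v, w in edges:
        a[u - 1, u - 1] += w
        a[v - 1, v - 1] += w
        a[u - 1, v - 1] -= w
        a[v - 1, u - 1] -= w
    return a

def pencil_eigenvalues(g, p):
    """Eigenvalues of P^-1 G via the symmetric form L^-1 G L^-T, with P = L L^T."""
    low = np.linalg.cholesky(p)
    a = np.linalg.solve(low, g)
    m = np.linalg.solve(low, a.T).T
    return np.linalg.eigvalsh((m + m.T) / 2)

tree = max_spanning_tree(EDGES, N)
G = grounded_laplacian(EDGES, N)
P = grounded_laplacian(tree, N)
lam = pencil_eigenvalues(G, P)
print("spanning-tree edges:", sorted(tree))
print("cond(G)      =", np.linalg.cond(G))
print("cond(P^-1 G) =", lam[-1] / lam[0])
```

Because G equals P plus the Laplacian of the removed edges, every eigenvalue of the pencil is at least 1, so the condition number of the preconditioned system is set by the largest pencil eigenvalue.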


Support-Circuit Preconditioner

A naïve support-circuit preconditioner (SCP)
– Sparsifies the linear networks of the original circuit network
– Takes advantage of existing sparse matrix techniques (Cholesky, LU, etc.)
– Nearly-linear complexity for analyzing nanoscale (parasitics-dominant) ICs
  – E.g. clock networks, power delivery networks, etc.

[Figure: original network with digital circuit blocks and VRs, and the support graph of the original network used as the support-circuit preconditioner]


Support-Circuit Preconditioner (Cont.)

General-purpose support-circuit preconditioner (GPSCP)
– Extracts sparsified network from the linearized circuit of the original circuit
– Leverages existing sparse matrix solution techniques
– Nearly-linear complexity for analyzing more general nonlinear circuit systems

[Figure: nonlinear circuit (transistor with terminals g, d, s and resistors R1 to R5), its linearized circuit (gmVgs, gds, Cds, Cgs, Cgd, g1 to g5), and the support circuit (gmVgs, gds, Cds, Cgd, g1, g2, g3, g5)]


Support-Circuit Preconditioner Extraction (1)

Directed weighted graph corresponding to a linearized circuit
– Can be obtained around a solution point during NR iterations
– Will be sparsified through graph decomposition and sparsification

[Figure: extraction flow. (1) Nonlinear circuit (transistor g, d, s with R1 to R5) to linearized circuit (gmVgs, gds, Cds, Cgs, Cgd, g1 to g5); (2) directed weighted graph (gmVgs, gds+Cds/h, Cgs/h, Cgd/h, g1 to g5); (3) undirected weighted graph (gds+Cds/h, Cgs/h, Cgd/h, g1 to g5); (4) support graph (gds+Cds/h, Cgd/h, g1, g2, g3, g5)]


Support-Circuit Preconditioner Extraction (2)

Support-circuit preconditioner extraction
– Combine support graph and other components (e.g. controlling sources)
– Factor the Jacobian matrix of the support circuit to create the preconditioner

[Figure: extraction flow, continued. (5) Support graph (gds+Cds/h, Cgd/h, g1, g2, g3, g5) combined with the controlling sources (gmVgs); (6) support circuit (gmVgs, gds, Cds, Cgd, g1, g2, g3, g5); (7) general-purpose support circuit assembled from support circuits (Spt-CKT)]


Quality Quantification of Support Graph Preconditioners

Convergence of support-graph preconditioners
– The convergence relies on the condition number of matrix pencil (G,P):

$$\kappa(G,P) = \frac{\lambda_{\max}(G,P)}{\lambda_{\min}(G,P)}$$

– The support of pencil (G,P) is defined as:

$$\sigma(G,P) = \min\{\tau \in \Re \mid x^T(\tau P - G)x \ge 0 \ \text{for all}\ x \in \Re^n\}$$

– Eigenvalues of pencil (G,P) are bounded by $\sigma(G,P)$
– A smaller $\tau$ means faster convergence

Spanning-tree support graph as a preconditioner
– May require many iterations to converge if $\tau$ (mismatch) is too large
– $\tau$ can be estimated by comparing Joule heating of two resistive networks
  – Power dissipated by G: $x^T G x$
  – Power dissipated by P: $x^T P x$


Ultra-Sparsifier Support Graph (1)

Ultra-sparsifier (non-tree) support graphs
– An ultra-sparsifier contains at most n-1+k edges (spanning tree + extra edges)
– It is k-ultra-sparse and approximates the original graph with high probability [1]
– Adding extra edges to the spanning tree can better approximate the original graph (e.g. eigenvalues, power dissipations)

[Figure: a spanning tree (edges of the spanning-tree graph) plus extra edges gives the ultra-sparsifier]

[1] D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proc. ACM STOC, 2004.


Ultra-Sparsifier Support Graph (2)

Sparsity control of an ultra-sparsifier support graph
– Provides tradeoffs between the quality and efficiency of preconditioners
– Weighted degree of a vertex v in a graph A is defined as:

$$wd(v) = \frac{vol(v)}{\max_{u \in neighbor(v)} w(u,v)}$$

  vol(v): total weight incident to node v
  w(u,v): the weight of the edge connecting nodes v and u

– Example: for a 2D-mesh grid, 1 ≤ wd(v) ≤ 4
  – If wd(v) -> 1: one dominant edge
  – If wd(v) -> 4: four evenly critical edges
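A minimal sketch of the weighted-degree definition, evaluated on two hypothetical mesh neighborhoods (the node names and weights are made up for illustration):

```python
from collections import defaultdict

def weighted_degrees(edges):
    """wd(v) = vol(v) / max_u w(u, v) for every node of an undirected weighted graph."""
    vol = defaultdict(float)    # total weight incident to each node
    w_max = defaultdict(float)  # heaviest edge incident to each node
    for u, v, w in edges:
        for node in (u, v):
            vol[node] += w
            w_max[node] = max(w_max[node], w)
    return {node: vol[node] / w_max[node] for node in vol}

# Interior mesh node "c" with four equally weighted edges: wd -> 4.
EVEN = [("c", "n", 1.0), ("c", "s", 1.0), ("c", "e", 1.0), ("c", "w", 1.0)]
# Same node with one dominant edge: wd -> 1.
SKEWED = [("c", "n", 100.0), ("c", "s", 1.0), ("c", "e", 1.0), ("c", "w", 1.0)]

print(weighted_degrees(EVEN)["c"], weighted_degrees(SKEWED)["c"])
```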


Ultra-Sparsifier Support Graph (3)

Iterative ultra-sparsifier support graph construction
– Define θ as the matching factor threshold (0 < θ < 1) of node weighted degree

Step 1
• Compute weighted degree wd of each node in the original graph A
Step 2
• Compute the support graph A’ with weighted degree wd’
Step 3
• Recover edges to A’ until wd’/wd > θ for each node in the support graph A’
Step 4
• Return the final ultra-sparsifier support graph A’ for support-circuit preconditioning

[Figure: a spanning tree containing nodes with wd’/wd < θ; extra edges are recovered until wd’/wd > θ, which gives the ultra-sparsifier]
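The four steps can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the initial support graph is a maximum spanning tree and that off-tree edges are recovered in order of decreasing weight. The example graph reuses the 3×3 grid weights from the support-graph example.

```python
from collections import defaultdict

EDGES = [(1, 2, 2), (1, 4, 1), (2, 3, 4), (2, 5, 3), (3, 6, 8), (4, 5, 6),
         (4, 7, 4), (5, 6, 5), (5, 8, 1), (6, 9, 3), (7, 8, 9), (8, 9, 4)]

def weighted_degree(edges):
    """wd(v) = vol(v) / (heaviest edge incident to v)."""
    vol, w_max = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            vol[node] += w
            w_max[node] = max(w_max[node], w)
    return {node: vol[node] / w_max[node] for node in vol}

def max_spanning_tree(edges):
    """Kruskal's algorithm over edges sorted by decreasing weight."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def ultra_sparsifier(edges, theta):
    wd = weighted_degree(edges)            # Step 1: wd on the original graph A
    support = max_spanning_tree(edges)     # Step 2: initial support graph A'
    in_tree = set(support)
    off_tree = sorted((e for e in edges if e not in in_tree), key=lambda e: -e[2])
    for u, v, w in off_tree:               # Step 3: recover edges where wd'/wd <= theta
        wd_s = weighted_degree(support)
        if wd_s[u] / wd[u] <= theta or wd_s[v] / wd[v] <= theta:
            support.append((u, v, w))
    return support                         # Step 4: final ultra-sparsifier A'

for theta in (0.5, 0.6, 0.7):
    print(theta, len(ultra_sparsifier(EDGES, theta)), "edges kept")
```

A larger θ keeps more edges, which is the quality-versus-sparsity knob discussed on the following slides.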


Performance Model Guided Sparsification

Runtime performance model can help find the optimal θ
– Which is better: a denser or sparser support graph?

Denser preconditioner
1. Greater LU factorization time
2. Fewer GMRES iterations

Sparser preconditioner
1. Less LU factorization time
2. More GMRES iterations

Goal: minimize $T_{tot}$ by finding a proper matching factor threshold θ!

Total Runtime:

$$T_{tot} = N \cdot T_{GMRES} + T_{LU}$$
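The tradeoff can be made concrete with a small sketch of the runtime model. The profile numbers below are hypothetical, chosen only to show a sparse, a medium and a dense preconditioner; they are not measurements from the talk.

```python
def total_runtime(n_gmres, t_gmres, t_lu):
    """T_tot = N * T_GMRES + T_LU."""
    return n_gmres * t_gmres + t_lu

def best_theta(profile):
    """Return the theta whose (N, T_GMRES, T_LU) entry minimizes T_tot."""
    return min(profile, key=lambda theta: total_runtime(*profile[theta]))

# theta -> (GMRES iterations N, seconds per iteration, LU factorization seconds)
PROFILE = {
    0.2: (400, 0.010, 0.5),   # sparse: cheap LU, many iterations
    0.5: (120, 0.012, 1.5),   # medium
    0.8: (40, 0.020, 6.0),    # dense: expensive LU, few iterations
}

print("best theta:", best_theta(PROFILE))
```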


Finding the Optimal Weighted Degree Threshold θ

Optimal weighted degree threshold θ
– Exploit symbolic matrix factorization results to quickly identify the optimal θ
– E.g. find the θ that maximizes the flops change of Cholesky factorizations


Performance Modeling Results

Experimental results on IBM power grid benchmarks

[Figure: runtime and flops vs. weighted degree threshold θ]

[Figure: runtime results of manual and automatic sparsification schemes]


Test Cases for Experiments

CKT     #nunk   #Mos   #R     #C     #L    #I
ldo1    3M      84K    6M     250K   7K    250K
ldo2    5M      71K    10M    422K   12K   422K
pg1     3M      144    6M     250K   7K    250K
pg2     6M      144    11M    490K   14K   490K
clk1    3M      65K    6M     3M     -     -
clk2    6M      65K    11M    6M     -     -

Circuit Design Parameters:
• #nunk: number of unknowns in the circuits
• #Mos: number of MOSFETs
• #R: number of resistors
• #L: number of inductors
• #C: number of capacitors
• #I: number of current sources

Three Circuit Design Types:
• ldo: large PDNs with on-chip VRs
• pg: large PDNs with power gating
• clk: clock distribution network


Results of Performance Model Guided Sparsification

Experimental results for a large PDN with multiple VRs
– The performance-guided sparsification approach achieves nearly-optimal runtime

[Figure: runtime of a single NR step using different θ]


Experimental Results

• Runtime comparison for transient analysis (100 time steps)

CKT     #NR    Direct Time (s)   GPSCP #GMRES   GPSCP Time (s)   Speedup
ldo1    237    279,629           4,130          15,368           18X
ldo2    314    -                 3,979          23,793           -
pg1     222    108,784           3,381          10,204           11X
pg2     421    185,892           3,478          14,206           13X
clk1    132    50,688            1,452          3,493            14X
clk2    219    112,497           2,555          8,001            14X

• Memory comparison

CKT     Direct    GPSCP / Reduction
ldo1    4.2GB     0.8GB / 5X
ldo2    -         1.1GB / -
pg1     3.2GB     0.8GB / 4X
pg2     7.8GB     1.6GB / 5X
clk1    4.3GB     0.8GB / 5X
clk2    10.0GB    1.4GB / 7X


Experimental Results (2)

A large PDN with embedded multiple VRs


RF Simulation Methods

For nonlinear RF circuits, output is usually quasi-periodic
– SPICE may require simulating many periods to reach steady state
– Time-domain shooting method cannot handle distributed devices

Harmonic Balance (HB) analysis for steady-state RF simulation
– HB analysis can capture the steady-state spectral response directly
– Harmonic balance also refers to balancing the current between linear and nonlinear portions at every harmonic frequency

[Figure: a nonlinear circuit driven by $\cos(\omega_0 t)$ with output voltage v; the output may contain frequencies other than $\omega_0$. Plots: voltage (V) vs. time (ps) in the time domain, and dB vs. frequency (MHz) in the frequency domain]


HB Analysis of RF Circuits

Non-autonomous circuit analysis [1]
– $x(t)$: state variables
– $y(t)$: impulse response function of linear circuit components
– $q(\cdot)$: dynamic nonlinearities
– $f(\cdot)$: static nonlinearities
– $b(t)$: time-dependent excitation sources
– $x(t)$, $q(\cdot)$, $f(\cdot)$ are typically periodic functions

[1] K. S. Kundert and A. Sangiovanni-Vincentelli. Simulation of Nonlinear Circuits in the Frequency Domain. IEEE Trans. CAD, 1986


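HB Analysis of RF Circuits (2)

HB Jacobian matrix (frequency domain)

$$J_{hb} = Y + j 2\pi f_0 \Omega \Gamma C \Gamma^{-1} + \Gamma G \Gamma^{-1}$$

– $\Gamma$ and $\Gamma^{-1}$ represent the Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT), respectively
– C and G denote the linearizations of q() and f() at s time-domain sampled points (s = 2k+1, where k is the number of positive frequencies):

$$C = \mathrm{diag}\left(\left.\frac{\partial q}{\partial x}\right|_{t_1}, \left.\frac{\partial q}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial q}{\partial x}\right|_{t_s}\right), \qquad G = \mathrm{diag}\left(\left.\frac{\partial f}{\partial x}\right|_{t_1}, \left.\frac{\partial f}{\partial x}\right|_{t_2}, \ldots, \left.\frac{\partial f}{\partial x}\right|_{t_s}\right)$$

– $\Omega$ is a block-diagonal matrix of harmonic indices (blocks of the form $kI$, with a $0$ block at DC)
– $J_{hb}$ includes lots of dense blocks introduced by $\Gamma C \Gamma^{-1}$ and $\Gamma G \Gamma^{-1}$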


Challenges in Harmonic Balance (HB) Analysis

Direct methods for RF HB circuit simulation (A. Mehrotra et al, DAC’09)
– Challenged by solving large yet non-sparse Jacobian matrices
– Cons: comp./memory cost grows quickly with circuit size

Traditional iterative methods for HB analysis (P. Feldmann et al, CICC’96, W. Dong et al, TCAD’09)
– Pros: black-box, matrix-oriented, memory-efficient
  – E.g. ILU preconditioner, domain-decomposition preconditioner
– Cons: inefficient/unreliable for strongly nonlinear RF systems

Dense circulant matrices due to FFT/IFFT operations:

$$\Gamma \cdot G \cdot \Gamma^{-1} = \begin{bmatrix} G_1 & G_s & \cdots & G_2 \\ G_2 & G_1 & \cdots & G_3 \\ \vdots & \vdots & \ddots & \vdots \\ G_s & G_{s-1} & \cdots & G_1 \end{bmatrix}, \qquad G = \begin{bmatrix} g_1 & & & \\ & g_2 & & \\ & & \ddots & \\ & & & g_s \end{bmatrix}$$

$$[G_1, G_2, \ldots, G_s]^T = \mathrm{FFT}\,[g_1, g_2, \ldots, g_s]^T$$
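The circulant structure can be checked numerically for a single scalar conductance sampled at s time points. The sketch below assumes NumPy's unnormalized DFT convention, under which the first column of the circulant is FFT(g)/s; the sample values in `g` are hypothetical.

```python
import numpy as np

def hb_transform(g):
    """Gamma * diag(g) * Gamma^-1, with Gamma the s-point DFT matrix."""
    s = len(g)
    gamma = np.fft.fft(np.eye(s))  # DFT matrix (symmetric)
    return gamma @ np.diag(g) @ np.linalg.inv(gamma)

def circulant_from_fft(g):
    """Circulant matrix whose first column is FFT(g)/s (NumPy scaling)."""
    s = len(g)
    c = np.fft.fft(g) / s
    rows, cols = np.indices((s, s))
    return c[(rows - cols) % s]    # entry (m, n) depends only on (m - n) mod s

g = np.array([1.0, 2.0, 0.5, 3.0, 1.5])  # hypothetical g_1..g_s at s = 5 time points
M = hb_transform(g)
print(np.allclose(M, circulant_from_fft(g)))
print("nonzeros:", np.count_nonzero(np.abs(M) > 1e-12), "of", M.size)
```

A time-varying conductance therefore turns one diagonal entry per time point into a full s×s block, which is the source of the dense blocks in the HB Jacobian.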


Graph Sparsification Approach to HB Analysis

From graph sparsification to Jacobian matrix sparsification
– Modified nodal analysis (MNA) matrix reduction: 20% ~ 38% fewer entries
– Reduction of fill-ins during LU: 60%
– LU factorization speedup: 50X

[Figure: sparsity patterns (• nonzero entry, × fill-in) of the MNA matrix and the HB Jacobian matrix, with fill-ins and block fill-ins during LU, before and after graph sparsification]


Conclusion

Graph sparsification approaches to circuit simulations
– MNA matrix decomposition into Laplacian and Complement matrices
– Performance-guided graph sparsification of Laplacian matrix
– Support-circuit preconditioner construction

Our preliminary results
– Highly reliable convergence for time/frequency domain simulations
– Up to 18X (21X) speedup and 7X (6X) memory reduction for time (frequency) domain simulations
– Scalable to large post-layout integrated circuits

Future work
– Will explore spectral graph sparsification methods
– Will exploit heterogeneous CPU-GPU computing platforms


Nonlinear Devices Evaluation in HB

Evaluation of nonlinear devices
– Freq->Time: terminal voltage waveforms
– Time domain: evaluate current (derivative) waveforms
– Time->Freq: currents (derivatives) in freq. domain

[Figure: evaluation flow. Terminal voltage spectrum -> IFFT/IAPDFT -> terminal voltage samples -> device evaluation -> Ids samples -> FFT/APDFT (Almost-Periodic DFT) -> Ids spectrum]

Terminal voltage samples
– Need sampling at 2k+1 time points (k is the number of positive frequencies) according to the Nyquist–Shannon sampling theorem
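The frequency-to-time-to-frequency loop can be sketched with NumPy's FFT for the single-tone (periodic) case. The square-law device `i = v**2` is a hypothetical stand-in for a real transistor model, and the almost-periodic transforms (APDFT) are not covered.

```python
import numpy as np

def hb_device_evaluation(v_spectrum, device):
    """Evaluate a memoryless nonlinear device on a voltage spectrum."""
    v_samples = np.fft.ifft(v_spectrum).real  # Freq -> Time: voltage waveform samples
    i_samples = device(v_samples)             # Time domain: evaluate device current
    return np.fft.fft(i_samples)              # Time -> Freq: current spectrum

k = 3                      # number of positive frequencies
s = 2 * k + 1              # required number of time samples
t = np.arange(s) / s       # one period, normalized to T = 1
v_spectrum = np.fft.fft(0.5 * np.cos(2 * np.pi * t))
i_spectrum = hb_device_evaluation(v_spectrum, lambda v: v ** 2)
coeffs = i_spectrum / s    # Fourier coefficients under NumPy's scaling
print(np.round(np.abs(coeffs), 4))
```

Since (0.5 cos θ)² = 0.125 + 0.125 cos 2θ, the current spectrum has a DC term and a second harmonic, neither of which is present in the driving voltage.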


Support-Circuit Preconditioner for HB Analysis

Step 1: MNA matrix decomposition of linearized RF circuit
– Laplacian Matrix (P): passive devices such as resistors, capacitors, etc.
– Complement Matrix (A): active devices such as transconductances, etc.

[Figure: an RF circuit (M1, L1, L2, R1, R2, C1, C2) is linearized at each sampled time point (M1 replaced by Cgd, Cgs, gds, gmVgs; nodes 1 to 5). The linearized circuit at t1 yields P(t1) and A(t1), and so on up to the linearized circuit at ts, which yields P(ts) and A(ts). t1 to ts are the s sampled time points]


Support-Circuit Preconditioner for HB Analysis (2)

Step 2: Representative Laplacian matrix construction
– Different sampled time points have different entry values
– Normalize the scaled Laplacian matrices of all sampled time points

[Figure: P(t1), P(t2), ..., P(ts) -> normalize -> average -> representative Laplacian matrix]


Support-Circuit Preconditioner for HB Analysis (3)

Step 3: Sparsification Pattern Extraction
– Convert matrix to weighted graph
– Sparsify the weighted graph and convert back to matrix form
– Combine with the complement matrix

[Figure: representative Laplacian matrix -> original weighted graph (nodes 1 to 5; edge weights g1+C2/h, gds+Cds/h, C1/h, Cgd/h, Cgs/h, g2) -> ultra-sparsifier (edge weights g1+C2/h, gds+Cds/h, C1/h, Cgd/h, g2) -> sparsified representative Laplacian matrix, combined with the complement matrix to form the sparsification pattern matrix]


Support-Circuit Preconditioner for HB Analysis (4)

Step 4: MNA Matrix Sparsification

[Figure: the sparsification pattern matrix is applied to the system MNA matrices at t1, t2, ..., ts to obtain the sparsified system MNA matrices at t1, t2, ..., ts]


Support-Circuit Preconditioner for HB Analysis (5)

Step 5: Support circuit block preconditioner generation
– Original matrix: all variables of a single harmonic grouped together
– Permuted matrix: all the harmonics of a single variable grouped together

[Figure: sparsified MNA matrix -> FFT -> circulant matrix in HB -> permutation -> permuted matrix -> support circuit preconditioner. The circulant matrix is $\Gamma \cdot G \cdot \Gamma^{-1}$ with $G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$ and $[G_1, G_2, \ldots, G_s]^T = \mathrm{FFT}\,[g_1, g_2, \ldots, g_s]^T$]
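The effect of the permutation can be sketched on a simplified model. The code below assumes, purely for illustration, that the HB matrix is a Kronecker product of one dense s×s harmonic-coupling block with a sparse n×n MNA pattern. A real HB Jacobian has a different dense block per MNA entry, but the index reordering is the same. All names and values are hypothetical.

```python
import numpy as np

def harmonic_to_variable_major(n_vars, n_harm):
    """perm[new] = old, where old = h * n_vars + v and new = v * n_harm + h."""
    return np.arange(n_vars * n_harm).reshape(n_harm, n_vars).T.ravel()

n_vars, n_harm = 3, 4
rng = np.random.default_rng(0)
coupling = rng.random((n_harm, n_harm)) + 0.1      # dense harmonic coupling block
pattern = np.array([[1.0, 1.0, 0.0],               # sparse MNA pattern
                    [1.0, 1.0, 1.0],
                    [0.0, 1.0, 1.0]])

original = np.kron(coupling, pattern)   # harmonics outside, variables inside
perm = harmonic_to_variable_major(n_vars, n_harm)
permuted = original[np.ix_(perm, perm)] # variables outside, harmonics inside

# Each zero of the MNA pattern is now an all-zero s x s block.
print(np.allclose(permuted, np.kron(pattern, coupling)))
```

After the permutation the matrix is block sparse with the sparsity pattern of the (sparsified) MNA matrix, which is what a block LU factorization can exploit.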


Case Study: Double-balanced Gilbert Mixer

[Figure: double-balanced Gilbert mixer schematic (M1 to M6, R1 to R8, R10, L0 to L3, C0, C1; ports Vlo+, Vlo-, Vrf+, Vrf-, VDD) annotated with node indices; [xx] denotes node index]

MOSFET linearization model
[Figure: MOSFET with terminals G, D, S, B and its linearized model (Rds, gmVgs, gnVbs, Cgd, Cgs)]

Linearized passive network (Laplacian matrix) extraction
[Figure: linearized passive network on the same node indices]


Case Study: Double-balanced Gilbert Mixer (cont.)

Ultra-sparsifier support graph construction
– Step 1: Extract maximum spanning tree
– Step 2: Restore critical edges until reaching a desired approximation

[Figure: three graphs on nodes 1, 2, 4, 6, 8, 11, 13, 14, 16, 17, 18, 21, 22, 25, 27: Laplacian graph, maximum spanning tree, ultra-sparsifier]


HB Simulation Engine on CPU-GPU Platform

[Figure: simulation flow. Start -> NR loop (device evaluation -> support-circuit preconditioner -> preconditioner factorization -> GMRES iterations -> convergence checking) -> End]

Support-circuit preconditioner
– Decompose MNA matrix into passive and active matrices
1. Performance modeling based sparsification configuration
2. Construct representative passive matrix
3. Extract sparsification pattern
4. Sparsify MNA matrix
5. Generate support-circuit preconditioner

Preconditioner factorization
– GPU-based block LU decomposition

GMRES iterations
– Matrix-free iterative solver


Runtime Performance Modeling

Lookup table (LUT) for runtime performance modeling
– 2D LUTs predict LU factorization runtime on GPU
– Two LUTs are created for GPU matrix multiplications and matrix divisions

[Figure: runtime performance lookup table for GPU-based matrix operations, indexed by matrix size and matrix operation batch size, with bilinear interpolation between table entries]
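A 2D lookup with bilinear interpolation can be sketched as below. The axis values and table entries are hypothetical placeholders, not measured GPU runtimes, and queries outside the table range are linearly extrapolated from the nearest cell.

```python
from bisect import bisect_right

def bilinear_lookup(sizes, batches, table, size, batch):
    """Interpolate table[i][j], defined at (sizes[i], batches[j]), at (size, batch)."""
    def bracket(axis, x):
        # Index of the cell's lower corner and the fractional position inside it.
        i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return i, (x - axis[i]) / (axis[i + 1] - axis[i])

    i, u = bracket(sizes, size)
    j, v = bracket(batches, batch)
    return ((1 - u) * (1 - v) * table[i][j] + u * (1 - v) * table[i + 1][j]
            + (1 - u) * v * table[i][j + 1] + u * v * table[i + 1][j + 1])

SIZES = [8, 16, 32]          # matrix sizes (hypothetical grid)
BATCHES = [100, 200, 400]    # batch sizes (hypothetical grid)
RUNTIME_MS = [[1.0, 2.0, 4.0],
              [3.0, 5.0, 9.0],
              [8.0, 12.0, 20.0]]

print(bilinear_lookup(SIZES, BATCHES, RUNTIME_MS, 12, 150))
```

Summing such predictions over all batched operations of a factorization gives an estimate of the LU runtime for a candidate sparsification without running it.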


Parallel Sparse Block LU Factorization

Representative Sparsified MNA Matrix (test matrix)
– Approximates the properties of block sparse matrix
– Created by averaging all sparsified MNA matrices
– Factorized to get the fill-ins’ locations

[Figure: sparsified system MNA matrices at t1, t2, ..., ts -> average -> test matrix -> LU -> L factor and U factor, with fill-in locations marked x]


Parallel Sparse Block LU Factorization (cont.)

Data dependency graph
– Column k depends on column j when U(j, k) != 0 [1]
– Can be derived from U matrix

[Figure: sparsity pattern of a 9×9 U matrix and the corresponding data dependency graph, with columns assigned to Level 0 through Level 4]

[1] J. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Sci. Stat. Comput., 9(5):862–873, 1988.


Parallel Sparse Block LU Factorization (cont.)

Modified data dependency graph
– Identify “fake” dependency when L(j+1:n, j) == 0
– Eliminate “fake” dependencies

[Figure: 9×9 matrix pattern with the original dependency graph (Level 0 through Level 4) and the modified dependency graph after eliminating fake dependencies (Level 0 through Level 2)]
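The dependency rule and the elimination of fake dependencies can be sketched as a levelization pass over the factor patterns. The 4×4 patterns below are hypothetical and chosen only to show the level count dropping; indices are 0-based, so `L[j+1:, j]` corresponds to L(j+1:n, j).

```python
import numpy as np

def levelize(U, L=None):
    """Assign each column the earliest level at which it can be factorized.

    Column k depends on column j < k when U[j, k] != 0. If L is given, the
    dependency is dropped as fake when L[j+1:, j] is entirely zero, because
    the corresponding update of column k would be a no-op.
    """
    n = U.shape[0]
    level = [0] * n
    for k in range(n):
        for j in range(k):
            if U[j, k] == 0:
                continue
            if L is not None and not np.any(L[j + 1:, j]):
                continue  # fake dependency
            level[k] = max(level[k], level[j] + 1)
    return level

# Hypothetical patterns: a chain of dependencies 0 -> 1 -> 2 -> 3 in U.
U = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])
# Column 1 of L has no entries below the diagonal, so "2 depends on 1" is fake.
L = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 1]])

print(levelize(U))      # levels from U alone
print(levelize(U, L))   # levels after dropping fake dependencies
```

Columns that share a level have no dependencies among themselves, so their block operations can be issued together as one batch.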


Parallel Sparse Block LU Factorization (cont.)

GPU-based block sparse matrix LU factorizations
– Levelize the factorization according to data dependency graph
– Each level only contains matrix multiplication and division operations
– Use batched matrix multiplication and inversion functions provided by CUBLAS

[Figure: levelized dependency graph (Level 0 to Level 2); from Level 0 to Level n, the block divisions (÷) and block multiplications (×) of each level are executed as batched operations on small dense blocks to produce the result]


Experiment Setup

Widely used RF circuits as the benchmark

CKT   Name            Nodes   Tones   Freqs   Nunk
1     mixer 1         302     2       25      14798
2     mixer 2         1988    2       41      161028
3     mixer 3         5262    2       5       47358
4     mixer 4         7532    2       13      188300
5     LNA + mixer 1   343     3       63      42875
6     LNA + mixer 2   5303    3       14      143181
7     LNA + mixer 3   7573    3       14      204471

Note:
• Freqs: Number of harmonics
• Nunk: Number of unknowns


Runtime and Memory Efficiency on CPU

Support-circuit preconditioned HB (SCPHB) method
– High robustness and efficiency
– Runtime speedup: up to 21X (compared with direct solver in DAC’09)
– Memory reduction: up to 6X (compared with direct solver in DAC’09)

        Direct solver         BD preconditioner    SCPHB preconditioner
CKT     Time(s)    Mem(GB)    Time(s)   K-Its      Time(s)   Mem(GB)   K-Its   Speedup
1       471.9      0.23       24.9      821        145.5     0.10      204     3.24X
2       19263.1    7.95       5637.6    6731       1408      1.72      383     13.7X
3       686.4      0.36       92.2      165        69.5      0.06      229     9.8X
4       14153.5    4.26       1072.3    273        1035.6    0.73      355     21.3X
5       2561.6     1.92       DNF       DNF        821.5     1         194     3.1X
6       4040.9     3.34       DNF       DNF        414.7     0.67      328     9.74X
7       6633.6     5.21       DNF       DNF        791       0.83      255     8.38X

K-Its: GMRES iteration number; DNF: did not finish within 1000 Newton iterations


Runtime Efficiency for Strong Nonlinearities

Simulation runtime vs. input power of LNA+Mixer
– BD preconditioner: runtime increases exponentially
– SCPHB preconditioner: runtime remains nearly constant


Scalability

Nearly-linear runtime and memory scalability

[Figure: (a) runtime scalability; (b) memory scalability]