1

An Efficient Graph Sparsification Approach to Scalable Harmonic Balance (HB) Analysis of Strongly Nonlinear RF Circuits

Design Automation Group
Department of Electrical & Computer Engineering
Michigan Technological University

Authors: Lengfei Han (Speaker), Xueqian Zhao, Dr. Zhuo Feng (Advisor)

2

Motivation

Traditional harmonic balance methods for RF simulation
– Solve large yet non-sparse Jacobian matrices
  • Direct solution methods
    – Take excessively long simulation time
    – Consume huge amounts of memory
  • Iterative solution methods
    – Rely on a high-quality preconditioner

Traditional iterative methods' preconditioners
– Matrix-oriented, may miss important circuit information
– Inefficient for strongly nonlinear systems

3

Prior Works

Prior RF circuit HB simulation approaches
– Direct solution method (A. Mehrotra et al., DAC'09)
  • Handles strongly nonlinear RF circuits
  • Consumes large computational resources
– Block-diagonal averaging preconditioner (P. Feldmann et al., CICC'96)
  • Fast and memory efficient
  • Limited to weakly nonlinear systems
– Hierarchical HB preconditioner (W. Dong et al., TCAD'09)
  • Suitable for parallel computing
  • Poor performance on strongly nonlinear systems

4

Harmonic Balance

Harmonic balance refers to balancing the current between linear and nonlinear portions at every harmonic frequency.

Harmonic balance process
[Diagram: nonlinear differential equations (time domain) → convert to nonlinear equations of Fourier coefficients → "approximate" steady-state solution]

5

Harmonic Balance Analysis(1)

Non-autonomous circuit analysis [1]

$$\int_{-\infty}^{t} y(t-s)\,x(s)\,ds + \frac{d\,q(x(t))}{dt} + f(x(t)) + b(t) = 0$$

x(t): State variables
y: Matrix-valued impulse response function of the frequency-domain linear circuit components
q(): Function for the nonlinear charge and flux
f(): Static (memoryless) nonlinearities
b(t): Time-dependent excitations

[1] K. S. Kundert and A. Sangiovanni-Vincentelli. Simulation of Nonlinear Circuits in the Frequency Domain, CAD, 1986

6

Harmonic Balance Analysis(2)

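HB Jacobian matrix [1]

$$J_{hb} = Y + j 2\pi f_0\,\Omega\,\Gamma C \Gamma^{-1} + \Gamma G \Gamma^{-1}$$

$$C = \mathrm{diag}\!\left(\left.\frac{\partial q}{\partial x}\right|_{t_1},\ \left.\frac{\partial q}{\partial x}\right|_{t_2},\ \ldots,\ \left.\frac{\partial q}{\partial x}\right|_{t_s}\right), \qquad G = \mathrm{diag}\!\left(\left.\frac{\partial f}{\partial x}\right|_{t_1},\ \left.\frac{\partial f}{\partial x}\right|_{t_2},\ \ldots,\ \left.\frac{\partial f}{\partial x}\right|_{t_s}\right)$$

– Γ and Γ^{-1} represent the Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT), respectively
– C and G denote the linearizations of q() and f() at the s time-domain sample points
– J_hb includes many dense blocks introduced by Γ G Γ^{-1} and Γ C Γ^{-1}

[1] K. S. Kundert and A. Sangiovanni-Vincentelli. Simulation of Nonlinear Circuits in the Frequency Domain, CAD, 1986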
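The density of the Γ G Γ^{-1} blocks can be illustrated for a single circuit variable. The sketch below is only an illustration under stated assumptions: Γ is taken to be the unnormalized DFT matrix, the conductance samples in `g_nonlinear` and `g_linear` are made-up values, and the helper names are hypothetical. It suggests that a time-varying (nonlinear) conductance yields a fully dense circulant block, while a constant (linear) conductance stays diagonal.

```python
import cmath

def dft_matrices(s):
    """Return the s-by-s DFT matrix (Gamma) and its inverse as nested lists."""
    w = cmath.exp(-2j * cmath.pi / s)
    gamma = [[w ** (i * k) for k in range(s)] for i in range(s)]
    gamma_inv = [[w ** (-i * k) / s for k in range(s)] for i in range(s)]
    return gamma, gamma_inv

def matmul(a, b):
    """Plain triple-loop product, adequate for tiny dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def conjugated_block(g):
    """Form Gamma * diag(g) * Gamma^{-1} for samples g at s time points."""
    s = len(g)
    diag = [[g[i] if i == j else 0.0 for j in range(s)] for i in range(s)]
    gamma, gamma_inv = dft_matrices(s)
    return matmul(matmul(gamma, diag), gamma_inv)

def count_nonzeros(m, tol=1e-9):
    return sum(1 for row in m for v in row if abs(v) > tol)

def is_circulant(m, tol=1e-9):
    """Entry (i, j) of a circulant matrix depends only on (i - j) mod s."""
    s = len(m)
    return all(abs(m[i][j] - m[(i + 1) % s][(j + 1) % s]) < tol
               for i in range(s) for j in range(s))

g_nonlinear = [1.0, 2.0, 4.0, 7.0]   # hypothetical samples, vary over time
g_linear = [3.0, 3.0, 3.0, 3.0]      # hypothetical constant conductance

dense_block = conjugated_block(g_nonlinear)
linear_block = conjugated_block(g_linear)
print(count_nonzeros(dense_block), count_nonzeros(linear_block))
```

With s sample points, each nonlinear element contributes an s-by-s dense block of this kind, which is why the full HB Jacobian is large and non-sparse.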

7

Our Proposed SCPHB Method

Our proposed method: support-circuit preconditioned HB (SCPHB) iterative solver:

– Effective for solving RF nonlinear circuits

– Scalable linearized RF circuit sparsification

– Circuit-oriented preconditioner generation

– Adaptive support-circuit sparsification

– Matrix-free iterative solver

8

Graph Sparsification Techniques

General linear circuit analysis problems can be converted to equivalent weighted, undirected graph problems

$$G = (V, E, w)$$

V: a set of vertices
E: a set of edges
w: a weight function that assigns a positive weight to every edge

The Laplacian matrix A of a graph
– Defined by the quadratic form it induces; also known as the admittance matrix in circuit theory

$$x^T A x = \sum_{(s,d) \in E} w_{s,d}\,\big(x(s) - x(d)\big)^2$$
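The quadratic-form definition can be checked numerically on a toy graph. This is a minimal sketch: the edge list, weights, and vector `x` are arbitrary illustration values, and the helper names are hypothetical. The Laplacian is assembled edge by edge, in the same way conductances are stamped into an admittance matrix.

```python
def laplacian(n, edges):
    """Dense Laplacian of a weighted undirected graph.

    edges is a list of (s, d, w) tuples with positive weight w.
    """
    A = [[0.0] * n for _ in range(n)]
    for s, d, w in edges:
        A[s][s] += w
        A[d][d] += w
        A[s][d] -= w
        A[d][s] -= w
    return A

def quad_form(A, x):
    """Evaluate x^T A x directly from the matrix entries."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def edge_sum(edges, x):
    """Evaluate the sum over edges of w * (x(s) - x(d))^2."""
    return sum(w * (x[s] - x[d]) ** 2 for s, d, w in edges)

# Arbitrary 4-node example graph
edges = [(0, 1, 2.0), (1, 2, 0.5), (0, 2, 1.0), (2, 3, 4.0)]
A = laplacian(4, edges)
x = [1.0, 0.0, -2.0, 3.0]

print(quad_form(A, x), edge_sum(edges, x))
```

If x is read as a vector of node voltages and the weights as conductances, both expressions give the power dissipated in the resistive network.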

9

Graph Sparsification Techniques (cont.)

Graph sparsifier G'
– A sparse subgraph of G that approximates G in some measure (pairwise distances, cut values, or the graph Laplacian)
– The goal of graph sparsification is to approximate a given graph G by G' on the same set of vertices such that G' can be used as a proxy for G in numerical computations without introducing too much error
– A good sparsifier should have very few edges, which immediately results in significantly reduced computation and storage cost

Figure source: I. Koutis, G. L. Miller and R. Peng. A fast solver for a class of linear systems. Commun. ACM, 2012

10

Matrix Sparsification Benefit

A good sparsifier has fewer edges, which significantly reduces computation and storage cost

Our observation
– Modified nodal analysis (MNA) matrix entries reduction: 20% ~ 38%
– Fill-in reduction during LU factorization: 60%
– LU factorization speedup: 50X

11

Support Graph Preconditioners

Spanning-tree support graph as a preconditioner [2]
– May not be efficient for ill-conditioned systems
– Reduces the overall conductivity of the resistive network
– Mismatches the power dissipation between the original graph and the spanning-tree graph

[Figure: original graph and its spanning tree; legend: edges of original graph, edges of spanning-tree graph]

[2] X. Zhao, J. Wang, Z. Feng and S. Hu. Power grid analysis with hierarchical support graphs. In Proc. ACM ICCAD, 2011.
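The power-dissipation mismatch of a spanning-tree support graph can be seen on a small example. The sketch below assumes a maximum-weight spanning tree built with Kruskal's algorithm; the slide does not say which spanning tree is used, so that choice, the example graph, and the function names are all assumptions made for illustration.

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm, taking the heaviest edges first (assumed choice)."""
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for s, d, w in sorted(edges, key=lambda e: -e[2]):
        rs, rd = find(s), find(d)
        if rs != rd:            # edge joins two components, keep it
            parent[rs] = rd
            tree.append((s, d, w))
    return tree

def dissipation(edges, x):
    """Power dissipation x^T A x, written as a sum over edges."""
    return sum(w * (x[s] - x[d]) ** 2 for s, d, w in edges)

# Arbitrary 4-node resistive network: (node, node, conductance)
edges = [(0, 1, 2.0), (1, 2, 0.5), (0, 2, 1.0), (2, 3, 4.0), (1, 3, 3.0)]
tree = max_spanning_tree(4, edges)
x = [1.0, 0.0, -2.0, 3.0]       # arbitrary node voltages

print(len(tree), dissipation(tree, x), dissipation(edges, x))
```

Because the tree drops edges with positive conductance, its dissipation can never exceed that of the original graph for the same voltages, which is the mismatch the slide refers to.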

12

Support Graph Preconditioners(cont.)

Ultra-sparsifier support graph as a preconditioner [3]
– Adds critical extra edges to the spanning tree
– Has better approximation in both eigenvalues and power dissipation
– Introduces more fill-ins during LU factorization

[Figure: spanning tree and ultra-sparsifier; legend: edges of spanning-tree graph, extra edges]

[3] X. Zhao and Z. Feng. GPSCP: A General-Purpose Support-Circuit Preconditioning Approach to Large-Scale SPICE-Accurate Nonlinear Circuit Simulations. In Proc. IEEE/ACM ICCAD, 2012.

13

Adaptive Support-Circuit Sparsification

Total simulation runtime

$$T_{total} = T_{LU} + N \cdot T_{GMRES}$$

N: Total number of GMRES iterations
T_LU: Preconditioner LU factorization runtime
T_GMRES: Runtime of one GMRES iteration

Adaptive sparsification control
– If N is large, maintain more edges
– If N is small, further sparsify the matrix

[Figure: runtime breakdown into T_LU and N · T_GMRES for a dense graph and a sparse graph]
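The trade-off behind the adaptive control can be written as a small cost model. This is only a sketch: the slides give the runtime formula and the direction of the adjustment, while the iteration thresholds `high` and `low`, the scaling factor `step`, and the notion of an "edge budget" are hypothetical placeholders.

```python
def total_runtime(t_lu, n_iters, t_gmres):
    """T_total = T_LU + N * T_GMRES."""
    return t_lu + n_iters * t_gmres

def adjust_edge_budget(edge_budget, n_iters, high=200, low=50, step=1.25):
    """Hypothetical control rule; thresholds and step are assumptions.

    Many GMRES iterations mean the preconditioner is too weak, so keep
    more edges. Few iterations mean it can be sparsified further.
    """
    if n_iters > high:
        return int(edge_budget * step)
    if n_iters < low:
        return max(1, int(edge_budget / step))
    return edge_budget

# Made-up timings: a denser preconditioner costs more to factor
# but needs fewer GMRES iterations.
dense = total_runtime(t_lu=40.0, n_iters=30, t_gmres=0.5)
sparse = total_runtime(t_lu=5.0, n_iters=120, t_gmres=0.4)
print(dense, sparse)
print(adjust_edge_budget(1000, 300), adjust_edge_budget(1000, 10))
```

The point of the model is that neither extreme is best by default: the sparsity level is tuned so that the sum of factorization time and iteration time is smallest.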

14

Flowchart of Proposed Approach

Start
→ Device evaluation
→ Support-circuit preconditioner: decompose MNA matrix into passive and active matrices
   1. Construct representative passive matrix
   2. Extract sparsification pattern
   3. Sparsify MNA matrix
   4. Generate support-circuit preconditioner
→ Preconditioner factorization: block-based LU decomposition
→ GMRES iterations: matrix-free iterative solver
→ Convergence checking (NR loop)
→ End

15

Support Circuit Preconditioner Construction

Step 1: Linearized Circuit Decomposition
– Support-graph sparsification: applies to a symmetric, diagonally dominant matrix
– Passive matrix (P): passive devices such as resistors, capacitors, inductors
– Active matrix (A): active devices such as transconductances, sources

[Figure: RF circuit (M1, L1, L2, C1, C2, R1, R2) and its linearized circuits at t1 … ts (nodes 1–5; elements L1, L2, C1, C2, R1, R2, Cgs, Cgd, gds, gmVgs), each decomposed into a passive matrix P and an active matrix A]

Note: t1 ~ ts are the s sampled time points
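A rough sketch of the decomposition at one sampled time point is given below. It assumes standard nodal stamps: a two-terminal conductance (a resistor, or a C/h companion conductance) goes into P, and a voltage-controlled current source with gain gm goes into A. The element tuple format, the use of -1 for ground, and the numeric values are hypothetical.

```python
GROUND = -1  # assumed convention: node index -1 is the reference node

def stamp(M, i, j, val):
    """Add val at (i, j), skipping rows and columns of the ground node."""
    if i != GROUND and j != GROUND:
        M[i][j] += val

def decompose(n, elements):
    """Split linearized stamps into a passive matrix P and an active matrix A."""
    P = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for el in elements:
        if el[0] == "g":        # two-terminal conductance between a and b
            _, a, b, g = el
            stamp(P, a, a, g)
            stamp(P, b, b, g)
            stamp(P, a, b, -g)
            stamp(P, b, a, -g)
        elif el[0] == "gm":     # current gm*(v_cp - v_cn) flowing from p to q
            _, p, q, cp, cn, gm = el
            stamp(A, p, cp, gm)
            stamp(A, p, cn, -gm)
            stamp(A, q, cp, -gm)
            stamp(A, q, cn, gm)
    return P, A

elements = [
    ("g", 0, 1, 1.0),            # conductance between nodes 0 and 1
    ("g", 1, GROUND, 0.5),       # conductance from node 1 to ground
    ("g", 0, GROUND, 2.0),       # conductance from node 0 to ground
    ("gm", 1, GROUND, 0, GROUND, 0.1),  # transconductance controlled by node 0
]
P, A = decompose(2, elements)
print(P, A)
```

In this example P comes out symmetric and diagonally dominant, while the transconductance produces a single unsymmetric entry in A, which is why only P is handed to the support-graph sparsifier.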

16

Support Circuit Preconditioner Construction(cont.)

Step 2: Representative Passive Matrix Construction
– Different sampled time points have different entry values
– Normalize the passive matrices of all sampled time points
– Average all scaled passive matrices

[Figure: P(t1), P(t2), …, P(ts) → normalize → average → representative passive matrix]
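The normalize-then-average step might look like the following. The slides do not state which normalization is used; scaling each matrix by its largest absolute entry is an assumption made here purely for illustration, and the two example matrices are made up.

```python
def normalize(P):
    """Scale a matrix by its largest absolute entry (assumed normalization).

    Assumes P has at least one nonzero entry.
    """
    scale = max(abs(v) for row in P for v in row)
    return [[v / scale for v in row] for row in P]

def average(mats):
    """Entry-wise mean of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(M[i][j] for M in mats) / len(mats) for j in range(n)]
            for i in range(n)]

# Hypothetical passive matrices at two sampled time points
P_t1 = [[2.0, -2.0], [-2.0, 2.0]]
P_t2 = [[8.0, -4.0], [-4.0, 4.0]]

representative = average([normalize(P_t1), normalize(P_t2)])
print(representative)
```

Without some normalization, the time point with the largest entries would dominate the average, so scaling first gives every sampled time point a comparable influence on the representative matrix.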

17

Support Circuit Preconditioner Construction(cont.)

Step 3: Sparsification Pattern Extraction
– Convert the matrix to a weighted graph
– Sparsify the weighted graph and convert it back to a matrix
– Combine with the active matrix

[Figure: representative passive matrix → original weighted graph (nodes 1–5; edge weights C1/h, Cgd/h, Cgs/h, g2, g1+C2/h, gds+Cds/h) → ultra-sparsifier → sparsified representative passive matrix; combined with the active matrix to form the sparsification pattern matrix]

18

Support Circuit Preconditioner Construction(cont.)

Step 4: MNA Matrix Sparsification

[Figure: the sparsification pattern matrix is applied to the system MNA matrices at t1, t2, …, ts to produce the sparsified system MNA matrices at t1, t2, …, ts]
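Steps 3 and 4 can be sketched as building one nonzero pattern and reusing it at every sampled time point. In this illustration the pattern is the union of the nonzero positions of the sparsified passive matrix and the active matrix, and entries outside the pattern are simply set to zero. How the actual method treats dropped entries is not stated in the slides, so that part is an assumption, as are the matrices and function names.

```python
def pattern_of(M, tol=0.0):
    """Set of (row, col) positions holding a nonzero entry."""
    return {(i, j) for i, row in enumerate(M)
            for j, v in enumerate(row) if abs(v) > tol}

def apply_pattern(M, pattern):
    """Keep entries inside the pattern; zero the rest (assumed treatment)."""
    return [[v if (i, j) in pattern else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(M)]

# Hypothetical sparsified passive matrix (graph edge 0-2 was removed)
sparsified_passive = [[2.0, -1.0, 0.0],
                      [-1.0, 3.0, -2.0],
                      [0.0, -2.0, 2.0]]
# Hypothetical active matrix with one transconductance entry at (2, 0)
active = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [0.8, 0.0, 0.0]]

pattern = pattern_of(sparsified_passive) | pattern_of(active)

# Hypothetical full MNA matrices at two sampled time points
mna_t1 = [[2.5, -1.0, -0.5], [-1.0, 3.0, -2.0], [0.3, -2.0, 2.5]]
mna_t2 = [[2.7, -1.2, -0.4], [-1.2, 3.4, -2.2], [0.5, -2.2, 2.6]]

sparsified = [apply_pattern(M, pattern) for M in (mna_t1, mna_t2)]
print(sparsified[0])
```

Sharing one pattern across all time points keeps the sparsity structure identical for every sparsified MNA matrix, which the block LU factorization in the later slides relies on.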

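19

Support Circuit Preconditioner Construction(cont.)

Step 5: Support circuit block preconditioner generation
– Original matrix: all variables of a single harmonic grouped together
– Permuted matrix: all the harmonics of a single variable grouped together

Circulant matrix review

$$\Gamma^{-1} \cdot G \cdot \Gamma = \begin{bmatrix} G_1 & G_s & \cdots & G_2 \\ G_2 & G_1 & \cdots & G_3 \\ \vdots & \vdots & \ddots & \vdots \\ G_s & G_{s-1} & \cdots & G_1 \end{bmatrix}, \qquad G = \mathrm{diag}(g_1, g_2, \ldots, g_s)$$

$$[g_1, g_2, \ldots, g_s]^T \xrightarrow{\ \mathrm{FFT}\ } [G_1, G_2, \ldots, G_s]^T$$

[Figure: sparsified MNA matrix → FFT → permutation → permuted matrix (support circuit preconditioner)]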
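The reordering from the original matrix to the permuted matrix is an index permutation. A minimal sketch, assuming unknowns are numbered `h * num_vars + v` in the original (harmonic-major) ordering; this numbering convention and the function name are assumptions for illustration.

```python
def harmonic_to_variable_major(num_vars, num_harmonics):
    """Return perm such that perm[new_index] = old_index.

    Old ordering: all variables of harmonic 0, then of harmonic 1, ...
    New ordering: all harmonics of variable 0, then of variable 1, ...
    """
    return [h * num_vars + v
            for v in range(num_vars)
            for h in range(num_harmonics)]

num_vars, num_harmonics = 2, 3
old_labels = [f"x{v}@h{h}" for h in range(num_harmonics)
              for v in range(num_vars)]

perm = harmonic_to_variable_major(num_vars, num_harmonics)
new_labels = [old_labels[i] for i in perm]
print(perm)
print(new_labels)
```

After the permutation, all harmonics of one circuit variable sit next to each other, so each nonzero of the sparsified MNA pattern becomes one dense block in the block sparse preconditioner.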

20

Block Sparse Matrix LU Factorization

Test matrix
– Has the same sparsity structure as the MNA matrix
– Has representative entries of the MNA matrices at all sampled time points
– Approximates the properties of the block sparse matrix
– Has the same permutation and pivoting pattern as the block sparse matrix LU factorization

Block sparse matrix LU factorization
– Applies the permutation and pivoting pattern to the block sparse matrix
– Performs LU factorization without pivoting
– Uses LAPACK/BLAS for dense block multiplication and division

Matrix-free iterative solver
– Implicit system Jacobian matrix
– Explicit preconditioner matrix with a limited number of entries
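What "implicit system Jacobian" means can be illustrated for one Γ G Γ^{-1} block: an iterative solver such as GMRES only needs matrix-vector products, and those can be computed with transforms instead of a stored dense block. The sketch below uses naive O(s²) DFT loops for self-containment, where a real implementation would use an FFT. The sample values and function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive forward DFT (stands in for the FFT)."""
    s = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / s) for k in range(s))
            for m in range(s)]

def idft(X):
    """Naive inverse DFT (stands in for the IFFT)."""
    s = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / s) for m in range(s)) / s
            for k in range(s)]

def apply_block(g, v):
    """Compute (Gamma diag(g) Gamma^{-1}) v without forming the dense block."""
    time_samples = idft(v)                            # to the time domain
    scaled = [gi * xi for gi, xi in zip(g, time_samples)]
    return dft(scaled)                                # back to the frequency domain

def explicit_block(g):
    """Dense circulant form of the same block, built only for comparison."""
    s = len(g)
    g_hat = dft(g)
    return [[g_hat[(i - j) % s] / s for j in range(s)] for i in range(s)]

g = [1.0, 2.0, 4.0, 7.0]            # hypothetical conductance samples
v = [1.0, 0.5j, -2.0, 0.25]         # hypothetical frequency-domain vector

implicit = apply_block(g, v)
M = explicit_block(g)
explicit = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
print(max(abs(a - b) for a, b in zip(implicit, explicit)))
```

The implicit product stores only the s samples in `g`, while the explicit block stores s² entries. That saving is what lets the solver keep the full Jacobian implicit and store only the sparsified preconditioner explicitly.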

21

Experiment Setup

Widely used RF circuits as the benchmark

CKT  Name           Nodes  Tones  Freqs    Nunk
1    mixer 1          302      2     25   14798
2    mixer 2         1988      2     41  161028
3    mixer 3         5262      2      5   47358
4    mixer 4         7532      2     13  188300
5    LNA + mixer 1    343      3     63   42875
6    LNA + mixer 2   5303      3     14  143181
7    LNA + mixer 3   7573      3     14  204471

Note:
• Freqs: Number of harmonics
• Nunk: Number of unknowns
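The Nunk column in the benchmark table is numerically consistent with Nunk = Nodes × (2 · Freqs − 1). The slides do not state this relation, so it is an observation from the data rather than the authors' definition. One plausible reading is one DC term plus two real unknowns for each remaining harmonic. The check below uses only the numbers from the table.

```python
# (name, nodes, freqs, nunk) copied from the benchmark table
benchmarks = [
    ("mixer 1", 302, 25, 14798),
    ("mixer 2", 1988, 41, 161028),
    ("mixer 3", 5262, 5, 47358),
    ("mixer 4", 7532, 13, 188300),
    ("LNA + mixer 1", 343, 63, 42875),
    ("LNA + mixer 2", 5303, 14, 143181),
    ("LNA + mixer 3", 7573, 14, 204471),
]

def unknowns(nodes, freqs):
    """Observed relation (not stated in the slides): nodes * (2*freqs - 1)."""
    return nodes * (2 * freqs - 1)

matches = [unknowns(nodes, freqs) == nunk
           for _, nodes, freqs, nunk in benchmarks]
print(matches)
```

The relation shows how quickly the problem grows: doubling the number of harmonics roughly doubles the number of unknowns for the same circuit.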

22

Runtime and Memory Efficiency

       Direct solver        BD preconditioner    SCPHB preconditioner
CKT    Time(s)   Mem(GB)    Time(s)   K-Its      Time(s)   Mem(GB)   K-Its
1        471.9      0.18       24.9     821        138.7      0.05     210
2      19263.1      3.46     5637.6    6731       2122.7      0.69     408
3        686.4      0.32       92.2     165         54.2      0.04     245
4      14153.5      3.84     1072.3     273       1492.4      0.33     288
5       2561.6      3.18        DNF     DNF        798.4      0.53     133
6       4040.9      2.97        DNF     DNF        394.3      0.33     299
7       6633.6      2.65        DNF     DNF        998.3      0.44     238

K-Its: Number of GMRES iterations
DNF: Did not finish within 1000 Newton iterations

Support-circuit preconditioned HB (SCPHB) method
– High robustness and efficiency
– Runtime speedup: 10X (compared with direct solver)
– Memory reduction: 8X (compared with direct solver)

23

Near-constant runtime efficiency

Simulation runtime vs. input power of LNA + mixer
– BD preconditioner: increases exponentially
– SCPHB preconditioner: near-constant

24

Conclusion

A scalable Jacobian matrix solving method is proposed for frequency-domain HB analysis of strongly nonlinear circuits

Our experimental results show that the SCPHB method can:
– Obtain up to 10X speedups in RF HB simulations
– Reduce memory consumption by up to 8X

Key ideas:
– Use an ultra-sparsifier support circuit as the preconditioner
– Use a block sparse LU matrix solver for factorizing the preconditioner
– Use a matrix-free iterative solver
– Use adaptive sparsification control to get the best overall runtime