
Three-Dimensional Interconnect Modeling for Nano-Scale VLSI Technologies

by

Rong Jiang

Dissertation submitted in partial fulfillment of

the requirements for the degree of

Doctor of Philosophy

(Electrical Engineering)

at the

UNIVERSITY OF WISCONSIN-MADISON

2006

© Copyright by Rong Jiang 2006

All Rights Reserved


Abstract

Designing high performance very large scale integration (VLSI) circuits has become more challenging than ever due to deep sub-micron effects and accelerating time-to-market cycles. With the increasing dominance of interconnect delay and strong coupling effects, a small change in the design can cause new timing violations and result in design iterations. At the same time, the industry trend of integrating higher levels of circuit functionality on one chip and the widespread growth of wireless communication have triggered the proliferation of mixed analog-digital systems. The digital and the analog components share a common lossy substrate, which provides an alternative path for the current to reach different devices and leads to more significant electromagnetic couplings. Furthermore, the ever-increasing complexity of VLSI designs and integrated circuit (IC) process technologies increases the mismatch between a circuit fabricated on the wafer and the one designed in the layout tool. Process-induced variations can make the circuit performance deviate from the design specification, and timing convergence is becoming harder and harder to achieve.

Therefore, to perform fast circuit analysis and optimization, efficient extraction of compact yet accurate lumped circuit models of on-chip interconnect has become a crucial part of modern VLSI design. The central theme of this thesis is fast capacitance extraction that accounts for process variations, together with the modeling of interconnect inductive effects subject to multi-layer lossy substrate effects. The main contributions of the work are as follows:

• ICCAP, a linear-time hierarchical three-dimensional (3D) capacitance extraction algorithm. ICCAP proposes a novel method to sparsify and reorder the dense linear system associated with boundary element method (BEM) capacitance extraction, so that the new sparse system can be solved efficiently by preconditioned Krylov subspace iterative methods, such as the generalized minimal residual method (GMRES) or the preconditioned conjugate gradient method (PCG).

• STATCAP, a statistical capacitance extraction algorithm considering process-induced variations. By utilizing the efficiency of ICCAP, STATCAP develops a systematic way to model the surface fluctuation of interconnect geometries and generate an explicit quadratic form representation for parasitic capacitances. The quadratic expression can be directly integrated into statistical timing analysis and statistical model order reduction to perform further analysis.

• LITHSIM, a fast aerial image simulation algorithm for modeling process variations introduced in the lithography process. By exploiting the regular structures inherent in IC mask patterns, LITHSIM avoids the sampling process in the two-dimensional (2D) discrete Fourier transformation or discrete convolution and eliminates aliasing error by providing a closed-form analytic formula to directly generate the two-dimensional mask image.


• EPEEC, a compact interconnect inductive effect modeling algorithm considering lossy substrate. Based on complex image theory, EPEEC extends the traditional partial element equivalent circuit (PEEC) model to simultaneously take multi-layer substrate eddy current losses and frequency-dependent effects into consideration. To accommodate even larger scale on-chip interconnect networks, EPEEC develops a new SPICE-compatible reluctance extraction algorithm by applying sparsification in the inverse inductance domain with an extended window algorithm.

These validated interconnect parasitic extraction and modeling algorithms can be easily integrated into general design tools. We hope that, by transferring the proposed algorithms into the realm of production, these building blocks will serve the goal of design for manufacturability in state-of-the-art VLSI circuits and improve fabrication yield and circuit efficiency in the long term.


Acknowledgements

First and foremost, I would like to express my deepest gratitude and appreciation to my research advisor, Professor Charlie Chung-Ping Chen, the real professor in my life, for his super-excellent guidance and tremendous support, and for the opportunities he has created for me during my graduate study and research at University of Wisconsin, Madison. His vision and leadership in the semiconductor computer aided design industry have been inspiring to both my research work and career development. I sincerely thank him for his consistent supervision and enlightenment in every detail of my research and education at University of Wisconsin, Madison.

I am thankful to Professor Franco Cerrina, Professor Michael J. Schulte, Professor Kewal K. Saluja, Professor Parameswaran Ramanathan, Professor Yu Hen Hu, and Professor Shi Jin for reviewing my dissertation and serving as committee members in my preliminary exams and defense. Their insightful input to this work and expertise in the fields of semiconductors and mathematics have provided me strong support throughout this process.

I would like to sincerely thank my former advisor, Professor Zhiquan Wang at Nanjing University of Science & Technology, for his mentoring, encouragement, support and consistent help. His guidance will be an indelible part of my life.

I would like to thank Chin-Chi Teng, Pinhong Chen, Eddy Pramono, Yu Zheng and Jin Zhang for their help and support and for sharing their knowledge and expertise during my work at Cadence Design Systems.

Special thanks to Professor Janet Meiling Wang at University of Arizona, Tucson, my colleagues Wenyin Fu and Yi-Hao Chang at National Taiwan University, and Mr. Vince Lin from SpringSoft for their tremendous help during my research at University of Wisconsin, Madison. I also deeply thank all the past and present members of the University of Wisconsin, Madison VLSI-EDA group, Yu-Min Lee, Tsung-Hao Chen, Ting-Yuan Wang, Jeng-Liang Tsai, and Sanghamitra Roy, for their friendship, help, and support. I would like to thank all my Chinese and international friends from different parts of the small world. They made my life in the United States colorful and enjoyable.

I would like to thank my mom and dad and other members of my big family for their love during my years in graduate school. Their care always provides the warmest support in my life and work, wherever I am.

Most importantly, I would like to thank my dear wife, Yi Zhou, for her companionship and love during the last few years. Together we have managed to get lots of meaningful things done and overcome many difficulties. I deeply thank her for her love, understanding, and consistent support. Without her love and encouragement, this thesis would not have been possible. I look forward to enjoying a better and better life with her for the rest of my life.

This work was partially funded by the National Science Foundation under grants CCR-0093309 & CCR-0204468 and the National Science Council of Taiwan, R.O.C. under grant NSC 92-2218-E-002-030, and by the following participating companies: Intel, TSMC, UMC, Faraday, and SpringSoft.

Contents

Abstract
Acknowledgements
1 Introduction
1.1 Motivation
1.2 Capacitance Extraction
1.3 Inductance Extraction
1.4 Thesis Organization
2 Linear Time Hierarchical Capacitance Extraction – ICCAP
2.1 Preliminaries
2.1.1 Capacitance Extraction
2.1.2 Boundary Element Method
2.1.3 Hierarchical Capacitance Algorithms
2.2 ICCAP Algorithm
2.2.1 New Basis Panels Generation
2.2.2 Direct Formulation of J′ in Linear Time
2.2.3 Extracting E from J′
2.2.4 Solving P′q′ = v′ for Uniform- and Multiple-dielectric Media
2.2.5 Potential Coefficient Matrix Reordering
2.2.6 Complexity Analysis
2.3 Practical Implementation
2.3.1 Potential Estimation
2.3.2 Panel Refinement Scheme
2.3.3 Direct Construction of P′
2.3.4 Direct Construction of V′
2.4 Experimental Results
3 Statistical Capacitance Extraction – STATCAP
3.1 Preliminaries
3.1.1 Process Variations
3.2 Statistic Capacitance Extraction
3.2.1 Variational Capacitance Approximation
3.2.2 Process Variation Modeling
3.2.3 Random Variable Reduction
3.2.4 Potential Coefficient Approximation
3.2.5 Distribution of Parasitic Capacitance
3.3 Experimental Results
4 Fast Analytic Lithography Simulation – LITHSIM
4.1 Preliminaries
4.1.1 Simplified Projection System Model
4.1.2 General Lithography System Model
4.2 LithSim Algorithm
4.2.1 Rectangular Pupil
4.2.2 Circular Pupil
4.2.3 LithSim Simulation Flow
4.3 Experimental Results
5 Efficient Inductive Effect Extraction with Lossy Substrate – EPEEC
5.1 Introduction
5.2 Electro-magnetic Formulation of Substrate Eddy Current and Complex Image Theory
5.2.1 Generation of Substrate Eddy Currents
5.2.2 Analytic Vector Potential within A Multilayer Substrate
5.2.3 Complex Image Theory and Its Application
5.3 Eddy-Current-Aware PEEC model: EPEEC
5.3.1 EPEEC Interconnect Modeling Algorithm
5.3.2 SPICE Compatible Reluctance Realization
5.4 Experimental Results
5.4.1 EPEEC Model Validation
5.4.2 Substrate Effects
5.4.3 Inductance vs. Reluctance
6 Conclusion

List of Figures

1.1 Wire dimension trends in advanced VLSI technologies.
2.1 Capacitance extraction.
2.2 BEM capacitance algorithms: FastCap.
2.3 BEM capacitance algorithms: HiCap and PHiCap.
2.4 Different bases have different structure matrices and potential coefficient matrices with different densities.
2.5 Fill-ins introduced by a link between non-leaf panels.
2.6 The elementary operation of switching basis panels is equivalent to performing a congruence transformation on P.
2.7 Continuing to move basis panels upward is equivalent to applying consecutive congruence transformations to the potential coefficient matrix without explicit matrix manipulations.
2.8 Comparison of non-zero entries in H and P′.
2.9 Efficient construction of the new structure matrix J′.
2.10 Comparison of non-zero entries in J and J′.
2.11 Extraction flowchart of ICCAP and PHiCap.
2.12 ICCAP capacitance extraction flow.
2.13 Centroid of triangular panel.
2.14 Centroid of quadrilateral panel.
2.15 High level link and basic link.
2.16 Direct construction of the sparse potential coefficient matrix P′.
2.17 Direct construction of the new right hand side v′.
2.18 Density of the new potential coefficient matrix P′.
2.19 Preconditioners from incomplete LU factorization with different reordering schemes (RelativeResidue = 0.01).
3.1 Process variations due to (a) chemical-mechanical planarization, (b) optical diffraction, and (c) chemical etching. (Picture courtesy of TSMC, Hsin-Chu, Taiwan.)
3.2 Process variation modeling with correlated statistical position perturbations on leaf panels.
3.3 Random variable transformation.
3.4 An efficient algorithm for constructing the random variable transformation matrix R. The function InsertEntry(R, i, j, value) fills value into the entry (i, j) of R.
3.5 First and second order capacitance models and their comparisons with Monte Carlo method for the bus 2×2 benchmark (σ = 20%).
3.6 Second order parasitic capacitance modeling with different number of factors and the comparison with Monte Carlo method for the bus 2×2 benchmark.
4.1 General optical lithography process: (1) Photoresist coating (2) Exposure (3) Development.
4.2 Subwavelength gap between IC feature size and light wavelength. (Picture courtesy of Numerical Technologies, Inc. and Synopsys, Inc.)
4.3 Generic exposure system in optical projection lithography.
4.4 Shifting the photomask spectrum is equivalent to shifting the pupil function.
4.5 Transmission cross-coefficient (TCC).
4.6 Mask decomposition.
4.7 Inverse Fourier transformation of a rectangular pupil.
4.8 εi(x, y) of a 1 µm × 1 µm slit.
4.9 Waveform of sine integral function.
4.10 Windowing method to reduce computational cost.
4.11 LithSim Optical Lithography Simulation Flow.
4.12 Irradiance calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.
4.13 Errors in irradiance matrices calculated by using discrete Fourier transformation, discrete convolution and LithSim compared to continuous convolution.
4.14 Images (contours) calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.
5.1 A current filament parallel to a multilayer substrate which contains different layers of different thickness, conductivity, and permeability.
5.2 Eddy-current-aware PEEC model. Each conductor is further discretized to consider the uneven distribution of currents.
5.3 Extended window selection algorithm to simultaneously consider physical and image conductors.
5.4 SPICE compatible model for reluctance. The original reluctance element is substituted by serial self inductance and VCVSs.
5.5 Test configuration: two parallel copper interconnects above a two-layer substrate (Length unit: µm).
5.6 Self inductance comparison by using three different extraction tools: FastHenry, Sonnet®, and EPEEC.
5.7 Self inductance decreases as frequency increases and conductor-substrate distance decreases.
5.8 Resistance increases as frequency increases and conductor-substrate distance decreases.
5.9 With the same conductivity, the upper layer substrate will have larger effect than the lower layer. However, the lower layer cannot be ignored when the thickness of the upper layer is less than its skin depth.
5.10 Self inductance saturates when the thickness of the upper layer grows over its skin depth.
5.11 Waveforms of transient responses by using different interconnect models: PEEC, EPEEC-L, and EPEEC-R.


List of Tables

2.1 Algorithm of directly constructing J′.
2.2 Hierarchical panel refinement scheme.
2.3 Refinement of two panels.
2.4 Self coupling insertion.
2.5 Simulation results comparison.
2.6 Comparison with HiCap for some large benchmarks.
3.1 Simulation runtime comparison for bus crossing benchmark. (1) Monte Carlo (M.C.); (2) Quadratic Model (QuadMod).
4.1 Extraction time and error comparison.
5.1 Extended Window Selection Algorithm.
5.2 EPEEC Interconnect Modeling Algorithm.
5.3 Reluctance Realization Algorithm.
5.4 Extraction Time and Model Size Comparison.


Chapter 1

Introduction

1.1 Motivation

The semiconductor industry has experienced unprecedented growth over the last forty years. As integrated circuit processing technology marches relentlessly down through deep sub-micron feature sizes, interconnect effects, rather than active device characteristics, have moved to the forefront as the limiting factors of chip performance, such as system delay and signal integrity. Although the device and wire dimensions are decreasing, the size of the chip is increasing. This implies that the number of interconnects as well as their lengths are increasing with each new generation of advanced logic and memory chips. Nowadays, interconnect delay can account for more than 50% of the total path delay. Based on these observations, analysis and optimization of the interconnect performance of very large scale integration (VLSI) or ultra large scale integration (ULSI) designs have become an indispensable component of the global effort to advance Moore’s Law even further.

Figure 1.1: Wire dimension trends in advanced VLSI technologies.

To characterize the interconnect effects in timing analysis, efficient extraction of on-chip parasitics, such as resistance, capacitance, and inductance associated with complex interconnect structures, has become a crucial issue for establishing compact yet accurate interconnect circuit models. Resistance is mainly determined by the geometry of the line itself and does not depend on the distribution of the wires in its surroundings. In contrast, capacitance and inductance are strongly affected by geometry and the distribution of nearby conductors. For example, increasing the number of metal layers and changing the aspect ratio of metal lines reduce the effect of interconnect capacitance to a certain extent. The upper metal layers have lower capacitance to the ground because of the shielding effect of lower metal layers. Also, the lines in lower metal layers are narrower and taller, i.e., their vertical height is more than their horizontal width. With less width, the capacitance to the ground is decreased. Due to this dependence on the surrounding interconnect geometry, the extraction of capacitance and inductance is much harder than the extraction of resistance.

Capacitance extraction and inductance extraction are crucial not only for timing analysis but also for signal integrity analysis. For large, high performance circuits, functional noise failures have become a significant design and verification issue. Due to the non-uniform scaling of interconnects, the width and spacing of wires decrease more rapidly than the thickness of wires with each process shrink. Cross-coupling capacitance between wires is therefore becoming an increasingly dominant fraction of total wire capacitance, causing an increase in cross-coupled noise effects. Furthermore, with the employment of hierarchical metal wiring levels and the recent introduction of copper wiring (because its resistivity is approximately half that of aluminum wiring), on-chip inductance modeling has also become an indispensable issue for clocks and the fastest signal interconnects.

Consequently, accurate and efficient estimation of on-chip capacitances and inductances in complicated three-dimensional interconnects is becoming increasingly important for determining the final circuit speed or functionality in the ultra deep sub-micron (UDSM) design of integrated circuits.

1.2 Capacitance Extraction

During different stages of the whole VLSI design cycle, capacitance extraction needs to be performed pre- and post-routing and pre- and post-layout, with different accuracy requirements. To extract parasitic capacitances from a given design, the following steps need to be performed:

1. Define the technology process and material data. This includes mask layers and dielectrics, their thickness, conductivity, and permittivity constants.

2. Use the technology data as input to a 2D or 3D field solver to obtain capacitance coefficients.

3. Generate a rule file for a full-chip capacitance extractor using the obtained capacitance coefficients.

4. Run the capacitance extractor using the generated rule file.

The above extraction flow is widely adopted by most industrial extraction tools, such as Cadence Encounter Fire & Ice, Magma QuickCap, Mentor Graphics Calibre xRC, and Synopsys Star-RC. The most expensive step in the above extraction flow is to establish the rule file, which is also called the capacitance look-up table (LUT), by using 2D or 3D field solvers.
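
As a rough illustration of how a rule file is used once it has been generated, the sketch below stores per-unit-length coupling capacitance against wire spacing and interpolates linearly between entries. The table values, the names `CAP_TABLE`, `lookup_cap`, and `wire_coupling_cap`, and the choice of spacing as the only key are all assumptions made for illustration; production rule files are indexed by many more geometric parameters.

```python
from bisect import bisect_right

# Hypothetical rule-file fragment: (spacing in um, coupling cap in fF/um).
# The numbers are invented for illustration, not taken from any real process.
CAP_TABLE = [(0.10, 0.120), (0.20, 0.070), (0.40, 0.040), (0.80, 0.025)]


def lookup_cap(spacing_um, table=CAP_TABLE):
    """Linearly interpolate capacitance per unit length at a given spacing."""
    xs = [s for s, _ in table]
    ys = [c for _, c in table]
    # Clamp to the table range instead of extrapolating.
    if spacing_um <= xs[0]:
        return ys[0]
    if spacing_um >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, spacing_um)  # first entry strictly above the query
    lo = hi - 1
    t = (spacing_um - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])


def wire_coupling_cap(length_um, spacing_um):
    """Total coupling capacitance (fF) of a wire segment to one neighbour."""
    return length_um * lookup_cap(spacing_um)
```

The expensive field-solver runs happen once, when the table is built; each later query is only a search and an interpolation, which is what makes the table-driven flow practical at full-chip scale.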

Although many numerical methods can be used to solve the capacitance extraction problem [1–8], the boundary element method (BEM) has been adopted as the main approach for 3D capacitance calculation due to its capability to handle complex interconnect structures. However, BEM yields an extremely dense linear system, and hence direct matrix solving methods, such as Gaussian elimination, require O(n^3) operations and greatly limit the tractable problem size.
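
To make the dense system concrete, the following minimal sketch sets up a collocation BEM for an isolated square plate in free space and solves P q = v by Gaussian elimination. The single-plate geometry, the point-charge approximation for off-diagonal entries, and all function names are simplifying assumptions for illustration, not the formulation used by any of the cited solvers.

```python
import math

EPS0 = 8.854187817e-12  # vacuum permittivity, F/m


def potential_matrix(side, n):
    """Dense potential coefficient matrix for a square plate cut into n*n panels.

    Off-diagonal entries treat the source panel as a point charge at its
    centroid; diagonal entries use the closed-form potential at the centre
    of a uniformly charged square panel.
    """
    a = side / n  # panel edge length
    centers = [((i + 0.5) * a, (j + 0.5) * a) for i in range(n) for j in range(n)]
    k = 1.0 / (4.0 * math.pi * EPS0)
    self_term = k * 4.0 * math.log(1.0 + math.sqrt(2.0)) / a
    P = []
    for i, (xi, yi) in enumerate(centers):
        row = []
        for j, (xj, yj) in enumerate(centers):
            if i == j:
                row.append(self_term)
            else:
                row.append(k / math.hypot(xi - xj, yi - yj))
        P.append(row)
    return P


def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting, O(n^3)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def plate_capacitance(side=1.0, n=4):
    """Capacitance (F) of the plate: total charge when every panel is at 1 V."""
    P = potential_matrix(side, n)
    q = gauss_solve(P, [1.0] * (n * n))
    return sum(q)
```

Every entry of P is non-zero because each panel influences every other panel, so doubling the number of panels quadruples the storage and multiplies the elimination work by roughly eight.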

Many fast capacitance extraction algorithms have been proposed in the literature to solve the dense linear system, such as [9–19]. FastCap [14] is based on the fast multipole method (FMM) for accelerating the dense matrix-vector multiplications required by iterative matrix solvers. Other multipole accelerated BEM algorithms include Multi-scale [15] and [13]. HiCap [16] is also an FMM algorithm with kernel-independent hierarchical panel refinement. Normally, those iterative algorithms require O(n^2) operations per iteration since the potential coefficient matrix has on the order of n^2 entries. Other well-known algorithms include the precorrected fast Fourier transformation (FFT) method [18] and the singular value decomposition (SVD) method [19]; they are of O(n log n) complexity with O(n) memory requirement.
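
The per-iteration cost is easiest to see in code. The sketch below is a plain, unpreconditioned conjugate gradient solver, one simple example of an iterative matrix solver, in which the only expensive step is one dense matrix-vector product per iteration. The helper names and the tolerance handling are assumptions for illustration; accelerated solvers of the multipole kind replace `matvec` with a cheaper approximate product.

```python
def matvec(A, x):
    """Dense matrix-vector product: n*n multiply-adds, the O(n^2) step."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Returns the solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = b[:]             # residual b - A x for the initial guess x = 0
    p = r[:]             # search direction
    rs_old = dot(r, r)
    if rs_old == 0.0:
        return x, 0
    b_norm2 = rs_old
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)                       # one dense product per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new <= tol * tol * b_norm2:       # relative residual test
            return x, it
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter
```

The total cost is the number of iterations times the cost of `matvec`, which is why the fast algorithms attack both factors: cheaper products and, through preconditioning, fewer iterations.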

Recently, PHiCap [17] proposed constructing cost-efficient preconditioners by applying an orthogonal sparsification transformation. Although the iteration number is greatly reduced, the orthogonal matrix generation still requires O(n log n) operations and hence becomes the bottleneck of the entire algorithm. Furthermore, the transformation matrix needs extra storage space and makes the memory budget even tighter for large scale design applications.

The capacitance extraction problem becomes even more complicated as the semiconductor industry advances to the 65nm technology node. Due to the ever-increasing complexity of VLSI designs and IC process technologies, the mismatch between a circuit fabricated on the wafer and the one designed in the layout tool grows ever larger. Therefore, characterizing and modeling process variations of interconnect geometry has become an integral part of analysis and optimization of modern VLSI designs. Process-induced variations in the device and interconnect structures are posing a significant challenge to parasitic modeling and signal integrity analysis. To determine the extent of such effects, the distribution of various electrical parameters, such as interconnect resistances and capacitances, due to variations in the manufacturing process must be determined.
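
A toy example may help show what determining such a distribution involves. The sketch below perturbs the dielectric thickness of an ideal parallel-plate capacitor with a Gaussian random variable and compares Monte Carlo sampling of the exact formula with a second-order (quadratic) Taylor model. The parallel-plate geometry, the 5% sigma, and all names are assumptions chosen for illustration; real interconnect capacitance requires a field solver.

```python
import random
import statistics


def exact_cap(c0, delta):
    """Parallel-plate capacitance when the gap is scaled by (1 + delta)."""
    return c0 / (1.0 + delta)


def quadratic_cap(c0, delta):
    """Second-order Taylor expansion of exact_cap around delta = 0."""
    return c0 * (1.0 - delta + delta * delta)


def monte_carlo(model, c0=1.0, sigma=0.05, samples=20000, seed=0):
    """Mean and standard deviation of model(c0, delta), delta ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    values = [model(c0, rng.gauss(0.0, sigma)) for _ in range(samples)]
    return statistics.fmean(values), statistics.pstdev(values)
```

Calling `monte_carlo(exact_cap)` and `monte_carlo(quadratic_cap)` with the same seed draws the same perturbations for both models, so any difference in the resulting statistics reflects only the truncation error of the quadratic model.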


Our work on capacitance extraction focuses on the development of an efficient BEM capacitance algorithm to solve the 3D capacitance extraction problem and then extends it to consider process variations. The main contributions in this area are as follows:

1. A novel algorithm, ICCAP, which provides a completely different perspective for generating sparsified and reordered potential coefficient matrices, is presented. ICCAP reveals that the intrinsic reason why the linear system arising from BEM is dense is the selection of leaf panel charges as the basis. Therefore, ICCAP presents a linear-time basis panel selection algorithm (BPSA) to choose a new basis. Mathematically, selecting a different basis is equivalent to performing consecutive congruence transformations to sparsify the original dense system, although no explicit matrix computations are required. Furthermore, ICCAP proposes a cost-free Level-Oriented Reordering (LOR) method to generate reordered potential matrices, so that preconditioners contain even fewer fill-ins than with explicitly applied minimum degree reordering (MMD). Experimental results show that ICCAP is faster and consumes less memory than all previous algorithms, including FastCap [14], HiCap [16], and PHiCap [17].

2. To efficiently evaluate process variation effects on parasitic capacitance, this work proposes a comprehensive statistical capacitance extraction algorithm, STATCAP, to develop an explicit quadratic form representation for parasitic capacitance in terms of dominant process variation sources. The quadratic model can be easily extended to even higher order to achieve higher accuracy. Also, STATCAP proposes a systematic way to model interconnect surface fluctuation due to process variations and introduces principal factor analysis to reduce the large number of random variables used to model the surface fluctuation. Then STATCAP solves for the capacitance quadratic representation by applying random variable matching and taking advantage of the efficiency of ICCAP.

3. To study the effects of process variations during the lithography process on the geometry of fabricated interconnects, we propose an analytic closed-form formula, LITHSIM, to directly generate mask images by exploiting the regular structure in VLSI designs. LITHSIM avoids the Fourier and inverse Fourier transformations adopted by general aerial image simulators to achieve significant speedup. Also, due to its analytic formulation, LITHSIM eliminates the aliasing error introduced in the sampling process.
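
The equivalence between a change of basis and a congruence transformation, mentioned in the first contribution, can be checked on a small example. In the sketch below, a 2×2 system P q = v is rewritten in a new basis q = J q′, which gives the transformed matrix JᵀPJ and right-hand side Jᵀv; solving the transformed system and mapping back recovers the original solution. The matrices and helper names are arbitrary assumptions for illustration and do not reproduce ICCAP's basis panel selection, which avoids forming these products explicitly.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Dense matrix-matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve2(A, b):
    """Solve a 2x2 system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def congruence_solve(P, v, J):
    """Solve P q = v through the transformed system (J^T P J) q' = J^T v."""
    Jt = transpose(J)
    P_new = matmul(matmul(Jt, P), J)   # congruence transformation of P
    v_new = matvec(Jt, v)              # right-hand side in the new basis
    q_new = solve2(P_new, v_new)
    return matvec(J, q_new), P_new     # map back to the old basis: q = J q'
```

With P = [[2, 1], [1, 2]] and J = [[1, -1], [1, 1]], the transformed matrix is diagonal, [[6, 0], [0, 2]], which is the kind of sparsification a well-chosen basis can provide while leaving the solution q unchanged.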

1.3 Inductance Extraction

Parasitic on-chip inductance is growing as another design concern as very large scale integration (VLSI) technology marches toward ultra-deep sub-micron and the operating frequency approaches the gigahertz range. The inductive coupling effect becomes more important because of higher frequency signal content, denser geometries, and reductions of both resistance and capacitance by copper and low-k dielectrics. The inductance effect is present not only in IC packages but also in on-chip interconnects such as power grids, clock nets, and bus structures. It causes signal overshoot, undershoot, and oscillations, and aggravates crosstalk and power-grid noise. All of these seriously impact on-chip signal integrity. The importance and difficulty of on-chip inductance extraction and analysis have been addressed in [54] and [55].

One major problem of inductance modeling is the long-range coupling effect and the uncertainty of return paths. Since inductance is a function of a closed loop, the return path is difficult to predict before simulation. Fortunately, the PEEC method has been widely adopted to deal with this issue [56]. However, since PEEC assumes that each conductor segment has a current return path at infinity, inductive couplings exist among all conductor segments, so that extremely dense partial inductance matrices are usually generated. For this reason, the reluctance-based method [57, 58] has been proposed by Hao Ji et al. to alleviate this problem. Since reluctance has a higher degree of locality, similar to capacitance, only a small number of neighbors need to be considered. Consequently, the reluctance matrix for circuit simulation is very sparse compared to the partial inductance matrix.

Moreover, the traditional PEEC approach does not take substrate effects into consideration. With continuous advances in radio frequency (RF) mixed-signal VLSI technology, the creation of eddy currents in lossy multi-layer substrates has made the already complicated interconnect analysis and modeling problem more challenging. Although several previous works have been proposed to account for substrate losses, such as [59–65], most of these approaches are based on traditional electromagnetic methods and use the numerical finite difference method to discretize the entire substrate, and hence are often computationally prohibitive for today's VLSI geometries. With rising clock frequencies and reduced substrate resistivity, a large volume of silicon bulk needs to be spatially discretized into very small cells to capture the substrate effects accurately. Therefore, the resulting equivalent circuit models are prohibitively large, since inductive couplings now exist among all conductor segments and substrate cells.

To address these limitations, we propose an accurate and efficient interconnect modeling approach, EPEEC (Eddy-current-aware Partial Element Equivalent Circuit). Based on complex image theory, EPEEC extends the traditional PEEC model to simultaneously take multi-layer substrate eddy current losses and frequency-dependent effects into consideration. To accommodate even larger on-chip interconnect networks, EPEEC develops a new SPICE-compatible reluctance extraction algorithm by applying sparsification in the inverse inductance domain with an extended window algorithm. Compared with several industry-standard inductance and full-wave solvers, such as FastHenry and Sonnet®, EPEEC demonstrates accuracy within 1.5% while providing over 100X speedup.

1.4 Thesis Organization

This work presents an integrated framework to solve on-chip interconnect and package parasitic capacitance and inductance extraction problems, with and without the consideration of process variations.

We begin in Chapter 2 by introducing the background and the problem definition of capacitance extraction. The most recent multipole and hierarchical algorithms based on BEM are reviewed. The main idea and detailed implementation of ICCAP are presented. We mathematically prove that ICCAP is more efficient than existing algorithms and is linear in both execution time and memory consumption. Extensive experimental results are presented to demonstrate these properties of ICCAP.

Chapter 3 presents the STATCAP algorithm, which extends ICCAP to consider process variations. STATCAP is the first to introduce random variable matching and random variable reduction techniques to the capacitance extraction literature. STATCAP also proposes a general framework to model interconnect geometry deviations due to process variations and to generate high-accuracy quadratic representations of parasitic capacitances.

Chapter 4 is devoted to LITHSIM, an analytic mask image simulation algorithm that models the process variations introduced in the lithography process. LITHSIM presents a closed-form formula to directly calculate the mask image without sampling the mask and hence eliminates the discretization aliasing error.

Chapter 5 presents EPEEC, which combines the complex image method with window-based reluctance extraction to generate compact, accurate, and SPICE-compatible inductance models. EPEEC extends complex image theory to handle multi-layer substrates and develops a reluctance-based extraction algorithm to consider inductive and ohmic losses due to induced eddy currents in a multi-layer substrate. Furthermore, EPEEC is SPICE-compatible by employing a reluctance realization algorithm that converts each reluctance element to a series self inductance and voltage-controlled voltage sources (VCVSs). Extensive experiments demonstrate that EPEEC has high accuracy and can generate very compact interconnect models.

Chapter 6 provides concluding remarks for this thesis and discusses future work in the areas of parasitic extraction and lithography process variation modeling.

Chapter 2

Linear Time Hierarchical Capacitance

Extraction – ICCAP

2.1 Preliminaries

2.1.1 Capacitance Extraction

The purpose of capacitance extraction is to calculate the capacitance matrix of a conductor system containing $m + 1$ conductors.

Figure 2.1: Capacitance extraction.

Given that each conductor $i$ is at a different potential $V_i$ and the $(m + 1)$th conductor is the reference conductor at zero potential, the total charge $Q_i$ on the $i$th conductor is the sum of the contributions from all other conductors. The contribution from the $j$th conductor to the charge on the $i$th conductor is equal to the product of the coupling capacitance $C_{ij}$ and the potential difference $(V_i - V_j)$ between the $i$th conductor and the $j$th conductor.

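\begin{align*}
Q_1 &= C_{10}V_1 + C_{12}(V_1 - V_2) + \cdots + C_{1m}(V_1 - V_m) \\
Q_2 &= C_{21}(V_2 - V_1) + C_{20}V_2 + \cdots + C_{2m}(V_2 - V_m) \\
&\;\;\vdots \\
Q_m &= C_{m1}(V_m - V_1) + C_{m2}(V_m - V_2) + \cdots + C_{m0}V_m
\end{align*}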

Therefore, the capacitances between the $m$ non-reference conductors can be represented by the capacitance matrix $C \in \mathbb{R}^{m \times m}$,

\[ CV = Q, \tag{2.1} \]

where $V \in \mathbb{R}^m$ and $Q \in \mathbb{R}^m$ are the conductor potential and surface charge vectors, respectively. To determine the $j$th column of the capacitance matrix, the surface charge on each conductor is computed by raising the potential on the $j$th conductor to one while grounding all other conductors; $C_{ij}$ is then equal to the surface charge on the $i$th conductor. The procedure is repeated $m$ times to compute all columns of $C$.
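As a small worked case (an illustration added here, not taken from the text), collecting the coefficients of $V_1$ and $V_2$ in the charge equations for $m = 2$ gives the following matrix form. The diagonal entries gather every capacitance attached to a conductor, and the off-diagonal entries are the negated coupling capacitances.

```latex
\[
\begin{bmatrix}
C_{10} + C_{12} & -C_{12} \\
-C_{21} & C_{20} + C_{21}
\end{bmatrix}
\begin{bmatrix} V_1 \\ V_2 \end{bmatrix}
=
\begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix}
\]
```

Setting $V_1 = 1$ and $V_2 = 0$ then reads off the first column directly as the two conductor charges, which is the column-by-column procedure described above.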

2.1.2 Boundary Element Method

BEM capacitance extraction is equivalent to solving a first-kind integral equation

\[ \psi(x) = \int_{\text{surface}} G(x, x')\,\sigma(x')\,da' \tag{2.2} \]

to find the conductor charge distribution $\sigma$ given the conductor potential $\psi$. $G(x, x')$ is the Green's function, which takes different forms for uniform and multiple dielectrics.

To numerically solve the integral equation in Eq. 2.2, the surfaces of the $m$ conductors are discretized into much smaller panels, and the surface charge on each of the finest panels (leaf panels) is assumed to be uniform. The potential at the center of the $i$th panel is then the sum of the contributions from the charge distributions on all $n$ leaf panels,

\[ v_i = \sum_{j=1}^{n} \frac{q_j}{a_j} \int_{\text{panel } j} G(x_i, x')\,da', \tag{2.3} \]

where $q_j$ and $a_j$ are the charge and the area of panel $j$.

Applying Eq. 2.3 to all $n$ leaf panels leads to a dense linear system

\[ Pq = v, \tag{2.4} \]

where

\[ P_{ij} = \frac{1}{a_j} \int_{\text{panel } j} G(x_i, x')\,da'. \tag{2.5} \]

$P \in \mathbb{R}^{n \times n}$ is referred to as the potential coefficient matrix, and $q, v \in \mathbb{R}^n$ are the panel charge and potential vectors, respectively.

Then, to compute the $j$th column of the capacitance matrix, Eq. 2.4 must be solved for $q$ given a vector $v$ with $v_k = 1$ if panel $k$ is on the $j$th conductor and $v_k = 0$ otherwise [14]. The entry $C_{ij}$ of the capacitance matrix is then computed by summing all the panel charges on the $i$th conductor,

\[ C_{ij} = \sum_{k \in \text{conductor } i} q_k. \tag{2.6} \]
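The column-by-column procedure of Eqs. 2.4 and 2.6 can be sketched in a few lines of Python. This is a minimal illustration, assuming the potential coefficient matrix has already been assembled. The dense helper `solve`, the function `capacitance_matrix`, the argument `panel_conductor`, and the numerical values are all hypothetical; a real extractor would use the iterative solvers discussed later instead of dense elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (dense)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def capacitance_matrix(P, panel_conductor, m):
    """panel_conductor[k] is the 0-based conductor that leaf panel k lies on."""
    n = len(P)
    C = [[0.0] * m for _ in range(m)]
    for j in range(m):
        # Right-hand side of Eq. 2.4: 1 V on conductor j, others grounded.
        v = [1.0 if panel_conductor[k] == j else 0.0 for k in range(n)]
        q = solve(P, v)
        # Eq. 2.6: sum the panel charges conductor by conductor.
        for k in range(n):
            C[panel_conductor[k]][j] += q[k]
    return C


# Hypothetical system: panels 0 and 1 on conductor 0, panel 2 on conductor 1.
P = [[4.0, 1.0, 0.5],
     [1.0, 4.0, 0.5],
     [0.5, 0.5, 5.0]]
C = capacitance_matrix(P, [0, 0, 1], 2)
```

Each column costs one linear solve, which is why the density of $P$ dominates the overall cost.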

2.1.3 Hierarchical Capacitance Algorithms

The main obstacle to solving for $q$ is that the coefficient matrix in Eq. 2.4 is very dense, so direct linear system solvers, such as Gaussian elimination or Cholesky decomposition, become computationally intractable once the number of panels exceeds several hundred. Therefore, multipole-accelerated [14, 15] and hierarchical algorithms [16, 17] have been proposed to address this problem.

FastCap [14] accelerates the matrix-vector multiplications in iterative matrix solvers using the multipole and local expansions shown in Fig. 2.2. Charge points within an inner circle can be replaced by a single charge equal to their sum if the distance between the evaluation points and the center of the circle is much larger than its radius $R$. Potentials at evaluation points within a small circle induced by faraway charge points are roughly the same as the potential evaluated at the center.

Figure 2.2: BEM capacitance algorithms: FastCap.

HiCap [16] and PHiCap [17] are fast multipole algorithms with hierarchical panel refinement. Hierarchical panel discretization can be represented by a multiple-tree structure as shown in Fig. 2.3. The root panel of each tree corresponds to a conductor surface or a dielectric interface. If the estimated potential coefficient between two panels is larger than a threshold value, the panels are further divided into smaller panels. Otherwise, a link recording the potential coefficient is created between these two panels.

PHiCap [17] proposes the use of a link matrix $H \in \mathbb{R}^{N \times N}$ and a structure matrix $J \in \mathbb{R}^{N \times n}$ to represent the hierarchical refinement, where $N$ is the number of all panels and $n$ is the number of leaf panels. An example of $H$ and $J$ for the multiple-tree structure is also shown in Fig. 2.3. Each row of the structure matrix $J$ corresponds to a panel, either leaf or non-leaf, and each column corresponds to a leaf panel. The $(i, j)$ entry of $J$ is 1 if panel $i$ contains leaf panel $j$, and 0 otherwise [17]. For any two panels with no link between them, the corresponding entries in $H$ are zero. Otherwise, for panels $i$ and $j$ joined by a link, the corresponding entry is calculated by Eq. 2.5.

Figure 2.3: BEM capacitance algorithms: HiCap and PHiCap.

Since in every elementary tree the parent panel charge is the sum of the charges on its two child panels, all panel charges can be represented by the charges on leaf panels,

\[ q_N = Jq, \tag{2.7} \]

where $q_N \in \mathbb{R}^N$ is the vector of all panel charges.

Let $v_N \in \mathbb{R}^N$ denote the vector of potentials induced by links on individual panels,

\[ v_N = Hq_N. \tag{2.8} \]

Since the potential on a parent panel is distributed to its two child panels, the leaf panel potential vector $v \in \mathbb{R}^n$ is equal to

\[ v = J^T v_N. \tag{2.9} \]

By using Eqs. 2.7, 2.8, and 2.9, the potential coefficient matrix can be formulated as

\[ P = J^T H J. \tag{2.10} \]
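To make Eq. 2.10 concrete, the sketch below assembles $P$ for a deliberately tiny, hypothetical configuration: one tree with root `a` and leaf children `b` and `c`, plus a second tree consisting of the single panel `d`. The link values and the helper names `matmul` and `transpose` are assumptions made only for illustration.

```python
def matmul(A, B):
    """Plain dense matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Row order of J and H: a, b, c, d.  Column order of J (leaf panels): b, c, d.
J = [[1, 1, 0],   # a contains leaf panels b and c
     [1, 0, 0],   # b
     [0, 1, 0],   # c
     [0, 0, 1]]   # d

H = [[0.0, 0.0, 0.0, 0.5],   # high-level link between a and d
     [0.0, 4.0, 1.0, 0.0],   # self link of b, link b-c
     [0.0, 1.0, 4.0, 0.0],   # link c-b, self link of c
     [0.5, 0.0, 0.0, 5.0]]   # link d-a, self link of d

P = matmul(transpose(J), matmul(H, J))   # Eq. 2.10
```

In this example the single link between `a` and `d` appears in both the `(b, d)` and `(c, d)` entries of `P` (and their transposes), so `P` has no zero entries even though `H` does.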

Therefore, FastCap, HiCap, and PHiCap all develop $P$ based on the surface potentials and charges on leaf panels. We will show that this is the intrinsic reason why the linear system in Eq. 2.4 is dense.

2.2 ICCAP Algorithm

To facilitate the following discussion, we first introduce the definitions of basis charges and basis panels.

Definition 1 Let $S$ denote the variable space composed of the charges on all leaf and non-leaf panels,

\[ S = \{\, q_i \mid q_i \text{ is the surface charge on panel } i,\ 1 \le i \le N \,\}. \]

If each panel charge in $S$ can be represented by a unique linear combination of the charges on $n$ panels, the charges on those panels are basis charges and those $n$ panels are the corresponding basis panels.

For a given tree structure, there are many possible bases besides the leaf panel charges. For example, for the multiple-tree structure in Fig. 2.3, Fig. 2.4 shows another basis, which includes two non-leaf panels $c$ and $e$. The corresponding structure matrix $J'$ of the new basis is also shown in Fig. 2.4.

Figure 2.4: Different bases have different structure matrices and potential coefficient matrices with different densities.

Since each basis has its own structure matrix $J'$, the related potential coefficient matrix $P' = {J'}^{T} H J'$ has a different density. For example, the $P'$ related to the new basis in Fig. 2.4 contains several zeros, while the $P$ related to the old basis has no zeros. Therefore, it is desirable to choose a basis whose potential coefficient matrix is sparse.

Before presenting the method for choosing a new basis, we first show that the leaf panel charges compose the worst basis, in that the corresponding potential coefficient matrix is the densest one.

To prove this, it is necessary to clarify how links between non-leaf panels are filled into the potential coefficient matrix. As shown in Fig. 2.5, panel $i$ is a non-leaf panel containing $k$ underlying leaf panels, so the charge on panel $i$ is $q_i = \sum_{n=1}^{k} q^i_n$, where $q^i_n$ is the charge on the $n$th leaf panel under panel $i$. Similarly, the charge on another non-leaf panel $j$ is $q_j = \sum_{n=1}^{l} q^j_n$, where $l$ is the number of leaf panels under panel $j$.

Assume there is a link $P_{ij}$ between panels $i$ and $j$. The potential induced by link $P_{ij}$ on panel $i$ is $P_{ij} q_j = P_{ij} \sum_{n=1}^{l} q^j_n$, and it is distributed to all $k$ leaf panels under panel $i$.

Figure 2.5: Fill-ins introduced by a link between non-leaf panels.

Similarly, the $l$ leaf panels under panel $j$ gather the potential produced by $P_{ij}$ on panel $j$, which is $P_{ij} q_i = P_{ij} \sum_{n=1}^{k} q^i_n$. So $P_{ij}$ creates $2kl$ fill-ins in the potential coefficient matrix, with the pattern shown in Fig. 2.5.

Since leaf panels interact with each other through links between themselves or between their upper-level parent panels, every entry in $P$ is non-zero, and hence the total number of fill-ins is $n^2$. Consequently, if we take all leaf panel charges as the basis, the corresponding potential coefficient matrix is the densest one.

2.2.1 New Basis Panels Generation

Our basis panel selection algorithm (BPSA) is based on repeatedly performing an elementary operation to generate a new basis.

Theorem 1 Assume the structure matrix and the potential coefficient matrix corresponding to a possible basis are $J$ and $P$, respectively. If the current basis contains two panels $j$ and $k$ that are child panels in the same elementary tree, then eliminating either one of them (say $k$) and adding their parent panel $i$ to the basis generates another set of basis panels.

The new structure matrix $J'$ corresponding to the new basis can be obtained by

\[ J'_j = J_j - J_k, \qquad J'_i = J_k, \]

where $J_i$ represents the column corresponding to panel $i$ in $J$. The new potential coefficient matrix $P'$ can be obtained by

\[ P' = E^T P E, \]

where $E$ is an elementary transformation matrix.

We use an example to illustrate this important operation. As shown in Fig. 2.6(a), leaf panels 6 and 7 are contained in the same elementary tree, and their parent is panel 4. The right-hand side of the figure shows the corresponding structure matrix $J$ when all leaf panels are selected as the basis.

Now we apply the elementary operation and move one basis panel from panel 7 to its parent panel 4, as shown in Fig. 2.6(b). This move results in a new basis, since all panel charges can still be represented by the charges on the new basis panels. The structure matrix $J'$ is shown on the right-hand side of Fig. 2.6(b).

The column corresponding to panel 4 in $J'$ is identical to the column corresponding to panel 7 in $J$, since the upper-level panels that originally gathered the charge on panel 7 now gather the charge on panel 4 after the elementary operation. So the column of panel 4 in $J'$ “inherits” the column of panel 7 in $J$.

Figure 2.6: The elementary operation of switching basis panels is equivalent to performing a congruence transformation on $P$.

In contrast, the column corresponding to panel 6 changes in $J'$: since the charge on panel 4 is the sum of the charges on panels 6 and 7, the upper-level panels now only need to gather the charge on panel 4. Panel 6 remains in the new basis, since the charge on panel 7 can be obtained only when the charge on panel 6 is known. So the changed column corresponding to panel 6 in $J'$ is

\[ J'_6 = J_6 - J_7. \tag{2.11} \]

Furthermore, Eq. 2.11 can be written in matrix form as

\[ J' = JE, \tag{2.12} \]

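where $E$ is an elementary transformation matrix expressed by

\[
E =
\begin{bmatrix}
\ddots & & & \\
 & 1 & 0 & \\
 & -1 & 1 & \\
 & & & \ddots
\end{bmatrix}
\begin{matrix}
 \\ \leftarrow \text{panel 6} \\ \leftarrow \text{panel 7} \\ \\
\end{matrix}
\tag{2.13}
\]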

Consequently, by using Eqs. 2.12 and 2.13, the relation between the new potential coefficient matrix $P'$ and $P$ can be written as

\[ P' = {J'}^{T} H J' = (JE)^T H (JE) = E^T P E. \tag{2.14} \]

So $P'$ is obtained by a congruence transformation on $P$.

Based on Eqs. 2.13 and 2.14, it is important to notice that this transformation only changes the column and row related to panel 6: $P'$ is obtained by subtracting the column and row of panel 7 from the column and row of panel 6. We have shown in Section 2.2 that links on upper-level panels introduce identical fill-ins in the columns and rows of panels 6 and 7. So the subtraction cancels out identical terms and creates many zeros in $P'$.
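The cancellation can be checked numerically on a small, hypothetical case. The sketch below assumes a tree with root `a` and leaf children `b` and `c`, plus a separate single-panel conductor `d` that is linked only to `a`. In the leaf basis that one link fills the `(b, d)` and `(c, d)` entries with the same value, and the congruence transformation of Eq. 2.14 removes one of them. The matrix values and helper names are assumptions.

```python
def matmul(A, B):
    """Plain dense matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Potential coefficient matrix in the leaf basis, ordered (b, c, d).
# The two 0.5 entries in the last column both come from the single link a-d.
P = [[4.0, 1.0, 0.5],
     [1.0, 4.0, 0.5],
     [0.5, 0.5, 5.0]]

# Elementary operation: eliminate c and add its parent a.
# New basis order (b, a, d): the b column becomes J_b - J_c, the a column is J_c.
E = [[1, 0, 0],
     [-1, 1, 0],
     [0, 0, 1]]

P_new = matmul(transpose(E), matmul(P, E))   # Eq. 2.14
```

Only the row and column of `b` change, and the entry coupling `b` to `d` becomes zero because the identical fill-ins from the link `a`-`d` cancel.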

The elementary operation of moving basis panels upward can be applied repeatedly. As shown in Fig. 2.7(a), after moving basis panel 7 to panel 4, the elementary tree consisting of panels 2, 4, and 5 now has two basis panels (panel 4 and panel 5). So we can eliminate panel 5 (or panel 4) and add its parent, panel 2. This operation cancels out identical terms in the column and row of panel 4, which inherited the column and row of panel 7 in the previous step.

Figure 2.7: Repeatedly moving basis panels upward is equivalent to applying consecutive congruence transformations on the potential coefficient matrix without explicit matrix manipulations.

Notice that the subtractions are only performed on the column and row related to panel 4 in $P$. The column and row of panel 6 are not affected, and hence the zeros created in the previous step are preserved. Similarly, after this step, we can move panel 3 to panel 1 and again eliminate identical terms in the row and column of panel 2.

Successively applying the elementary operation is equivalent to implicitly applying consecutive congruence transformations on the potential coefficient matrix with the transformation matrix

\[ E = E_1 E_2 E_3 \cdots \]

In each step, many zeros are created by eliminating identical terms in the original potential coefficient matrix, and previously created zero entries are not destroyed by later steps.

Assume we start from the basis including all leaf panels and apply the elementary operation to consecutively push basis panels from bottom to top. At the end, the resulting basis includes only root panels and left-hand side (LHS) panels. This process is equivalent to consecutively applying congruence transformations that cancel out the duplicated terms introduced by the same link. So in the new potential coefficient matrix $P'$, the number of non-zeros is comparable with the total number of links in the multiple-tree structure, which has been proven to be $O(n)$ [16]. This property has also been observed experimentally, as shown in Fig. 2.8.

Figure 2.8: Comparison of non-zero entries in $H$ and $P'$. (Log-log plot of the number of non-zeros in $H$ and in $P'$ versus matrix dimension; annotated slope < 1.)

Theorem 2 The basis that includes all root panels and all left-hand side panels leads to a sparse potential coefficient matrix containing $O(n)$ non-zero entries.

The selection of basis panels is not unique, since in each elementary operation we can eliminate either the right-hand side (RHS) panel or the LHS panel. However, the construction of $J'$ is simplified by choosing the basis in Theorem 2.

2.2.2 Direct Formulation of $J'$ in Linear Time

One way to construct $J'$ is based on Theorem 1. We first generate the structure matrix $J$ corresponding to the basis containing the leaf panels, and then apply the elementary operation to push the basis upward. In each operation, we simultaneously update $J$ based on Theorem 1. Since the basis of $n$ leaf panels is switched to another set of $n$ panels, at most $n$ column subtractions are performed. The disadvantage is that $J$ must be constructed first. So we propose a second method that constructs $J'$ directly.

Lemma 1 In the column $J'_i$ corresponding to a basis panel $i$, the entry $J'_{ji}$ is 1 if panel $i$ contains the right-hand side panel $j$. If panel $i$ is not a root panel, the entry $J'_{ji}$ is $-1$ if the parent of panel $i$ contains the right-hand side panel $j$.

Lemma 1 can be illustrated by the small example in Fig. 2.9. Panel 2 is an LHS panel and is included in the new basis. Panels 5 and 7 are its underlying RHS panels, and hence the corresponding entries in $J'$ are filled with 1. The parent of panel 2 contains RHS panel 3, so the corresponding entry in $J'$ is $-1$. A detailed implementation of Lemma 1 is presented in Table 2.1.
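The entries described by Lemma 1 can also be derived from the charge-sum rule alone: every RHS panel charge equals its parent's charge minus its LHS sibling's charge. The sketch below uses that rule in a single top-down pass. It is not a transcription of Table 2.1; the function name `new_basis_rows`, the dictionary layout, and the tree shape (1 → (2, 3), 2 → (4, 5), 5 → (6, 7)) are assumptions chosen so that the result can be compared with the example in the text.

```python
def new_basis_rows(children, roots):
    """Return {panel: {basis_panel: coefficient}}, i.e. the rows of J'.

    children maps a non-leaf panel to its (LHS, RHS) child pair.
    The basis consists of all root panels and all LHS panels.
    """
    rows = {}
    stack = []
    for r in roots:
        rows[r] = {r: 1}            # a root panel is its own basis panel
        stack.append(r)
    while stack:
        p = stack.pop()
        if p not in children:
            continue                # leaf panel: nothing below it
        lhs, rhs = children[p]
        rows[lhs] = {lhs: 1}        # every LHS panel is a basis panel
        row = dict(rows[p])         # q_rhs = q_parent - q_lhs
        row[lhs] = -1
        rows[rhs] = row
        stack.extend((lhs, rhs))
    return rows


# Assumed tree shape; the figure itself is not reproduced here.
children = {1: (2, 3), 2: (4, 5), 5: (6, 7)}
rows = new_basis_rows(children, roots=[1])

# Column of J' that belongs to basis panel 2.
column_2 = {panel: row[2] for panel, row in rows.items() if 2 in row}
```

For this tree, `column_2` holds +1 for panels 2, 5, and 7 and -1 for panel 3, in line with the example above. Each RHS row is copied from its parent's row, so the total work is proportional to the number of non-zeros produced.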

Figure 2.9: Efficient construction of the new structure matrix $J'$.

Theorem 3 The new structure matrix $J'$ corresponding to the new basis in Theorem 2 has $O(n)$ non-zero entries.

Assume a complete tree structure with $n$ leaf nodes and $m = \lg n$ levels, where the root node is in level 0. In level $i$, there are $2^{i-1}$ LHS panels. Each LHS panel introduces $2m - 2i + 1$ fill-ins. So the total number of fill-ins in $J'$ is given by $m + \sum_{i=1}^{m} 2^{i-1}(2m - 2i + 1) = 3n + \lg n - 3$.

So the number of non-zeros in $J'$ is $O(n)$. This property has been observed in practice, as shown in Fig. 2.10.

Similarly, we can prove that the original structure matrix $J$ contains $O(n \lg n)$ non-zeros. That is the reason why PHiCap [17] has $O(n \log n)$ runtime and memory consumption.

Figure 2.10: Comparison of non-zero entries in $J$ and $J'$. (Number of non-zeros in $J$ and in $J'$ versus number of leaf panels.)

2.2.3 Extracting $E$ from $J'$

We have shown that the new potential coefficient matrix $P'$ is obtained by applying congruence transformations to the original matrix $P$. Substituting $P' = E^T P E$ into $P'q' = v'$, we get

\[ E^T P E q' = v'. \tag{2.15} \]

We also know that the original system in Eq. 2.4 is given by $Pq = v$. These two equations are both satisfied by setting

\[ v' = E^T v, \tag{2.16} \]

\[ q = Eq'. \tag{2.17} \]

From $q = Eq'$, we can see that $E$ is the coefficient matrix that expresses the leaf panel charges in terms of the charges on the new basis panels. Since all panel charges can be expressed by $q_N = J'q'$, $E$ is contained in the $J'$ matrix and hence can be obtained directly.
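A short sketch of this extraction follows: $E$ is read off as the rows of $J'$ that belong to leaf panels, and Eqs. 2.16 and 2.17 are then plain matrix-vector products. The tree shape (1 → (2, 3), 2 → (4, 5), 5 → (6, 7)), the basis order (1, 2, 4, 6), the charge values, and the helper `matvec` are assumptions used only for illustration.

```python
# Rows of J' for the assumed tree; columns follow the basis order (1, 2, 4, 6).
J_new = {
    1: [1, 0, 0, 0],
    2: [0, 1, 0, 0],
    3: [1, -1, 0, 0],    # q3 = q1 - q2
    4: [0, 0, 1, 0],
    5: [0, 1, -1, 0],    # q5 = q2 - q4
    6: [0, 0, 0, 1],
    7: [0, 1, -1, -1],   # q7 = q5 - q6 = q2 - q4 - q6
}
leaf_panels = [3, 4, 6, 7]

# E consists of the rows of J' that correspond to leaf panels.
E = [J_new[p] for p in leaf_panels]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


E_T = [list(col) for col in zip(*E)]

q_basis = [10.0, 9.0, 2.0, 3.0]     # hypothetical charges on panels 1, 2, 4, 6
q_leaf = matvec(E, q_basis)         # Eq. 2.17: q = E q'

v_leaf = [1.0, 1.0, 1.0, 1.0]       # every leaf panel of this conductor at 1 V
v_basis = matvec(E_T, v_leaf)       # Eq. 2.16: v' = E^T v
```

With all leaf panels of the conductor at the same potential, only the root entry of `v_basis` is non-zero in this example, so the transformed right-hand side is itself very sparse.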

2.2.4 Solving $P'q' = v'$ for Uniform- and Multiple-dielectric Media

ICCAP provides a general sparsification technique that does not depend on a specific matrix solver. For a uniform dielectric, we can adopt incomplete Cholesky decomposition as the preconditioner, followed by the preconditioned conjugate gradient (PCG) method. For multiple-dielectric media, the sparse linear system $P'q' = v'$ is unsymmetric. In this scenario, the preconditioner is computed from incomplete LU factorization, and the preconditioned GMRES method is used to solve the system.

Since the new basis includes all root panels, after solving for $q'$ the root panel charges are already contained in $q'$, and hence no additional matrix operations are required.

2.2.5 Potential Coefficient Matrix Reordering

The distribution of non-zeros in $P'$ affects the number of fill-ins in the preconditioners produced by incomplete Cholesky or LU factorization. Although $P'$ is sparse, directly applying minimum degree reordering (MMD) may still be expensive for large-scale designs. So we propose a heuristic, cost-free reordering method called Level-Oriented Reordering (LOR).

According to the new basis generation process, it is reasonable to expect that the columns and rows related to lower-level basis panels contain more zeros than those of upper-level basis panels, since the fill-ins introduced by links on their upper-level panels can mostly be eliminated. So the basic idea of LOR is to assign larger indexes to basis panels in upper levels, so that the dense part ends up in the lower right-hand corner of $P'$.

LOR can be easily done during the panel refinement process by using a stack-like data structure. When one panel is divided into two smaller ones, those two children are pushed onto the top of the stack, so that lower-level panels finally get smaller indexes. With this simple reordering scheme, LOR can lead to even fewer fill-ins in the preconditioners than MMD, as will be shown in the experimental section.

2.2.6 Complexity Analysis

The extraction flowchart of ICCAP and its comparison with PHiCap [17] are presented in Fig. 2.11. The first step of ICCAP, selecting $n$ basis panels based on Theorem 2, can be done by scanning all $N = 2n - 1$ panels to determine which are root and LHS panels, and hence takes $O(n)$ time. The second step, constructing $J'$, is equivalent to inserting $O(n)$ non-zeros into $J'$ and hence is also $O(n)$. $E$ is contained in $J'$ and does not require extra time. $H$ has been proven to contain $O(n)$ non-zeros [16], so the construction of $P' = {J'}^{T} H J'$ can also be done in $O(n)$ time.

2.3 Practical Implementation

In this section, we discuss the detailed implementation of ICCAP. First we discuss how to estimate potential coefficients between panels of various shapes and the hierarchical panel refinement scheme used in ICCAP. Furthermore, Fig. 2.11 presented the primitive extraction flowchart of ICCAP and its comparison with PHiCap. In practice, the extraction flow can be greatly simplified by observing that the sparse potential coefficient matrix $P'$ and the right-hand side $v'$ can be constructed directly, without using the link matrix $H$ and the new structure matrix $J'$. Thus we not only save memory but also avoid many matrix-matrix and matrix-vector multiplications.

Figure 2.11: Extraction flowchart of ICCAP and PHiCap.

The simplified extraction flow is shown in Fig. 2.12. We discuss the detailed extraction flow in the following sections.

Figure 2.12: ICCAP capacitance extraction flow. (Flow: panel refinement, direct construction of $P'$ and $v'$, iterative matrix solver for $P'q' = v'$.)

2.3.1 Potential Estimation

The self potential coefficient of a panel can be approximated by 3.5 divided by the area of that panel, and the coupling potential coefficient between two panels by the inverse of the distance between their centroids. In this section, we present how to efficiently calculate the area and centroid of a quadrilateral or triangular panel.

The centroid of a body is its center of mass, the point at which it would balance under the influence of gravity. The same point is also often called the center of gravity, the geocenter, or the barycenter.

Three common “centers of gravity” are studied in mathematics, science, and engineering. The most common in mathematics is the center of point masses located at the vertices of a polygon. A second approach is to treat the area of the polygon as a sheet of uniform density. The third, and least common, approach is to represent the sides of the polygon as wire rods of uniform density. For non-symmetric polygons, the three centers of gravity are usually different points. The first approach is the one we adopt to calculate the centroid in ICCAP.

Figure 2.13: Centroid of triangular panel.

For a triangular panel, the center of balance of the uniform sheet coincides with that of point masses at the vertices, and this point is almost universally referred to as the centroid of the triangle. The centroid of a triangle is the intersection of its three medians, and it divides each median in a 2:1 ratio: the part of the median nearest the vertex is always twice as long as the part near the midpoint of the side. If the coordinates of the vertices are known, the coordinates of the centroid are the averages of the coordinates of the vertices. If we call the three vertices $A = (x_1, y_1, z_1)$, $B = (x_2, y_2, z_2)$, and $C = (x_3, y_3, z_3)$, then the coordinates $(x_c, y_c, z_c)$ of the centroid are

\[ x_c = \frac{x_1 + x_2 + x_3}{3}, \quad y_c = \frac{y_1 + y_2 + y_3}{3}, \quad z_c = \frac{z_1 + z_2 + z_3}{3}. \tag{2.18} \]

In a quadrilateral, the line joining the midpoints of two opposite sides is called a bimedian. The centroid of point masses located at the vertices of a quadrilateral is the intersection of its bimedians. Another property of this centroid is that it is also the midpoint of the segment joining the midpoints of the diagonals. Therefore, the centroid of a quadrilateral is

\[ x_c = \frac{x_1 + x_2 + x_3 + x_4}{4}, \quad y_c = \frac{y_1 + y_2 + y_3 + y_4}{4}, \quad z_c = \frac{z_1 + z_2 + z_3 + z_4}{4}. \tag{2.19} \]

Figure 2.14: Centroid of quadrilateral panel.

Calculating the area of a triangle is an elementary problem encountered in many different situations. Various approaches exist, depending on what is known about the triangle. An important result in plane geometry is Heron's formula. Given the side lengths $a$, $b$, and $c$ and the semi-perimeter

\[ s = \frac{1}{2}(a + b + c) \tag{2.20} \]

of a triangle, Heron's formula gives the area $A$ of the triangle as

\[ A = \sqrt{s(s - a)(s - b)(s - c)}. \tag{2.21} \]

The area of a quadrilateral is equal to the sum of the areas of two non-overlapping triangles.
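Eqs. 2.18 to 2.21 and the two potential estimates translate directly into code. The following is a minimal sketch with hypothetical function names. It assumes panels are given as lists of three or four $(x, y, z)$ vertices in order around the panel, and that a quadrilateral can be split along the diagonal joining its first and third vertices, which holds for convex panels.

```python
import math


def centroid(vertices):
    """Average of the vertex coordinates (Eqs. 2.18 and 2.19)."""
    n = len(vertices)
    return tuple(sum(v[k] for v in vertices) / n for k in range(3))


def triangle_area(a, b, c):
    """Heron's formula (Eqs. 2.20 and 2.21) from three vertices."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    s = 0.5 * (la + lb + lc)
    # Clamp tiny negative round-off for nearly degenerate triangles.
    return math.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0))


def panel_area(vertices):
    """Area of a triangular or (convex) quadrilateral panel."""
    if len(vertices) == 3:
        return triangle_area(*vertices)
    a, b, c, d = vertices
    return triangle_area(a, b, c) + triangle_area(a, c, d)


def self_coefficient(vertices):
    """Self potential coefficient estimate: 3.5 divided by the panel area."""
    return 3.5 / panel_area(vertices)


def coupling_coefficient(panel_1, panel_2):
    """Coupling estimate: inverse of the distance between centroids."""
    return 1.0 / math.dist(centroid(panel_1), centroid(panel_2))


triangle = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
square_above = [(x, y, 2.0) for (x, y, _) in square]
```

Both estimates need only a handful of arithmetic operations per panel pair, which keeps the refinement test cheap. `math.dist` requires Python 3.8 or later.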

2.3.2 Panel Refinement Scheme

As shown in Fig. 2.12, the first step of ICCAP is the hierarchical panel refinement. This process is more complicated than it sounds and hence deserves an in-depth explanation.

Panels are hierarchically discretized based on the couplings between different panels. In function RefineScheme, Refine is called to discretize root panel $i$ and root panel $j$. It is important to notice that Refine is only applied to pairs of different root panels. So after this process we still need to consider self couplings, which is done by function SelfLinkInsert. In function Refine, Peps and Lengthguard are parameters that can be specified on the command line. The detailed implementations of these functions are presented in Tables 2.2, 2.3, and 2.4.

2.3.3 Direct Construction of $P'$

Before presenting the direct construction of $P'$, we introduce two definitions that will facilitate our discussion.

Definition 2 If a link is between two leaf panels, it is called a basic link; otherwise it is called a high-level link.

Definition 3 If a link is between two basis panels, it is called a type I link; if a link is between one basis panel and one non-basis panel, it is called a type II link; otherwise, if a link is between two non-basis panels, it is called a type III link.

The type of a given link depends on the current selection of basis. For example, in Fig. 2.15, when the leaf panels form the basis, the link between panel $a$ and panel $b$ is a high-level link and also a type III link, since both panel $a$ and panel $b$ are non-leaf panels.

Figure 2.15: High level link and basic link.

High-level links are essentially approximations of basic links. For example, in Fig. 2.15, the link between panel $a$ and panel $b$ is an approximation of the four links between panels $c$, $d$ and panels $e$, $f$. Therefore, when leaf panels are chosen as the basis, the link between panel $a$ and panel $b$ is inserted into $P$ multiple times.

Our goal now is to find out how different types of links are inserted into $P'$ when we use our new basis. In the new linear system $P'q' = v'$ obtained by selecting the new basis, $q'$ is the vector of charges on the new basis panels and $v'$ is the vector of potentials induced on those panels. Type I links are inserted into $P'$ once. For example, the link $P_1$ shown in Fig. 2.16(a) will be inserted into $P'_{ab}$. Type II and type III links need to be inserted into multiple places. For example, in Fig. 2.16(b), the potential on panel $c$ induced by the link $P_2$ is equal to $P_2 q_f = P_2(q_b - q_e)$. Therefore, $P_2$ will be inserted into $P'_{cb}$ while $-P_2$ will be inserted into $P'_{ce}$. Similarly, the type III link in Fig. 2.16(c) will be inserted multiple times. Thus we can directly construct $P'$ without constructing $H$ and $J'$ and without the matrix multiplication $P' = J'^T H J'$.
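The type II example above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the sparse matrix is a plain `std::map`, and the names `SparseMatrix`, `Term`, and `insert_link` are hypothetical and do not come from the ICCAP source. The charge of the non-basis panel is supplied as a signed combination of basis panel charges, for instance $q_f = q_b - q_e$.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Sparse matrix stored as (row, column) -> value (hypothetical container).
using SparseMatrix = std::map<std::pair<int, int>, double>;

// One signed term of a charge expansion, e.g. q_f = (+1) q_b + (-1) q_e.
struct Term {
    int basis_panel;
    int sign;
};

// Insert a link with coefficient p into row `row` of P'. The charge of the
// panel at the other end of the link is given by `expansion` over basis panels.
void insert_link(SparseMatrix& p_new, int row,
                 const std::vector<Term>& expansion, double p) {
    for (const Term& t : expansion) {
        p_new[{row, t.basis_panel}] += t.sign * p;  // accumulates repeated inserts
    }
}
```

With panels $b$, $c$, $e$ numbered 1, 2, 4, calling `insert_link(P, 2, {{1, +1}, {4, -1}}, P2)` adds $P_2$ to entry $(c, b)$ and $-P_2$ to entry $(c, e)$, as in the text.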

2.3.4 Direct Construction of $V'$

In the previous chapter, we have shown that the new right hand side $v'$ can be obtained by $v' = E^T v$. We have also shown that $q = Eq'$ and that $E$ is contained in the new structure matrix $J'$: $E$ consists of the rows of $J'$ corresponding to leaf panels.

Figure 2.16: Direct construction of the sparse potential coefficient matrix $P'$. (a) Type I link $P_1$; (b) type II link $P_2$; (c) type III link $P_3$.

Figure 2.17: Direct construction of the new right hand side $v'$. (Binary panel tree with root panel 1, child panels 2 and 3, and leaf panels 4 to 7.)

Let us further study the algorithm for constructing $J'$, which is presented in Theorem 3. As shown in Fig. 2.17, according to Theorem 3, the basis panel 2 affects two rows corresponding to leaf panels 5 and 7. In the row of leaf panel 5, $J'_{52}$ is $1$, while in the row of leaf panel 7, $J'_{72}$ is $-1$. Therefore, one can see that for every LHS panel $j$,

$$ \sum_{i=1}^{n} E_{ij} = 0. \tag{2.22} $$

However, for every root panel $k$,

$$ \sum_{i=1}^{n} E_{ik} = 1. \tag{2.23} $$


Therefore we conclude that in the new right hand side $v'$, if we are currently calculating the $i$th column of the capacitance matrix, only the entries corresponding to root panels belonging to conductor $i$ need to be set to $1$, and all other entries are $0$.

Therefore, we can directly construct the new potential coefficient matrix $P'$ and the right hand side $v'$ without any extra effort. Furthermore, after solving for the new basis panel charge vector $q'$, the root panel charges have already been included in $q'$ and hence no further steps are required. The final ICCAP extraction flow has been presented in Fig. 2.12.

2.4 Experimental Results

ICCAP is implemented in C++ and Matlab. All experiments are executed on a Sun Blade 2500 with two 1.28-GHz UltraSPARC IIIi processors, 8 GB of RAM, and Solaris 9. The main test examples are $k \times k$ bus crossing conductors for $k = 2$ to $16$, generated by busgen in the FastCap release package [14].

The density of the new potential coefficient matrix $P'$ related to the new basis is plotted in Fig. 2.18. The density is defined as the total number of non-zeros in $P'$ divided by its dimension. As the number of leaf panels goes over one thousand, $P'$ becomes very sparse and its density falls well below 10%.

We also test the Level-Oriented Reordering (LOR) method embedded in the panel refinement process by using the bus 4 × 4 benchmark. Without LOR, the original lower and upper triangular factors from incomplete LU factorization contain 29017 and 24546 non-zeros, respectively. By adopting LOR, the number of fill-ins is dramatically reduced by 30%. The result is comparable with directly applying MMD, which in this case results in 22129 and 19633 fill-ins in $L$ and $U$.

Figure 2.18: Density of the new potential coefficient matrix $P'$. (x-axis: number of leaf panels, from 0 to 3 × 10^3; y-axis: density of $P'$ in %, from 0 to 50.)

Table 2.5 compares the performance of three algorithms: FastCap [14] with expansion order 2, HiCap [16], and the new algorithm, ICCAP. The convergence tolerance is set to 0.01, and the error is calculated with respect to FastCap (-o2). Iteration is the average number of iterations per conductor. ICCAP is the fastest of the three algorithms. Compared with FastCap, ICCAP is 30 to 40 times faster and uses much less memory. Compared with HiCap, for the bus 12 × 12 benchmark, ICCAP exhibits nearly 10 times speedup. HiCap represents $P$ as a block matrix instead of storing it explicitly, and hence the real storage of $P$ is $O(n)$. All of $H$, $J$, and $P'$ in ICCAP contain $O(n)$ non-zeros, so the memory consumptions of ICCAP and HiCap are of the same order. The actual accuracy and memory consumption of HiCap and ICCAP depend on the refinement parameters. When the number of leaf panels is roughly the same, HiCap and ICCAP have comparable accuracy.

Figure 2.19: Preconditioners from incomplete LU factorization with different reordering schemes (RelativeResidue = 0.01). Non-zero counts shown in the panels: without reordering, $P'$: 108228, $L$: 29017, $U$: 24546; reordered by LOR, $P'$: 108228, $L$: 21761, $U$: 18783; reordered by MMD, $P'$: 108228, $L$: 22129, $U$: 19633.

We do not have access to PHiCap [17] and cannot compare with it directly. Published results show that PHiCap is 2 to 3 times faster than HiCap for the benchmarks in Table 2.5. Based on the comparison with HiCap, we can expect ICCAP to be faster than PHiCap as well. Notice also that for the test cases in Table 2.5, ICCAP normally converges in fewer than 2 iterations while PHiCap needs about 3 iterations. The main disadvantage of PHiCap is its memory consumption due to the explicit formulation of the transformation matrix, whereas ICCAP directly formulates the sparse matrix $P'$. In addition, [17] shows that PHiCap has lower accuracy than HiCap. So ICCAP can be superior to PHiCap in terms of memory and accuracy.

We also use ICCAP and HiCap to test larger files containing more conductors. The results are shown in Table 2.6. For these test files, ICCAP can converge within three iterations and shows 7 to 8 times speedup compared with HiCap.

GenerateNewJ(NewBasis)
    for ( i=0; i<NewBasis.size(); i++ )
        p = NewBasis[i];
        InsertEntryNewJ(p, i, 1);
        while ( panel[p] is a non-leaf panel )
            p = panel[p].GetRight();
            InsertEntryNewJ(p, i, 1);
        p = NewBasis[i];
        if ( ! panel[p] is a root panel )
            p = panel[p].GetParent();
            while ( panel[p] is a non-leaf panel )
                p = panel[p].GetRight();
                InsertEntryNewJ(p, i, -1);

Table 2.1: Algorithm of directly constructing $J'$.

RefineScheme(OriginalPanelNum)
    for ( i=0; i<OriginalPanelNum; i++ )
        for ( j=i+1; j<OriginalPanelNum; j++ )
            Refine(i,j);

Table 2.2: Hierarchical panel refinement scheme.

Refine(Panel Ai, Panel Aj)
    Pij = PotentialEstimate(Ai, Aj);
    Ri = Longest side of panel Ai;
    Rj = Longest side of panel Aj;
    if ( (Pij * Ri < Peps && Pij * Rj < Peps) ||
         (max(Ri, Rj) ≤ Lengthguard) )
        RecordLink(i,j,Pij);
    else if ( Ri > Rj )
        Subdivide(Ai);
        Refine(Ai.left, Aj);
        Refine(Ai.right, Aj);
    else
        Subdivide(Aj);
        Refine(Aj.left, Ai);
        Refine(Aj.right, Ai);

Table 2.3: Refinement of two panels.
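To make the control flow of Table 2.3 concrete, here is a small runnable C++ sketch on a toy problem. It is not the ICCAP implementation: panels are one-dimensional intervals, `potential_estimate` is a hypothetical stand-in (the reciprocal of the distance between panel centers, so the two panels must not share a center), and the names `Panel`, `Link`, and `refine` are chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Toy 1-D "panel": an interval [lo, hi] that can be split into two halves.
struct Panel {
    double lo, hi;
    std::unique_ptr<Panel> left, right;
    Panel(double l, double h) : lo(l), hi(h) {}
    double length() const { return hi - lo; }
    double center() const { return 0.5 * (lo + hi); }
    void subdivide() {  // split once; later calls reuse the existing children
        if (!left) {
            left.reset(new Panel(lo, center()));
            right.reset(new Panel(center(), hi));
        }
    }
};

struct Link {
    const Panel* a;
    const Panel* b;
    double p;
};

// Hypothetical stand-in for PotentialEstimate: 1 / distance between centers.
double potential_estimate(const Panel& a, const Panel& b) {
    return 1.0 / std::fabs(a.center() - b.center());
}

// Mirrors the control flow of Refine in Table 2.3.
void refine(Panel& ai, Panel& aj, double peps, double lengthguard,
            std::vector<Link>& links) {
    const double pij = potential_estimate(ai, aj);
    const double ri = ai.length();
    const double rj = aj.length();
    if ((pij * ri < peps && pij * rj < peps) ||
        std::max(ri, rj) <= lengthguard) {
        links.push_back({&ai, &aj, pij});  // RecordLink
    } else if (ri > rj) {
        ai.subdivide();
        refine(*ai.left, aj, peps, lengthguard, links);
        refine(*ai.right, aj, peps, lengthguard, links);
    } else {
        aj.subdivide();
        refine(*aj.left, ai, peps, lengthguard, links);
        refine(*aj.right, ai, peps, lengthguard, links);
    }
}
```

For the intervals [0, 1] and [2, 3] with `peps = 0` and `lengthguard = 0.5`, each panel is split once and four links between the half-length panels are recorded. With a very large `peps`, the first test succeeds immediately and a single high level link is recorded.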


SelfLinkInsert()
    for (i=0; i<Basis.size(); i++)
        for (j=i; j<Basis.size(); j++)
            if (panel[Basis[i]].root == panel[Basis[j]].root)
                Pij = PotentialEstimate(Basis[i],Basis[j]);
                RecordLink(Basis[i],Basis[j],Pij);

Table 2.4: Self coupling insertion.

4 × 4 Bus (time in s, memory in MB)
Algorithm   Time     Iteration   Memory   Error    Panels
FastCap     8.03     18.63       26.27    –        2736
HiCap       0.77     8.7         0.99     0.72%    2176
ICCAP       0.39     1.12        0.581    0.76%    2112

6 × 6 Bus (time in s, memory in MB)
Algorithm   Time     Iteration   Memory   Error    Panels
FastCap     35.55    14.4        65.19    –        5832
HiCap       3.19     14.5        1.85     1.42%    3168
ICCAP       0.7      1.08        1.54     1.50%    3168

8 × 8 Bus (time in s, memory in MB)
Algorithm   Time     Iteration   Memory   Error    Panels
FastCap     67.4     12          114.5    –        10080
HiCap       14.64    13.4        5.03     1.63%    8448
ICCAP       2.84     1.43        3.58     1.91%    8320

12 × 12 Bus (time in s, memory in MB)
Algorithm   Time     Iteration   Memory   Error    Panels
FastCap     357.99   18.1        297.8    –        22032
HiCap       76.53    15.1        12.72    1.08%    12864
ICCAP       7.21     1.41        11.87    1.18%    12480

Table 2.5: Simulation results comparison.

Cond Num          36                  48                  68
Algorithm    HiCap    ICCAP      HiCap    ICCAP      HiCap    ICCAP
Time         159.03   21.8       427.37   53.6       1932.8   164.7
Iteration    15.8     2.58       18.6     3.26       23.2     3.15
Memory       14.6     13.4       24.5     20.1       47.3     37.3
Panels       13440    12876      19040    18156      33040    31356

Table 2.6: Comparison with HiCap for some large benchmarks.

Chapter 3

Statistical Capacitance Extraction –

STATCAP

3.1 Preliminaries

Due to the ever-increasing complexity of VLSI designs and IC process technologies, the mismatch between a circuit fabricated on the wafer and the one designed in the layout tool grows ever larger. Therefore, characterizing and modeling process variations of interconnect geometry has become an integral part of the analysis and optimization of modern VLSI designs. In this chapter, we present a systematic methodology to develop a closed form capacitance model, which accurately captures the nonlinear relationship between parasitic capacitances and dominant global/local process variation parameters. The explicit capacitance representation applies orthogonal principal factor analysis to greatly reduce the number of random variables associated with modeling conductor surface fluctuations while preserving the dominant sources of variation. Consequently, the variational capacitance model can be efficiently utilized by statistical model order reduction and timing analysis tools. Experimental results demonstrate that the proposed method exhibits over 100× speedup compared with Monte Carlo simulation while having the advantage of generating explicit variational parasitic capacitance models of high order accuracy.

3.1.1 Process Variations

As VLSI circuits have entered deep sub-micron dimensions, increasing complexity of VLSI

designs and IC process technologies increases the mismatch between design and manufacturing.

Process induced variations in the device and interconnect structures are posing a significant

challenge to parasitic modeling and signal integrity analysis. To determine the extent of such

effects, the distribution of various electrical parameters, such as interconnect resistances and

capacitances due to variations in the manufacturing process must be determined. Once this

distribution is known, which is also called the design envelope, the design corners can then be

identified.

During the modern Damascene process, the dielectric is usually patterned by reactive ion etching (RIE), followed by the liner and metal (Cu) deposition. Then chemical-mechanical planarization (CMP) is applied to remove excess metal and provide global planarization. During RIE, the ideally rectangular trenches etched in the dielectric, and hence the later deposited metals and liners, may become trapezoidal due to the aspect ratio dependent etching (ARDE) effect. During the CMP overpolishing process, regions of high metal pattern density tend to erode faster and hence show higher metal and dielectric removal rates than regions of low metal pattern density [20]. The non-uniform metal removal rates across the wafer can lead to varying metal line thickness for interconnects in the same metal layer. Also, during pattern transfer in the lithography process, photomask geometries may be distorted by nonlinear effects of optical diffraction and resist processing, so that the tips and corners of interconnects become rounded.

Figure 3.1: Process variations due to (a) chemical-mechanical planarization, (b) optical diffraction, and (c) chemical etching. (Picture courtesy of TSMC, Hsin-Chu, Taiwan.)

Therefore, for deep submicron technologies, a combination of device physics, die location


dependence, optical proximity effects, micro-loading in etching and deposition may lead to het-

erogeneous and non-monotonic relationships among the process random variables. Also para-

sitic capacitance does not change monotonically or linearly according to those random parame-

ters, which have varying effects on interconnect geometries depending on local characteristics

of the layout and uncertainties in fabrication. Since all these process variations are random in

nature, statistical parasitic capacitance models having the ability to capture those complicated

nonlinear relationships become indispensable.

Furthermore, capacitance extraction with process variations is not an end in itself. Capacitance variation analysis needs to provide a model fully compatible with statistical model order reduction and statistical timing analysis tools, most of which require representing parasitic capacitances as functions of some common random variables [21–25]. A recent study also shows that the first order canonical model is not sufficient to represent the nonlinear dependency of parasitic capacitances on many variation sources [26]. To the best of our knowledge, although many efficient 3D capacitance extraction algorithms [9,13–19] have been proposed in the literature and there have been some pioneering works [26, 27] on capacitance extraction with the consideration of process variations, no algorithm has the functionality to efficiently supply an explicit statistical capacitance model with high order accuracy.

3.2 Statistical Capacitance Extraction

The following four issues will be discussed in this section for modeling parasitic capacitance

variations: (1) how to efficiently solve the system equations associated with the variational

capacitance model; (2) how to mathematically model the surface fluctuation due to process

variations; (3) how to reduce the large number of random variables used to model the surface

fluctuation; (4) how to obtain the probability density function without using time consuming

Monte Carlo simulation.

3.2.1 Variational Capacitance Approximation

Assume for now that process variations induce some perturbations in the nominal potential coefficient $P_{kl}$ between panel $k$ and panel $l$, and that the variational potential coefficient $\tilde{P}_{kl}$ can be represented in terms of the nominal value $P_{kl}$ and $k$ normal random variables $\delta = [\delta_1\ \delta_2\ \cdots\ \delta_k]^T$ as

$$ \tilde{P}_{kl} = P_{kl} + \sum_{i} \Delta P^{i}_{kl}\,\delta_i + \sum_{i,j} \Delta P^{ij}_{kl}\,\delta_i \delta_j + \text{h.o.t.} \tag{3.1} $$

How to represent $\tilde{P}_{kl}$ in such a form will be presented in the following sections.

The expression of $\tilde{P}_{kl}$ in terms of $\delta$ can be extended to higher orders. If the first three terms are used, Eq. 3.1 is the quadratic form of the potential coefficient $\tilde{P}_{kl}$. The second term represents the canonical linear model, while the third term captures the nonlinear relationship between $\tilde{P}_{kl}$ and $\delta$. In the rest of this chapter, our discussion will be based on the quadratic form, since higher order approximations can be easily obtained by extending the presented derivation.

Since each entry of the variational link matrix $\tilde{H}$ has the form shown in Eq. 3.1, the entire $\tilde{H}$ can also be expressed in a quadratic form as follows:

$$ \tilde{H} = H + \sum_{i} \Delta H^{i}\delta_i + \sum_{i,j} \Delta H^{ij}\delta_i\delta_j, \tag{3.2} $$

where $H, \Delta H^{i}, \Delta H^{ij} \in \mathbb{R}^{N \times N}$ are constant coefficient matrices.

By using Eq. 2.10, the variational potential coefficient matrix $\tilde{P}$ can also be represented in terms of $P$ and $\delta$:

$$ \tilde{P} = J^T H J + \sum_{i} J^T \Delta H^{i} J\,\delta_i + \sum_{i,j} J^T \Delta H^{ij} J\,\delta_i\delta_j = P + \underbrace{\sum_{i} \Delta P^{i}\delta_i + \sum_{i,j} \Delta P^{ij}\delta_i\delta_j}_{\Delta P}, \tag{3.3} $$

where $\Delta P^{i} = J^T \Delta H^{i} J$ and $\Delta P^{ij} = J^T \Delta H^{ij} J$. $P$ is the potential coefficient matrix without considering the process variations, and $\Delta P$, which is the sum of the second and third terms in Eq. 3.3, represents the variational part of $\tilde{P}$.

Let $\tilde{q}$ denote the variational charge distribution vector. Our goal is then to express $\tilde{q}$ in a quadratic form, such that

$$ \tilde{q} = q + \underbrace{\sum_{i} \Delta q^{i}\delta_i + \sum_{i,j} \Delta q^{ij}\delta_i\delta_j}_{\Delta q}, \tag{3.4} $$

where $q, \Delta q^{i}, \Delta q^{ij} \in \mathbb{R}^{n \times 1}$. From Eq. 3.4, it is clear that the quadratic expressions of self and coupling capacitances can be easily obtained.

From Eq. 3.3 and Eq. 3.4, the variational linear system can then be represented as

$$ (P + \Delta P)(q + \Delta q) = v. \tag{3.5} $$

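Substituting the nominal equation in Eq. 2.4 into Eq. 3.5 and applying the Taylor expansion, $\Delta q$ can be expressed as

$$ \Delta q = -(I + P^{-1}\Delta P)^{-1} P^{-1}\Delta P\, q = \underbrace{-P^{-1}\Delta P\, q}_{\Delta q_1} + \underbrace{P^{-1}\Delta P\, P^{-1}\Delta P\, q}_{\Delta q_2} + \cdots = Aq + A^2 q + \cdots = \sum_{i=1}^{\infty} A^{i} q, \tag{3.6} $$

where $A = -P^{-1}\Delta P$.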

Theorem 4 The variational charge distribution vector $\Delta q$ can be represented as $\Delta q = \sum_{i=1}^{\infty} A^{i} q$, where $A = -P^{-1}\Delta P$. The Taylor expansion series of $\Delta q$ converges under the condition $\| P^{-1}\Delta P \|_p < 1$. So high order terms can be iteratively calculated by using the following equation:

$$ P\,\Delta q_{i+1} = -\Delta P\,\Delta q_{i}. \tag{3.7} $$
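The recursion of Eq. 3.7 can be checked numerically on a tiny example. The sketch below is an illustration under simplifying assumptions: the system is a dense 2 × 2 matrix, the solver is Cramer's rule rather than the preconditioned iterative solver used on the sparse system, and $\Delta P$ is a fixed numerical perturbation rather than a polynomial in $\delta$. The names `Vec2`, `Mat2`, `solve`, and `charge_variation` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

Vec2 multiply(const Mat2& m, const Vec2& x) {
    Vec2 y = {{m[0][0] * x[0] + m[0][1] * x[1],
               m[1][0] * x[0] + m[1][1] * x[1]}};
    return y;
}

// Solve m x = b for a 2x2 system by Cramer's rule.
Vec2 solve(const Mat2& m, const Vec2& b) {
    const double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    assert(det != 0.0);
    Vec2 x = {{(b[0] * m[1][1] - m[0][1] * b[1]) / det,
               (m[0][0] * b[1] - m[1][0] * b[0]) / det}};
    return x;
}

// Accumulate dq = dq_1 + ... + dq_terms, where
//   P dq_1 = -dP q   and   P dq_{i+1} = -dP dq_i   (Eq. 3.7).
Vec2 charge_variation(const Mat2& P, const Mat2& dP, const Vec2& q, int terms) {
    Vec2 total = {{0.0, 0.0}};
    Vec2 prev = q;
    for (int i = 0; i < terms; ++i) {
        const Vec2 rhs = multiply(dP, prev);
        const Vec2 neg = {{-rhs[0], -rhs[1]}};
        const Vec2 next = solve(P, neg);
        total[0] += next[0];
        total[1] += next[1];
        prev = next;
    }
    return total;
}
```

When $\| P^{-1}\Delta P \|$ is well below 1, as in the test values used here, $q + \Delta q$ from a few dozen terms agrees with the direct solution of $(P + \Delta P)\tilde{q} = v$ to rounding error.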

Since in practice the perturbation matrix $\Delta P$ is normally much smaller than the nominal potential coefficient matrix $P$, the convergence condition is almost always satisfied. Let the quadratic form representation of the first term on the right hand side of Eq. 3.6, $\Delta q_1$, be

$$ \Delta q_1 = \sum_{i} \Delta q_1^{i}\delta_i + \sum_{i,j} \Delta q_1^{ij}\delta_i\delta_j. \tag{3.8} $$

By using Eq. 3.3 and $P\Delta q_1 = -\Delta P q$, we get

$$ P\,\Delta q_1^{i} = -\Delta P^{i} q, \qquad P\,\Delta q_1^{ij} = -\Delta P^{ij} q. \tag{3.9} $$

Therefore, the quadratic expression of $\Delta q_1$ can be calculated by solving $(k + k^2)$ linear systems. Since $P$ is sparse, each linear system in Eq. 3.9 can be efficiently solved by preconditioned iterative methods with $O(n)$ complexity. So the total complexity of solving for $\Delta q_1$ is $O((k^2 + k)n)$. Usually, the number of random variables, $k$, is much smaller than the total number of leaf panels $n$.

The second term $\Delta q_2 = \sum_{i,j} \Delta q_2^{ij}\delta_i\delta_j$ in Eq. 3.6 can be obtained by using $\Delta q_1$:

$$ P\,\Delta q_2 = -\Delta P\,\Delta q_1. \tag{3.10} $$

Let the right hand side vector in Eq. 3.10 be $q_1 = \Delta P\,\Delta q_1$. Then the quadratic approximation of $q_1$ can be expressed as

$$ q_1 = \sum_{i,j} \Delta P^{i}\,\Delta q_1^{j}\,\delta_i\delta_j + \text{h.o.t.} \tag{3.11} $$

Therefore, the coefficient vectors of $\Delta q_2$ can be obtained by

$$ P\,\Delta q_2^{ij} = -\Delta P^{i}\,\Delta q_1^{j}. \tag{3.12} $$

So the quadratic expression of $\Delta q_2$ requires solving $k^2$ linear systems, and hence the complexity is $O(k^2 n)$.

Therefore, by using the quadratic expressions of $\Delta q_1$ and $\Delta q_2$, the quadratic expression of $\Delta q$ is obtained by

$$ \Delta q^{i} = \Delta q_1^{i}, \qquad \Delta q^{ij} = \Delta q_1^{ij} + \Delta q_2^{ij}. \tag{3.13} $$

So the total computational complexity of calculating the quadratic form of $\Delta q$ is $O(k^2 n)$. Also, one may notice that the first order terms are only generated by $\Delta q_1$, while the second order terms are generated by $\Delta q_1$ and $\Delta q_2$. Therefore, for the quadratic form approximation, when $i > 2$, $\Delta q_i$ does not contain first or second order terms and hence can be safely truncated. In the following subsections, we will present how to express the variational potential coefficients in terms of $k$ normal random variables.

3.2.2 Process Variation Modeling

After the hierarchical panel discretization process, the positions of the finest panels, the leaf panels, may vary due to process variations. The surface fluctuation of a conductor can be described as a statistical perturbation of each nominal, smooth leaf panel surface along its normal direction, as shown in Fig. 3.2.

Figure 3.2: Process variation modeling with correlated statistical position perturbations on leaf panels. (The figure shows the nominal smooth surface, the rough surface, and the correlation between the perturbations $n_j$ and $n_i$.)

Although leaf panel position variations may not be truly random, they can often be accurately modeled by assuming an appropriate spatial correlation [27]. We denote the leaf panel position variations by a random vector $\Delta n$, whose $i$th element $\Delta n_i$ is the random perturbation on leaf panel $i$. For simplicity, one can assume that the expectation of $\Delta n$ is $\mu(\Delta n) = 0$.

The larger the distance between two leaf panels, the weaker the correlation between them. This spatial relationship can be accurately modeled by using the Gaussian correlation function [27]. For two leaf panels $i$ and $j$, the correlation between them is determined by

$$ \Gamma_{ij} = e^{-\|x_i - x_j\|^2 / \eta^2}, \tag{3.14} $$

where $e$ is the base of the natural logarithm and $\eta$ is a user-specified correlation length. $x_i$ and $x_j$ are the centers of leaf panels $i$ and $j$, respectively. Then the correlation matrix can be written as

$$ \Gamma(\Delta n) = (\Gamma_{ij})_{n \times n}. \tag{3.15} $$

Many small terms in $\Gamma(\Delta n)$ can be truncated to make it sparse if the corresponding two leaf panels are far enough apart. Also, if the standard deviation on leaf panel $i$ is assumed to be $\sigma_i$, then the variance-covariance matrix $\Sigma$ of $\Delta n$ can be obtained as

$$ \Sigma(\Delta n) = (\Gamma_{ij}\sigma_i\sigma_j)_{n \times n}. \tag{3.16} $$

Therefore, the surface fluctuation can be modeled by the random vector $\Delta n$ with mean $\mu(\Delta n) = 0$ and the variance-covariance matrix $\Sigma(\Delta n)$ given in Eq. 3.16.
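A minimal C++ sketch of Eqs. 3.14 to 3.16 is given below. It is an illustration only: the matrix is stored densely for readability, the truncation of small terms is implemented with a hypothetical `cutoff` threshold on the correlation, and the names `Point3`, `correlation`, and `covariance` are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 {
    double x, y, z;
};

// Gaussian correlation between two panel centers (Eq. 3.14).
double correlation(const Point3& a, const Point3& b, double eta) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::exp(-(dx * dx + dy * dy + dz * dz) / (eta * eta));
}

// Variance-covariance matrix of Eq. 3.16. sigma[i] is the standard deviation
// of the perturbation on leaf panel i. Entries whose correlation is below
// `cutoff` are left at zero, which is what makes the matrix sparse in practice.
std::vector<std::vector<double>> covariance(const std::vector<Point3>& centers,
                                            const std::vector<double>& sigma,
                                            double eta, double cutoff) {
    assert(centers.size() == sigma.size());
    const std::size_t n = centers.size();
    std::vector<std::vector<double>> cov(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            const double g = correlation(centers[i], centers[j], eta);
            if (g >= cutoff) {
                cov[i][j] = g * sigma[i] * sigma[j];
            }
        }
    }
    return cov;
}
```

The diagonal entries equal $\sigma_i^2$, and panels separated by many correlation lengths contribute entries that are dropped by the cutoff.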


3.2.3 Random Variable Reduction

Although the process variations can be modeled as position perturbations on leaf panels, the number of random variables can easily exceed several thousand, and this may greatly limit the size of the problem that can be analyzed.

The position perturbations of leaf panels may be caused by many unobservable variation sources, either global or local. However, some of them may have significant effects on the conductor surface fluctuation while others may not, and hence the non-significant factors can be safely neglected in our modeling process. In multivariate statistics, determining the dominant unobservable variation sources can be performed by principal factor analysis (PFA) [28] based on either the correlation matrix $\Gamma(\Delta n)$ in Eq. 3.15 or the variance-covariance matrix $\Sigma(\Delta n)$ in Eq. 3.16.

The random variable vector $\Delta n$ representing the perturbations on leaf panels is observable, and has $n$ components with the mean vector $\mu(\Delta n) = 0$ and the variance-covariance matrix $\Sigma(\Delta n)$ given in Eq. 3.16. Principal factor analysis postulates that $\Delta n$ is linearly dependent upon $k$ ($k \ll n$) unobservable random variables $\delta$, called common factors. Those $k$ common factors are used to model the unknown and unobservable dominant process variation sources that inherently induce the perturbations on leaf panels.

Furthermore, orthogonal principal factor analysis (OPFA), also referred to as the principal component model, assumes that

$$ \mu(\delta) = 0, \qquad \Sigma(\delta) = I. \tag{3.17} $$

The goal of orthogonal principal factor analysis is to find a loading matrix $L \in \mathbb{R}^{n \times k}$, such that

$$ \underset{(n \times 1)}{\Delta n} = \underset{(n \times k)}{L} \times \underset{(k \times 1)}{\delta}. \tag{3.18} $$

From the OPFA model in Eq. 3.18 and by using Eq. 3.17, one can easily obtain that

$$ \Sigma(\Delta n) = L\,\Sigma(\delta)\,L' = LL'. \tag{3.19} $$

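Let $\Sigma(\Delta n)$ have eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Then the eigen-decomposition of $\Sigma(\Delta n)$ is given by

$$ \Sigma(\Delta n) = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_n e_n e_n' = \begin{bmatrix} \sqrt{\lambda_1}e_1 & \sqrt{\lambda_2}e_2 & \cdots & \sqrt{\lambda_n}e_n \end{bmatrix} \begin{bmatrix} \sqrt{\lambda_1}e_1' \\ \sqrt{\lambda_2}e_2' \\ \vdots \\ \sqrt{\lambda_n}e_n' \end{bmatrix}. \tag{3.20} $$

So if the loading matrix is $L = [\sqrt{\lambda_1}e_1\ \cdots\ \sqrt{\lambda_n}e_n]$, then we obtain $\Sigma(\Delta n) = LL'$ as in Eq. 3.19.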

However, in this case, the principal factor analysis is not particularly useful, since it employs as many common factors as there are random variables and does not lead to any approximation of $\Sigma(\Delta n)$, although the correlations among the components of $\Delta n$ have been decoupled. We prefer models that explain the variance-covariance matrix $\Sigma(\Delta n)$ in terms of just a few common factors.

When the last $(n-k)$ eigenvalues are small, one can neglect the contribution of $\lambda_{k+1}e_{k+1}e_{k+1}' + \cdots + \lambda_n e_n e_n'$ to $\Sigma(\Delta n)$ in Eq. 3.20. So if one lets

$$ L = [\sqrt{\lambda_1}e_1\ \ \sqrt{\lambda_2}e_2\ \ \cdots\ \ \sqrt{\lambda_k}e_k], \tag{3.21} $$

then neglecting this contribution leads to the approximation

$$ \Sigma(\Delta n) \approx \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k' = LL'. \tag{3.22} $$

Furthermore, OPFA provides an easy way to determine how many common factors are necessary to achieve the user-specified accuracy. Since the $i$th factor corresponds to the $i$th eigenvalue as shown in Eq. 3.20 and $\sum_{i=1}^{n} \lambda_i = \mathrm{tr}(\Sigma(\Delta n))$, the contribution of the $i$th factor to $\Sigma(\Delta n)$ can be estimated by

$$ c_i = \begin{cases} \dfrac{\lambda_i}{\mathrm{tr}(\Sigma(\Delta n))} & \text{factor analysis using } \Sigma(\Delta n), \\[6pt] \dfrac{\lambda_i}{n} & \text{factor analysis using } \Gamma(\Delta n). \end{cases} \tag{3.23} $$

So if the sum $\sum_{i=1}^{k} c_i$ over the $k$ largest eigenvalues exceeds a user-specified value that depends on the accuracy requirement, the resulting $k$ factors are used to approximate $\Delta n$.
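The selection rule based on Eq. 3.23 can be sketched as follows, assuming the eigenvalues of $\Sigma(\Delta n)$ have already been computed and sorted in non-increasing order. The function name `choose_factor_count` and the `target` parameter (the user-specified fraction of the trace to be explained) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the smallest k such that the k largest eigenvalues account for at
// least `target` (a fraction in (0, 1]) of the trace of the covariance matrix.
// `eigenvalues` must be non-negative and sorted in non-increasing order.
std::size_t choose_factor_count(const std::vector<double>& eigenvalues,
                                double target) {
    const double trace =
        std::accumulate(eigenvalues.begin(), eigenvalues.end(), 0.0);
    assert(trace > 0.0 && target > 0.0 && target <= 1.0);
    double explained = 0.0;
    for (std::size_t k = 0; k < eigenvalues.size(); ++k) {
        explained += eigenvalues[k] / trace;  // contribution c_i of Eq. 3.23
        if (explained >= target) {
            return k + 1;
        }
    }
    return eigenvalues.size();
}
```

For eigenvalues 4, 2, 1, 1 the contributions are 0.5, 0.25, 0.125, and 0.125, so a target of 0.75 is met with two factors while a target of 0.9 requires all four.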

3.2.4 Potential Coefficient Approximation

For one pair of panels $k$ and $l$ without process variations, the potential coefficient between them is evaluated by Eq. 2.3. If panels $k$ and $l$ have variations $\Delta n_k$ and $\Delta n_l$ along their normal directions, then the variational potential coefficient $\tilde{P}_{kl}$ is a function of $\Delta n_k$ and $\Delta n_l$, $\tilde{P}_{kl} = f(x_k, x_l, \Delta n_k, \Delta n_l)$. By expanding $\tilde{P}_{kl}$ into a Taylor series in $\Delta n_k$ and $\Delta n_l$, one obtains

$$ \tilde{P}_{kl} = P_{kl} + \hat{a}_{kl}\,\Delta\hat{n} + \Delta\hat{n}'\,\hat{A}_{kl}\,\Delta\hat{n} + \text{h.o.t.}, \tag{3.24} $$

where $\hat{a}_{kl}$ is a $1 \times 2$ vector and $\hat{A}_{kl}$ is a $2 \times 2$ matrix. $\Delta\hat{n} = [\Delta n_k\ \Delta n_l]^T$ is a random vector containing $\Delta n_k$ and $\Delta n_l$.

During the hierarchical panel refinement process, the recorded links may or may not be created between two leaf panels, as shown in Fig. 2.5. So $\Delta\hat{n}$ could contain the variations on some non-leaf panels. Since our process variation model and principal factor analysis are formulated in terms of variations on leaf panels, it is necessary to represent $\Delta\hat{n}$ in terms of $\Delta n$.

Without loss of generality, we assume that the position variations of leaf panels are along their normal direction. Then if two panels $i$ and $j$ have variations $\Delta n_i$ and $\Delta n_j$, the variation on their parent panel $k$ will be $\Delta n_k = \frac{1}{2}(\Delta n_i + \Delta n_j)$. So all panel variations, either leaf or non-leaf, collected in the vector $\Delta\bar{n}$, can be expressed in terms of the variations on the underlying leaf panels:

$$ \Delta\bar{n} = R\,\Delta n, \tag{3.25} $$

where $R \in \mathbb{R}^{N \times n}$ is a provably sparse matrix. For example, for the small tree structure shown on the right hand side of Fig. 3.3, panels 1, 2, and 4 are leaf panels. Panel 3 is the parent of panels 1 and 2, and hence $\Delta n_3 = \frac{1}{2}(\Delta n_1 + \Delta n_2)$. Panel 5 is the parent of panels 3 and 4, and hence $\Delta n_5 = \frac{1}{2}(\Delta n_3 + \Delta n_4) = \frac{1}{4}(\Delta n_1 + \Delta n_2) + \frac{1}{2}\Delta n_4$. The detailed algorithm for constructing the random variable transformation matrix $R$ is presented in Fig. 3.4.

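Figure 3.3: Random variable transformation. For the example tree (leaf panels 1, 2, and 4; panel 3 the parent of panels 1 and 2; panel 5 the parent of panels 3 and 4), the rows of $R$ for panels 1 to 5 are $(1, 0, 0)$, $(0, 1, 0)$, $(1/2, 1/2, 0)$, $(0, 0, 1)$, and $(1/4, 1/4, 1/2)$, with columns ordered by leaf panels 1, 2, and 4.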

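Therefore, by using Eq. 3.25, $\Delta\hat{n}$ can be expressed in terms of $\Delta n$ as

$$ \Delta\hat{n} = \begin{bmatrix} R_k \\ R_l \end{bmatrix} \Delta n, \tag{3.26} $$

where $R_k$ and $R_l$ are the $k$th and the $l$th rows of the transformation matrix $R$. The variational potential coefficient between panels $k$ and $l$, $\tilde{P}_{kl}$, can then be written in terms of $\Delta n$ as

$$ \tilde{P}_{kl} = P_{kl} + \bar{a}_{kl}\,\Delta n + (\Delta n)'\,\bar{A}_{kl}\,\Delta n, \tag{3.27} $$

where

$$ \bar{a}_{kl} = \hat{a}_{kl} \begin{bmatrix} R_k \\ R_l \end{bmatrix} \tag{3.28} $$

and

$$ \bar{A}_{kl} = \begin{bmatrix} R_k \\ R_l \end{bmatrix}' \hat{A}_{kl} \begin{bmatrix} R_k \\ R_l \end{bmatrix}. \tag{3.29} $$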

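Furthermore, since the leaf panel variations can be represented using $k$ common factors, the variational potential coefficient between panels $k$ and $l$, $\tilde{P}_{kl}$, can be further represented in terms of the $k$ dominant common factors:

$$ \tilde{P}_{kl} = P_{kl} + \tilde{a}_{kl}\,\delta + \delta'\,\tilde{A}_{kl}\,\delta, \tag{3.30} $$

where

$$ \tilde{a}_{kl} = \bar{a}_{kl} L, \tag{3.31} $$

$$ \tilde{A}_{kl} = L' \bar{A}_{kl} L. \tag{3.32} $$

The $i$th element of the vector $\tilde{a}_{kl}$ is equal to $\Delta P^{i}_{kl}$, while $\Delta P^{ij}_{kl}$ is equal to $2(\tilde{A}_{kl})_{ij}$ if $i \ne j$ and $(\tilde{A}_{kl})_{ij}$ if $i = j$. So the method presented in Section 3.2.1 can be used to solve for $\Delta q$.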

3.2.5 Distribution of Parasitic Capacitance

After obtaining the quadratic expression of a parasitic capacitance, Monte Carlo simulation can be applied to determine the corresponding probability density function (PDF). However, in this section, we present a way to directly calculate the PDF of a parasitic capacitance given its quadratic form.

To compute the PDF of the parasitic capacitance, we first need to calculate its characteristic function. For a random variable $X$, its characteristic function is defined as

$$ C_X(\xi) = E(e^{j\xi X}) = \int_{-\infty}^{+\infty} e^{j\xi x} f_X(x)\,dx, \tag{3.33} $$

where $f_X(x)$ is the probability density function (PDF) of $X$.

Since the characteristic function is actually an inverse Fourier transform of the PDF, the PDF of the random variable $X$ can easily be computed if its characteristic function is known:

$$ f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-j\xi x}\, C_X(\xi)\,d\xi. \tag{3.34} $$

The formal proof of this conclusion can be found in [29].

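For a parasitic capacitance defined in the quadratic form

$$ \tilde{C} = C + a\delta + \delta' A \delta, \tag{3.35} $$

where $\delta \sim N(0, \Sigma)$, its exact characteristic function can be analytically computed by [30]

$$ C_{\tilde{C}}(\xi) = |\Omega|^{-\frac{1}{2}} \exp\left\{ j\xi m - \frac{1}{2}\xi^2\, a' \Sigma^{\frac{1}{2}} \Omega^{-1} \Sigma^{\frac{1}{2}} a \right\}, \tag{3.36} $$

where $|\Omega|$ is the determinant of the matrix $\Omega = I - 2j\xi\,\Sigma^{\frac{1}{2}} A\, \Sigma^{\frac{1}{2}}$. Once we obtain $C_{\tilde{C}}(\xi)$, the PDF, and then the cumulative distribution function (CDF), can be computed from Eq. 3.34.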

Clearly, one eigenvalue decomposition step (computing $\Sigma^{\frac{1}{2}}$) and one Fourier transformation step are needed in order to analytically compute the distribution of a parasitic capacitance. Since our principal factor vector is $\delta \sim N(0, I)$, the matrix $\Omega$ simplifies to $\Omega = I - 2j\xi A$, so that $C_{\tilde{C}}(\xi) = |\Omega|^{-\frac{1}{2}} \exp\left\{ j\xi m - \frac{1}{2}\xi^2\, a' \Omega^{-1} a \right\}$ and the eigenvalue decomposition can be eliminated.
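As a quick sanity check on a quadratic model of the form in Eq. 3.35, its first two moments are available in closed form when $\delta \sim N(0, I)$ and $A$ is symmetric: the mean is $C + \mathrm{tr}(A)$ and the variance is $aa' + 2\,\mathrm{tr}(A^2)$. These are standard results for Gaussian quadratic forms and are not taken from the derivation above. The C++ sketch below evaluates the model for one sample and computes both moments; the function names and the dense matrix storage are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;

// Evaluate c0 + a*delta + delta' A delta for one sample of the factors.
double evaluate_model(double c0, const Vector& a, const Matrix& A,
                      const Vector& delta) {
    assert(a.size() == delta.size() && A.size() == delta.size());
    double value = c0;
    for (std::size_t i = 0; i < delta.size(); ++i) {
        value += a[i] * delta[i];
        for (std::size_t j = 0; j < delta.size(); ++j) {
            value += delta[i] * A[i][j] * delta[j];
        }
    }
    return value;
}

// Mean of the model for delta ~ N(0, I): c0 + tr(A).
double model_mean(double c0, const Matrix& A) {
    double mean = c0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        mean += A[i][i];
    }
    return mean;
}

// Variance of the model for delta ~ N(0, I) and symmetric A: a a' + 2 tr(A^2).
double model_variance(const Vector& a, const Matrix& A) {
    double variance = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        variance += a[i] * a[i];
        for (std::size_t j = 0; j < A.size(); ++j) {
            variance += 2.0 * A[i][j] * A[j][i];
        }
    }
    return variance;
}
```

Such closed-form moments give an inexpensive cross-check of the mean and standard deviation obtained from either the characteristic function or Monte Carlo sampling.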

3.3 Experimental Results

The proposed capacitance variability modeling approach has been implemented in C/C++. All experiments are executed on a 1.4-GHz Pentium(R) 4 machine with 1 GB of RAM. Monte Carlo simulation with 10,000 runs is used for comparison.

First, for the 2 × 2 bus crossing problem, the probability density functions (PDFs) obtained from the canonical linear model and the quadratic model are shown in Fig. 3.5 and compared with that from Monte Carlo simulation. There is a significant accuracy improvement from using the second order quadratic model instead of the canonical model. The improvement is mostly in the region of the distribution corresponding to larger capacitance values, which is more critical for circuit performance and timing analysis. The canonical model tends to underestimate the possible capacitance value in the high probability region. In practice, this underestimation results in optimistic designs and excessive chip failures. This example clearly shows the necessity of the quadratic model in today's technology, where process variation can no longer be ignored.

In the second experiment, the CDFs and PDFs of the second order quadratic models with different numbers of dominant factors are compared. Without applying PFA, the number of random variables is equal to the total number of leaf panels, which is 1126 for bus 2 × 2. In practice, how many dominant factors need to be preserved is determined by the Gaussian correlation length in Eq. 3.14. The choice of the Gaussian correlation length depends on the detailed processing techniques and the local layout characteristics. For different regions and different panel orientations, we can assign different correlation lengths. In this test, the CDF and PDF obtained by PFA with only ten factors are very close to those from Monte Carlo simulation, so that more than ninety percent random variable reduction has been achieved by PFA. In this case, the error in the CDF compared with Monte Carlo is less than 3%. Furthermore, as the number of factors increases, the CDFs and PDFs from the quadratic models quickly converge to those from Monte Carlo simulation.

In Table 3.1, the run times of the Monte Carlo method and the quadratic model with 10 dominant factors are compared for different bus crossing benchmarks. The quadratic model exhibits over 100× speedup compared with Monte Carlo simulation. Statistical distribution-related parameters, such as mean value, standard deviation, and skewness, are normally within 3% error. Combined with the results from the previous experiments, we can conclude that, currently, the second order approximation is accurate enough for variational parasitic capacitance modeling.

Procedure ConstructR

Input:  (a) Vector Panel contains the indexes of all panels;
        (b) Vector Basis contains the indexes of leaf panels.
Output: R ∈ R^(N×n), such that ∆n = R × ∆n.

n = Basis.size();
for i = 1 · · · n do
    X = Basis[i];
    InsertEntry(R, X, i, 1);
    value = 1/2;
    while Panel[X].parent != NULL do
        X = Panel[X].GetParent();
        InsertEntry(R, X, i, value);
        value = 1/2 × value;
    end while
end for

Figure 3.4: An efficient algorithm for constructing the random variable transformation matrix R. The function InsertEntry(R, i, j, value) fills value into the entry (i, j) of R.
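The procedure in Figure 3.4 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: the panel hierarchy is assumed to be given as a hypothetical `parent` dictionary (panel index to parent index, `None` at the root), R is stored sparsely as a dictionary keyed by (row, column), and columns are 0-based rather than 1-based.

```python
from typing import Dict, List, Optional, Tuple

def construct_r(parent: Dict[int, Optional[int]],
                basis: List[int]) -> Dict[Tuple[int, int], float]:
    """Build the sparse matrix R as {(panel, column): weight}.

    parent -- hypothetical map from each panel index to its parent (None at the root)
    basis  -- indexes of the leaf panels; leaf basis[c] owns column c
    """
    r: Dict[Tuple[int, int], float] = {}
    for col, leaf in enumerate(basis):
        panel = leaf
        r[(panel, col)] = 1.0          # the leaf panel itself gets weight 1
        weight = 0.5
        while parent[panel] is not None:
            panel = parent[panel]      # climb one level in the hierarchy
            r[(panel, col)] = weight   # each ancestor gets half the previous weight
            weight *= 0.5
    return r

# Toy hierarchy: panel 0 is the root, 1 and 2 are its children, 3 and 4 are children of 1.
toy_parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}
toy_r = construct_r(toy_parent, basis=[3, 4, 2])
```

Each leaf touches only its own ancestors, so the number of stored entries grows with the depth of the hierarchy rather than with N × n.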

[Figure: parasitic capacitance PDF comparison with different models. Horizontal axis: parasitic capacitance (−400 to 400); vertical axis: probability density (0 to 0.014). Curves: linear model, quadratic model, Monte Carlo.]

Figure 3.5: First and second order capacitance models and their comparisons with Monte Carlo method for the bus 2 × 2 benchmark (σ = 20%).

[Figure, two panels. Top: second order parasitic capacitance CDF comparison (horizontal axis: parasitic capacitance, −400 to 400; vertical axis: probability, 0 to 1; curves: 2 factors, 5 factors, 10 factors, Monte Carlo; annotation: < 3% error). Bottom: second order parasitic capacitance PDF comparison (horizontal axis: parasitic capacitance, −400 to 400; vertical axis: probability density, 0 to 0.014; curves: 2 factors, 5 factors, 10 factors, Monte Carlo).]

Figure 3.6: Second order parasitic capacitance modeling with different numbers of factors and the comparison with Monte Carlo method for the bus 2 × 2 benchmark.

| Benchmark | Method | Time | Mean µ | Std. deviation σ | Skewness η |
|---|---|---|---|---|---|
| 2 × 2 Bus | M.C. | 1826 | -78.56 | 106.01 | 1.868 |
| 2 × 2 Bus | QuadMod | 9.78 | -81.43 | 103.64 | 1.927 |
| 2 × 2 Bus | Speedup/Err | 186.7× | 3.7% | 2.2% | 3.2% |
| 4 × 4 Bus | M.C. | 4673 | -194.89 | 85.62 | -1.78 |
| 4 × 4 Bus | QuadMod | 16.88 | -192.45 | 83.78 | -1.72 |
| 4 × 4 Bus | Speedup/Err | 276.8× | 1.3% | 2.1% | 3.4% |
| 6 × 6 Bus | M.C. | 8568 | -195.49 | 89.34 | -1.42 |
| 6 × 6 Bus | QuadMod | 69.56 | -190.71 | 85.52 | -1.37 |
| 6 × 6 Bus | Speedup/Err | 123.2× | 2.4% | 4.3% | 3.5% |

Table 3.1: Simulation runtime comparison for bus crossing benchmark. (1) Monte Carlo (M.C.); (2) Quadratic Model (QuadMod).


Chapter 4

Fast Analytic Lithography Simulation –

LITHSIM

In Chapter 3, we presented the statistical capacitance extraction algorithm StatCap, which accounts for geometry fluctuations induced by process variations. To study the real process variation introduced in the lithography process, in this chapter we propose an analytic optical projection system simulation algorithm that uses a simplified lithography system model to directly generate the photomask image on the wafer.

Integrated circuits are made using optical lithography, a process similar to photographic

printing, in which the patterns that will become layers of an integrated circuit are exposed on

a semiconductor wafer, one layer at a time [50]. During the lithography process, the wafer is

first spin-coated with photoresist, which is a light-sensitive organic polymer. The photoresist is exposed to ultraviolet light passing through the photomask, which is composed of opaque and transparent regions. For a positive photoresist, exposed areas become soluble and non-exposed areas remain hard. The soluble photoresist is chemically removed (development) from the wafer surface, and the patterned photoresist serves as an etching mask for the silicon dioxide.

[Figure: three process steps on a substrate with an SiO2 layer. (1) Photoresist coating; (2) exposure to ultraviolet light through a mask with opaque regions, producing exposed and unexposed photoresist; (3) development.]

Figure 4.1: General optical lithography process: (1) Photoresist coating (2) Exposure (3) Development.

Optical lithography is widely used for mass production of ultra large scale integrated (ULSI)

devices mainly due to its superiority in economic terms [36–42]. However, as the sub-wavelength

gap (Fig. 4.2) between the wavelength of light used in the lithography projection system and

the size of the features is growing, lithography has become the bottleneck controlling the device

scaling, circuit performance, and magnitude of integration for silicon semiconductors.

Furthermore, the sub-wavelength gap gives rise to optical distortions, which manifest themselves in the form of unprinted patterns and distorted geometries. These so-called optical proximity effects may cause significant performance degradation or, even worse, missing, incomplete, or shorted structures that result in hard failure [32]. Therefore, efficient lithography simulation algorithms have become indispensable in the investigation of optical proximity effects and the application of non-equipment based resolution enhancement techniques (RET), e.g. phase shifting mask (PSM) [43–46] and optical proximity correction (OPC) [47], to achieve improved performance in the sub-wavelength realm [35].

Figure 4.2: Subwavelength gap between IC feature size and light wavelength. (Picture courtesy of Numerical Technologies, Inc. and Synopsys, Inc.)


To perform the aerial image calculation, photomasks must be sampled before being represented on the computer. However, one of the main challenges to using general purpose aerial image simulators, such as SPLAT [48], in IC applications is the formidable size of the data representing a typical mask pattern [49]. To illustrate this point, consider a moderately sized IC that occupies 1 cm × 1 cm of silicon with a minimum feature size of 1 µm. A fairly sparse sample spacing of 0.25 µm along each side immediately results in 1.6 × 10^9 points to represent the image of the chip. Hence, it is extremely important to reduce both the execution time and the memory consumption of lithography simulation in VLSI applications.
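The sample count in this example can be reproduced with a few lines of Python. The memory figure is only an illustration and rests on an assumption that is not in the source: one 8-byte floating point value per sample.

```python
def sample_count(side_um: float, spacing_um: float) -> int:
    """Number of grid points needed to sample a square chip of the given side length."""
    per_side = round(side_um / spacing_um)
    return per_side * per_side

# 1 cm = 10,000 um, sampled every 0.25 um along each side.
points = sample_count(side_um=10_000.0, spacing_um=0.25)

# Assumption for illustration only: each sample is stored as one 8-byte float.
bytes_needed = points * 8

print(f"{points:.2e} points, about {bytes_needed / 1e9:.1f} GB")
```

Under that storage assumption, a single full-chip image already exceeds the 1GB of RAM of the machine used in the experiments of Section 4.3.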

Furthermore, point sampling of the mask in the spatial domain is mathematically attained by multiplying the mask function by a Dirac delta impulse lattice in a finite domain. Multiplication by the impulse lattice in the spatial domain corresponds to convolution of the frequency spectrum of the mask function with the Fourier transformation of the impulse lattice, which is another impulse lattice, in the frequency domain. Therefore, the spectrum of the sampled mask function consists of replicas of the spectrum of the photomask. Since the spectrum of the mask function is not bandlimited, these replicas overlap: the high frequency components are mixed into the low frequency components and aliasing error takes place. When aliasing takes place, it is not possible to distinguish the high frequency parts from the low frequency parts, because they are tightly combined with each other. Consequently, the final calculated image may be a poor approximation of the real mask image [33].

In this chapter, we propose a fast mask image calculation algorithm, LithSim. By exploiting the regular structure within general IC masks, LithSim provides a closed-form formula to directly generate mask images. One of the main advantages of LithSim is that it does not require sampling of the mask and can directly calculate the intensity at an arbitrary point on the image plane; hence it eliminates the aliasing error of the discretization process. Furthermore, because the mask image is the summation of the images of many simpler structures, the entire mask function can be represented in terms of those regular structures instead of sampling points, so that the memory consumption is greatly reduced. A careful analysis demonstrates that the complexity of LithSim is proportional to the total number of calculation points, while the traditional discrete Fourier transformation approach is of complexity O(n log n).

4.1 Preliminaries

Optical lithography comprises four basic elements: an illumination system, a reticle, an expo-

sure system, and a wafer coated with photoresist.

The illumination system, which consists of a light source and a condenser lens, plays an

important role in the lithography modeling process. The illumination system delivers light to

the mask with the specified intensity, uniformity, spectral characteristics, and spatial coherence.

Traditional optical lithography uses a circular light source to maintain directional uniformity, such that the same features are replicated identically regardless of their orientations [51]. However, a circular light source is partially coherent, which is the main obstacle to efficient lithography simulation. To simplify our discussion, we first assume that the light source is a point source, which leads to a simplified optical system model. After the simplified model is obtained, we extend it to consider more general light sources, and the well-known Hopkins model is introduced.

4.1.1 Simplified Projection System Model

Using Köhler's method, the point light source is placed in the focal plane of the condenser, and the rays therefore illuminate the mask as a parallel beam, as shown in Fig. 4.3.

[Figure: parallel illumination passes through the mask, the pupil (radius a, numerical aperture), and the projection lens onto the photoresist. The three stages are labeled (1) Fourier transformation, (2) low pass filter, (3) inverse Fourier transformation.]

Figure 4.3: Generic exposure system in optical projection lithography.

Once the light passes through the mask, Fraunhofer diffraction effects come into play. Before applying resolution enhancement techniques, the mask can be described by a two dimensional mask function

$$f(x, y) = \begin{cases} 1 & \text{in clear regions} \\ 0 & \text{in opaque regions} \end{cases} \tag{4.1}$$

After the mask diffracts the light, the energy transmitted through the photomask forms a distribution at the entrance to the pupil plane and can be described by the Fraunhofer diffraction integral in the far field region, which is equivalent to the Fourier transformation of the mask function:

$$F(f_x, f_y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, e^{j(f_x x + f_y y)}\, dx\, dy \tag{4.2}$$

where

$$f_x = \kappa x'/R, \qquad f_y = \kappa y'/R \tag{4.3}$$

are called the spatial angular frequencies of the diffraction pattern, $\kappa = 2\pi/\lambda$ is the spatial frequency of the illumination, and $R$ is the distance between the mask and the surface of the pupil plane.

From Eq. 4.3, low spatial frequency components, which are closer to the center of the pupil, pass through the pupil plane, while high frequency components near the periphery of the pupil are cut off. Therefore, the pupil acts as a low pass filter that truncates high frequency components from the spectrum of the mask function. For a pupil with radius $a$, the pupil function in the frequency domain can be described as [31]:

$$P(f_x, f_y) = \begin{cases} 1 & \sqrt{f_x^2 + f_y^2} \le \kappa a/R = \kappa \times NA \\ 0 & \text{otherwise} \end{cases} \tag{4.4}$$

where NA is defined as the numerical aperture of the pupil.

After the light passes through the pupil, the objective or projection lens is required to collect as much of the diffracted light as possible and focus it onto the resist layer on the wafer. Due to the Fourier transforming property of the lens, the light field transmitted through the projection lens can be represented as:

$$\varepsilon(x, y) = \mathcal{F}^{-1}\left\{\mathcal{F}\{f(x, y)\}\, P(f_x, f_y)\right\} \tag{4.5}$$

Eq. 4.5 is the mathematical model commonly used to describe the projection exposure system. The irradiance, which is the average energy per unit area per unit time, is then proportional to the square of the amplitude of the light field in Eq. 4.5:

$$I(x, y) = \|\varepsilon(x, y)\|^2 \tag{4.6}$$
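The three steps of Eq. 4.5 and Eq. 4.6 (transform, low pass filter, inverse transform, then square) can be illustrated on a sampled one-dimensional mask. The sketch below is not LithSim: it is the kind of discretized calculation that LithSim is designed to avoid, written with a naive O(n²) DFT so that it needs only the standard library. The function names and the `cutoff_bins` parameter are hypothetical, and the sign convention of the forward transform is arbitrary as long as the inverse uses the opposite sign.

```python
import cmath
import math

def dft(samples, sign):
    """Naive unnormalized DFT; sign=-1 for the forward and +1 for the inverse transform."""
    n = len(samples)
    return [sum(s * cmath.exp(sign * 2j * math.pi * k * m / n)
                for m, s in enumerate(samples))
            for k in range(n)]

def coherent_image_1d(mask, cutoff_bins):
    """Irradiance of a sampled 1-D mask under fully coherent illumination.

    mask        -- list of 0/1 transmission samples of the mask function
    cutoff_bins -- largest |frequency bin| passed by the pupil (hypothetical parameter)
    """
    n = len(mask)
    spectrum = dft(mask, -1)                     # step 1: Fourier transform of the mask
    for k in range(n):
        signed_k = k if k <= n // 2 else k - n   # map bin index to a signed frequency
        if abs(signed_k) > cutoff_bins:          # step 2: pupil acts as a low pass filter
            spectrum[k] = 0.0
    field = [v / n for v in dft(spectrum, +1)]   # step 3: inverse transform gives the field
    return [abs(e) ** 2 for e in field]          # irradiance is the squared field amplitude

slit = [0] * 6 + [1] * 4 + [0] * 6
blurred = coherent_image_1d(slit, cutoff_bins=2)
```

With a narrow pupil the sharp slit edges are smoothed and some of the transmitted energy is lost, which is the low pass behavior described above.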

4.1.2 General Lithography System Model

Eq. 4.5 is the simplified model for an optical projection system since we assume that the light

rays come from a single point light source and become parallel after passing through the con-

denser. In this scenario, the illumination system is completely coherent. However, in reality, the

photomask is illuminated by light rays traveling in different directions since the light source is

circular instead of a point and hence the illumination system is partially coherent. Partially coherent illumination improves the theoretical resolvable minimum feature but makes the projection system model much more complicated.

In the previous subsection, the point source is assumed to be on the axis, and the corresponding spectrum of the photomask is $F(f_x, f_y)$ as shown in Eq. 4.2. For a general point light source located off-axis, if we assume that the optical system is shift-invariant, the light field is a shifted version of the one described by Eq. 4.5:

$$\varepsilon(x, y, \hat{f}_x, \hat{f}_y) = \mathcal{F}^{-1}\left\{F(f_x - \hat{f}_x, f_y - \hat{f}_y)\, P(f_x, f_y)\right\} \tag{4.7}$$

where $\hat{f}_x$ and $\hat{f}_y$ are determined by the location of the off-axis point light source.

Shifting the spectrum of the photomask is equivalent to shifting the pupil function in the frequency domain, as shown in Fig. 4.4.

[Figure: two sketches in the (fx, fy) plane, one with the photomask spectrum shifted by (f̂x, f̂y) and one with the pupil function shifted instead.]

Figure 4.4: Shifting the photomask spectrum is equivalent to shifting the pupil function.

For a light source containing many off-axis point sources, the light fields generated by each pair of point light sources, which produce waves traveling in different directions, interfere with each other, and hence the Hopkins model is obtained:

$$I(x, y) = \int\!\cdots\!\int J(f_x, f_y)\, P(f_x + f'_x, f_y + f'_y)\, P^*(f_x + f''_x, f_y + f''_y)\, F(f'_x, f'_y)\, F^*(f''_x, f''_y)\, e^{-i2\pi[(f'_x - f''_x)x + (f'_y - f''_y)y]}\, df_x\, df_y\, df'_x\, df'_y\, df''_x\, df''_y \tag{4.8}$$

$J(f_x, f_y)$ is the effective light source, which is the image of the illumination source on the pupil plane in the absence of the photomask. Therefore, $J(f_x, f_y)$ is basically the spectrum of the light source. For a circular illumination light source, the effective source $J(f_x, f_y)$ fills a circle with radius $\sigma$ in the pupil plane and can be represented as:

$$J(f_x, f_y) = \begin{cases} \dfrac{1}{\pi\sigma^2} & \text{if } \sqrt{f_x^2 + f_y^2} \le \sigma \\ 0 & \text{otherwise} \end{cases} \tag{4.9}$$

where $\sigma$ is referred to as the partial coherence factor.

As we can see from the Hopkins model in Eq. 4.8, each pair of shifted photomask spectra is weighted by a factor known as the transmission cross-coefficient (TCC):

$$TCC(f'_x, f'_y, f''_x, f''_y) = \int\!\!\int J(f_x, f_y)\, P(f_x + f'_x, f_y + f'_y)\, P^*(f_x + f''_x, f_y + f''_y)\, df_x\, df_y \tag{4.10}$$

For a circular light source, $J(f_x, f_y)$ is of circular shape as defined in Eq. 4.9. $P(f_x + f'_x, f_y + f'_y)$ and $P^*(f_x + f''_x, f_y + f''_y)$ are shifted pupil functions, which are also circles, centered at $(-f'_x, -f'_y)$ and $(-f''_x, -f''_y)$ respectively. So the TCC is the overlap area shared by those three circles, scaled by the source normalization $1/(\pi\sigma^2)$, as shown in Fig. 4.5. Based on the definition of TCC, the Hopkins model can then be rewritten as:

$$I(x, y) = \int\!\!\int\!\!\int\!\!\int TCC(f'_x, f'_y, f''_x, f''_y)\, F(f'_x, f'_y)\, F^*(f''_x, f''_y)\, e^{-i2\pi[(f'_x - f''_x)x + (f'_y - f''_y)y]}\, df'_x\, df'_y\, df''_x\, df''_y$$

Therefore, the TCC couples the inverse Fourier transformations of two shifted photomask spectra together and hence greatly increases the complexity of lithography simulation.
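To make the geometric picture of the TCC concrete, the sketch below estimates Eq. 4.10 numerically by midpoint sampling over the source disk of Eq. 4.9. It is an illustration only: the function name, the grid resolution `n`, and the choice of normalized frequencies with a pupil cutoff radius of 1 are assumptions, not part of the thesis.

```python
import math

def tcc(f1, f2, sigma, pupil_radius=1.0, n=200):
    """Approximate TCC(f1, f2) for a circular source and two shifted circular pupils.

    f1, f2       -- frequency pairs (fx', fy') and (fx'', fy'')
    sigma        -- radius of the circular effective source
    pupil_radius -- pupil cutoff radius (assumed 1.0, i.e. normalized frequencies)
    n            -- samples per axis over the square enclosing the source
    """
    step = 2.0 * sigma / n
    weight = step * step / (math.pi * sigma * sigma)  # source value 1/(pi sigma^2) times cell area
    total = 0.0
    for i in range(n):
        fx = -sigma + (i + 0.5) * step
        for j in range(n):
            fy = -sigma + (j + 0.5) * step
            if math.hypot(fx, fy) > sigma:
                continue  # sample lies outside the effective source
            in_first = math.hypot(fx + f1[0], fy + f1[1]) <= pupil_radius
            in_second = math.hypot(fx + f2[0], fy + f2[1]) <= pupil_radius
            if in_first and in_second:
                total += weight  # sample lies in the overlap of all three circles
    return total

full_overlap = tcc((0.0, 0.0), (0.0, 0.0), sigma=0.5)
partial_overlap = tcc((1.2, 0.0), (0.0, 0.0), sigma=0.5)
```

When both pupils cover the whole source the estimate approaches 1, and it drops to 0 once a pupil is shifted completely off the source. The four-fold integral of the rewritten Hopkins model needs one such value for every pair of frequencies, which is where the cost comes from.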

4.2 LithSim Algorithm

LithSim is based on the simplified model in Eq. 4.5, which assumes that the light rays illuminating the photomask are parallel. The main computational advantages of LithSim are realized by exploiting the structure inherent in IC mask patterns. Although features on the photomasks have a wide variety of shapes and dimensions, most of them can be approximated by one of three types: lines, spaces, and contacts. As shown in Fig. 4.6, IC masks can be decomposed into rectangular slits with different widths, heights, and locations.

[Figure: the effective light source and two shifted pupil functions (I and II) drawn as circles in the (fx, fy) plane; the TCC is their common overlap region.]

Figure 4.5: Transmission cross-coefficient (TCC).

Mathematically, the mask function $f(x, y)$ can be represented as a summation of $N$ much simpler slit functions, each corresponding to a simple two dimensional slit:

$$f(x, y) = \sum_{i=1}^{N} f_i(x, y) \tag{4.11}$$

where

$$f_i(x, y) = \begin{cases} 1 & x_i^0 \le x \le x_i^1 \text{ and } y_i^0 \le y \le y_i^1 \\ 0 & \text{otherwise} \end{cases} \tag{4.12}$$

Figure 4.6: Mask decomposition.

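Letting $p(x, y)$ be the inverse Fourier transformation of the pupil function $P(f_x, f_y)$ in the spatial domain and substituting $f(x, y) = \sum_{i=1}^{N} f_i(x, y)$ into the image formation equation, we obtain:

$$\varepsilon(x, y) = \mathcal{F}^{-1}\left\{\mathcal{F}\{f(x, y)\}\, P(f_x, f_y)\right\} = \sum_{i=1}^{N} \mathcal{F}^{-1}\left\{\mathcal{F}\{f_i(x, y)\}\, \mathcal{F}\{p(x, y)\}\right\} \tag{4.13}$$

By applying the convolution theorem, with $\circledast$ denoting convolution, Eq. 4.13 can be further simplified to:

$$\varepsilon(x, y) = \sum_{i=1}^{N} \mathcal{F}^{-1}\left\{\mathcal{F}\{f_i(x, y) \circledast p(x, y)\}\right\} = \sum_{i=1}^{N} f_i(x, y) \circledast p(x, y) \tag{4.14}$$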

Therefore, the image function $\varepsilon(x, y)$ of a mask $f(x, y)$ composed of $N$ slit functions $f_i(x, y)$ is the algebraic summation of the $N$ images $\varepsilon_i(x, y)$, each of which is the image formed by an individual slit $f_i(x, y)$:

$$\varepsilon_i(x, y) = f_i(x, y) \circledast p(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_i(u, v)\, p(x - u, y - v)\, du\, dv = \int_{x_i^0}^{x_i^1}\int_{y_i^0}^{y_i^1} p(x - u, y - v)\, du\, dv \tag{4.15}$$

Consequently, if we can efficiently compute the image of a single slit by using Eq. 4.15, the entire complex image of an arbitrary mask can be obtained by superposition. Since the shape of the pupil is much simpler than that of the mask, the convolution in Eq. 4.15 can be obtained explicitly.

4.2.1 Rectangular Pupil

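First we consider a rectangular pupil $P(f_x, f_y)$, which can be represented in the frequency domain as:

$$P(f_x, f_y) = \begin{cases} 1 & |f_x| \le K_x \text{ and } |f_y| \le K_y \\ 0 & \text{otherwise} \end{cases} \tag{4.16}$$

Its inverse Fourier transformation in the spatial domain is then as follows:

$$p(x, y) = \frac{1}{(2\pi)^2}\int_{-K_x}^{K_x}\int_{-K_y}^{K_y} e^{-j(f_x x + f_y y)}\, df_x\, df_y = \frac{1}{\pi^2} K_x K_y\, \mathrm{sinc}(K_x x)\, \mathrm{sinc}(K_y y) \tag{4.17}$$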

where sinc(x) is the well-known sinc function defined as sinc(x) = sin(x)/x. The sinc function is a sinusoid modulated by 1/x; hence sinc(x) → 1 as x → 0 and sinc(x) → 0 as x → ∞.

[Figure: surface plot of the inverse Fourier transformation p(x, y) of a rectangular pupil function P(fx, fy), for x and y from −20 to 20; vertical axis p(x, y) from −0.4 to 1.]

Figure 4.7: Inverse Fourier transformation of a rectangular pupil.

By substituting Eq. 4.17 into Eq. 4.15 and using the slit function $f_i(x, y)$ in Eq. 4.12, we get

$$\varepsilon_i(x, y) = \frac{1}{\pi^2}\int_{K_x(x - x_i^1)}^{K_x(x - x_i^0)} \mathrm{sinc}(u)\, du \int_{K_y(y - y_i^1)}^{K_y(y - y_i^0)} \mathrm{sinc}(u)\, du = \frac{1}{\pi^2}\left[\mathrm{Si}(K_x(x - x_i^0)) - \mathrm{Si}(K_x(x - x_i^1))\right]\left[\mathrm{Si}(K_y(y - y_i^0)) - \mathrm{Si}(K_y(y - y_i^1))\right] \tag{4.18}$$

Fig. 4.8 shows the image of a square slit calculated by using Eq. 4.18. Si(x) in Eq. 4.18 is the sine integral function, defined as

$$\mathrm{Si}(x) = \int_0^x \mathrm{sinc}(u)\, du$$

[Figure: surface plot of the image function εi(x, y) for a 1 µm × 1 µm square, for x and y from −20 to 20; vertical axis from −0.2 to 1.2.]

Figure 4.8: εi(x, y) of a 1 µm × 1 µm slit.

Therefore, efficient calculation of the image formed by slit $f_i(x, y)$ and a rectangular pupil depends on whether we can efficiently evaluate the sine integral. Fortunately, the sine integral has been extensively studied in mathematics due to its great importance in Fourier analysis.

Three methods are generally used in the literature to calculate the sine integral function: (1) Taylor expansion; (2) Chebyshev expansion; (3) spline curve fitting. Among these methods, the first, i.e. Taylor expansion, is adopted here due to its simple representation and sufficient accuracy. The sine integral can be expanded into an infinite Taylor series as follows:

$$\mathrm{Si}(x) = \int_0^x \mathrm{sinc}(u)\, du = \sum_{k=1}^{\infty} (-1)^{k-1} \frac{x^{2k-1}}{(2k-1)(2k-1)!} \tag{4.19}$$

By using Eq. 4.19, Eq. 4.18 turns out to be an analytic formula to calculate images in the case that the pupil is rectangular.
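As a rough illustration of the rectangular pupil case, the sketch below evaluates the sine integral from its Taylor series and uses it in the closed-form slit image, a product of two Si differences scaled by 1/π². The function names, the stopping tolerance, and the term limit are assumptions made for this example and are not taken from the thesis implementation.

```python
import math

def sine_integral(x: float, tol: float = 1e-15, max_terms: int = 200) -> float:
    """Si(x) from the series sum_{k>=1} (-1)^(k-1) x^(2k-1) / ((2k-1) (2k-1)!)."""
    term = x    # holds (-1)^(k-1) x^(2k-1) / (2k-1)!, starting at k = 1
    total = x   # first series term: x / (1 * 1!)
    for k in range(2, max_terms + 1):
        # advance from x^(2k-3)/(2k-3)! to x^(2k-1)/(2k-1)!, flipping the sign
        term *= -x * x / ((2 * k - 2) * (2 * k - 1))
        contribution = term / (2 * k - 1)
        total += contribution
        if abs(contribution) < tol * max(1.0, abs(total)):
            break
    return total

def slit_field(x, y, x0, x1, y0, y1, kx, ky):
    """Light field at (x, y) for one slit [x0, x1] x [y0, y1] and a rectangular pupil.

    kx, ky are the pupil cutoffs along the two frequency axes.
    """
    sx = sine_integral(kx * (x - x0)) - sine_integral(kx * (x - x1))
    sy = sine_integral(ky * (y - y0)) - sine_integral(ky * (y - y1))
    return sx * sy / math.pi ** 2

# Field, then irradiance, at the center of a 1 x 1 slit.
center_field = slit_field(0.0, 0.0, -0.5, 0.5, -0.5, 0.5, kx=5.0, ky=5.0)
center_irradiance = center_field ** 2
```

The alternating series loses precision in double arithmetic once |x| reaches a few tens, because large terms of opposite sign cancel. Restricting evaluation to points near each slit keeps the arguments moderate, and a lookup table is another way to sidestep the issue.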

[Figure: sine integral versus X for X from −30 to 30; the curve settles toward π/2 for large positive X and −π/2 for large negative X.]

Figure 4.9: Waveform of sine integral function.

4.2.2 Circular Pupil

A circular pupil, which is used in most projection optical lithography systems, can be represented in the frequency domain as in Eq. 4.4. First, we need to calculate its corresponding $p(x, y)$ in the spatial domain. The evident circular symmetry suggests the use of polar coordinates, so let

$$f_x = k\cos\alpha, \quad f_y = k\sin\alpha, \quad x = r\cos\theta, \quad y = r\sin\theta \tag{4.20}$$

By switching to polar coordinates, we get (here $a$ denotes the cutoff radius of the pupil in the frequency domain)

$$p(r, \theta) = \mathcal{F}^{-1}\{P(k, \alpha)\} = \frac{1}{(2\pi)^2}\int_0^a\int_0^{2\pi} e^{-jkr\cos(\alpha - \theta)}\, k\, d\alpha\, dk \tag{4.21}$$

Inasmuch as $P(k, \alpha)$ is circularly symmetric, its inverse Fourier transform must be circularly symmetric as well, which implies that $p(r, \theta)$ is independent of $\theta$. So the integral can be simplified by letting $\theta$ equal some constant value, which we choose to be zero, whereupon

$$p(r) = \frac{1}{(2\pi)^2}\int_0^a k \int_0^{2\pi} e^{-jkr\cos\alpha}\, d\alpha\, dk \tag{4.22}$$

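The quantity

$$J_0(u) = \frac{1}{2\pi}\int_0^{2\pi} e^{ju\cos\alpha}\, d\alpha \tag{4.23}$$

which arises quite frequently in mathematical physics, is known as the Bessel function (of the first kind) of order zero. More generally,

$$J_m(u) = \frac{j^{-m}}{2\pi}\int_0^{2\pi} e^{j(m\alpha + u\cos\alpha)}\, d\alpha \tag{4.24}$$

represents the Bessel function of order $m$. Another general property of Bessel functions, referred to as a recurrence relation, is

$$\frac{d}{du}\left[u^m J_m(u)\right] = u^m J_{m-1}(u) \tag{4.25}$$

When $m = 1$, this leads to

$$\int_0^u w J_0(w)\, dw = u J_1(u) \tag{4.26}$$

Since $J_0$ is an even function, the inner integral of Eq. 4.22 equals $2\pi J_0(kr)$. By using the recurrence relation of the Bessel function, Eq. 4.22 can then be simplified to

$$p(r) = \frac{a^2}{2\pi}\, \frac{J_1(ra)}{ra} \tag{4.27}$$

Since $r = \sqrt{x^2 + y^2}$, the inverse Fourier transformation represented in rectangular coordinates is

$$p(x, y) = \frac{a^2}{2\pi}\, \frac{J_1\!\left(a\sqrt{x^2 + y^2}\right)}{a\sqrt{x^2 + y^2}} \tag{4.28}$$

By using Eq. 4.28 and substituting the Taylor expansion of the Bessel function

$$J_1(x) = \sum_{k=0}^{\infty} (-1)^k \frac{(x/2)^{2k+1}}{k!\,(k+1)!} \tag{4.29}$$

into Eq. 4.15, expanding $(u^2 + v^2)^k$ by the binomial theorem, and integrating term by term, we get:

$$\varepsilon_i(x, y) = \frac{a^2}{4\pi}\sum_{k=0}^{\infty}\frac{(-1)^k (a/2)^{2k}}{k!\,(k+1)!}\sum_{l=0}^{k}\binom{k}{l}\frac{\left[u^{2k-2l+1}\right]_{x - x_i^1}^{x - x_i^0}\left[v^{2l+1}\right]_{y - y_i^1}^{y - y_i^0}}{(2k - 2l + 1)(2l + 1)} \tag{4.30}$$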
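The circular pupil case can be checked numerically with a short sketch. It assumes the standard series J1(x) = Σ (−1)^k (x/2)^(2k+1) / (k! (k+1)!) and the kernel p(x, y) = (a²/2π) J1(ar)/(ar), and it integrates that series term by term over one slit after a binomial expansion of r^(2k). The function names, the term counts, and the a²/(4π) prefactor that follows from these assumptions are illustrative and should be checked against the original derivation.

```python
import math

def jinc(x: float, terms: int = 40) -> float:
    """J1(x)/x from the Taylor series, which stays well defined at x = 0."""
    return sum((-1) ** k * (x / 2.0) ** (2 * k)
               / (2.0 * math.factorial(k) * math.factorial(k + 1))
               for k in range(terms))

def pupil_kernel(x: float, y: float, a: float) -> float:
    """Assumed kernel p(x, y) = a^2/(2 pi) * J1(a r)/(a r) of a circular pupil with cutoff a."""
    return a * a / (2.0 * math.pi) * jinc(a * math.hypot(x, y))

def slit_field_circular(x, y, x0, x1, y0, y1, a, terms=30):
    """Field of one slit [x0, x1] x [y0, y1] by term-by-term integration of the series."""
    u_hi, u_lo = x - x0, x - x1   # integration limits after substituting u -> x - u
    v_hi, v_lo = y - y0, y - y1
    total = 0.0
    for k in range(terms):
        coeff = ((-1) ** k * (a / 2.0) ** (2 * k)
                 / (math.factorial(k) * math.factorial(k + 1)))
        inner = 0.0
        for l in range(k + 1):
            pu, pv = 2 * k - 2 * l + 1, 2 * l + 1
            du = (u_hi ** pu - u_lo ** pu) / pu   # integral of u^(2k-2l)
            dv = (v_hi ** pv - v_lo ** pv) / pv   # integral of v^(2l)
            inner += math.comb(k, l) * du * dv
        total += coeff * inner
    return a * a / (4.0 * math.pi) * total

field = slit_field_circular(0.3, -0.2, -0.5, 0.5, -0.5, 0.5, a=1.0)
```

A brute-force midpoint integration of `pupil_kernel` over the same slit gives an independent check of the double series. As with the sine integral, the power series is only practical for moderate values of a·r.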

4.2.3 LithSim Simulation Flow

The main advantage of LithSim is that we have a closed-form formula to calculate the intensity at an arbitrary point on the image plane; thus we avoid sampling the mask and pupil, and hence eliminate the aliasing error introduced by discretization.

For each slit, we can adopt a windowing method to greatly reduce the calculation cost. Fig. 4.8 shows that the irradiance becomes very small as the calculation point moves far away from the slit. Therefore, we only need to calculate the nearby region surrounding that slit (Fig. 4.10). Assuming the total number of calculation points on the image plane is N, the complexity of LithSim is O(cN), where c is a constant depending on the window size we use.

Furthermore, the sine integral and the Bessel integral can be tabulated to avoid repeated calculation. In summary, the LithSim simulation flow is shown in Fig. 4.11.

[Figure: a mask with slits numbered 1 to 5 and the computation windows surrounding them.]

Figure 4.10: Windowing method to reduce computational cost.

4.3 Experimental Results

LithSim is implemented in C++ and Matlab. All experiments are executed on a Pentium(R) 4 CPU 1.4GHz machine with 1GB RAM. We also implement the discrete FFT (DFFT) and discrete convolution (DCONV) in Matlab and compare the three algorithms with respect to continuous convolution (CCONV).

First, the irradiance matrix of a simple mask containing three parallel slits is calculated by using the above four methods. The width and the height of the three slits are 3 µm and 17 µm respectively. The edge-to-edge spacing between slits is 5 µm. The cut-off angular spatial frequency of the pupil is set to 0.86 cycles per µm. Compared to continuous convolution, for this small test case, the discretization based methods are noticeably less accurate: discrete FFT shows above 10% error and discrete convolution exhibits about 8% error. From Fig. 4.12, we can see that the intensity generated by DFFT and DCONV exhibits excessively high peaks, which is related to the high frequency components mixed into the low frequency parts during the sampling process. On the contrary, LithSim avoids the discretization and hence naturally eliminates the aliasing error. For this small test case, LithSim shows less than 1% error, as shown in Fig. 4.13. From Fig. 4.14, we can see that the image calculated by LithSim almost cannot be distinguished from the one generated using continuous convolution.

[Figure: flowchart. Mask pattern specification (GDSII, CIF) → mask pattern decomposition → slit list → generate computation window of slit i (from the width and height of slit i) → for each point (x, y) in window i, calculate εi(x, y) using the LUT of sine integral and Bessel integral → finish all slits? (N: continue with the next slit; Y: end).]

Figure 4.11: LithSim Optical Lithography Simulation Flow.

[Figure: four surface plots of irradiance over a 30 × 30 region, vertical scale 0 to 1.5. Panels: Discrete Fourier Transformation, Discrete Convolution, LITHSIM, Continuous Convolution.]

Figure 4.12: Irradiance calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.

[Figure: three surface plots of irradiance error over a 30 × 30 region. Panels: Discrete Fourier Transformation (vertical scale up to 0.2), Discrete Convolution (vertical scale up to 0.12), LITHSIM (vertical scale up to 8 × 10^−3).]

Figure 4.13: Errors in irradiance matrices calculated by using discrete Fourier transformation, discrete convolution and LithSim compared to continuous convolution.

[Figure: four contour plots over a 25 × 25 region. Panels: Discrete Fourier Transformation and Discrete Convolution (contour levels 0.2 to 1.2), LITHSIM and Continuous Convolution (contour levels 0.1 to 1.1).]

Figure 4.14: Images (contours) calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.

| Test case | Metric | LITHSIM | DFFT | DCONV |
|---|---|---|---|---|
| 111 × 111 points, 42 slits | Execution time (s) | 1.359 | 7.31 | 6.92 |
| 111 × 111 points, 42 slits | Percentage error (%) | 0.63 | 8.74 | 6.57 |
| 759 × 759 points, 75 slits | Execution time (s) | 1.671 | 187.6 | 137.5 |
| 759 × 759 points, 75 slits | Percentage error (%) | 0.86 | 9.26 | 8.47 |
| 4351 × 4351 points, 167 slits | Execution time (s) | 3.546 | > 1h | > 1h |
| 4351 × 4351 points, 167 slits | Percentage error (%) | 0.82 | 9.67 | 8.78 |
| 10239 × 10239 points, 393 slits | Execution time (s) | 28.25 | > 1h | > 1h |
| 10239 × 10239 points, 393 slits | Percentage error (%) | 0.74 | 8.92 | 8.35 |

Table 4.1: Extraction time and error comparison.


Chapter 5

Efficient Inductive Effect Extraction with

Lossy Substrate – EPEEC

5.1 Introduction

The industry trend of integrating higher levels of circuit functionality on one chip and the wide-

spread growth of wireless communication have triggered the proliferation of mixed analog-

digital systems. However, the development of efficient interconnect models for such a system

is made more difficult because of the lossy nature of the silicon substrate. In particular, the

creation of substrate eddy currents can lead to considerable interconnect inductive and ohmic

losses. As the behavior of on-chip interconnects becomes a dominant factor in overall circuit

performance at high frequencies, an interconnect system analysis without considering the lossy

substrate effects will result in an over-designed network and seriously waste chip resources [52].


With the increasing clock frequency and integration density, intentional and unintentional

inductive effects gradually rise in VLSI design. Inductance computation is a difficult task since

inductance depends on the current return path, which is unknown prior to the extraction and

simulation of a circuit model [53–55].

Fortunately, the PEEC method has been widely adopted to deal with this issue [56]. However, since PEEC assumes that each conductor segment has a current return path at infinity, inductive couplings exist among all conductor segments, so that extremely dense partial inductance matrices are usually generated. For this reason, the reluctance-based method [57, 58] has been proposed by Hao Ji et al. to alleviate this problem. Since reluctance has a higher degree of locality, similar to capacitance, only a small number of neighbors need to be considered. Consequently, the reluctance matrix for circuit simulation is very sparse compared to the partial inductance matrix.

Moreover, the traditional PEEC approach does not take substrate effects into consideration, and hence cannot capture inductive and ohmic losses due to the formation of eddy currents in the conductive substrate. Although several previous works have been proposed to resolve this issue by constructing three dimensional linear substrate models, such as [59–65], most of these approaches are based on the numerical finite difference method. With the soaring clock frequency and the reduced substrate resistivity, a large volume of silicon bulk needs to be spatially discretized into very tiny cells to capture the substrate effects accurately. Therefore, the obtained equivalent circuit models are prohibitively large, since inductive couplings now exist among all conductor segments and substrate cells.

In this chapter, we propose EPEEC, an accurate, compact, and efficient interconnect model-

ing methodology to extend the PEEC model to consider multi-layer substrates based on complex

image theory [66], which has recently been used in RFIC regime to consider microstrips and

spiral conductors over a single layer substrate [67–69]. To deal with multi-layer substrates, we

present the detailed methodology to derive the effective complex distance (ECD) between phys-

ical conductors and their corresponding complex images by preserving the first moment of the

analytic vector potential formulation. The EPEEC model is obtained by modifying PEEC with

mutual inductances between physical and image conductors separated by the effective complex

distance. Since EPEEC reflects the substrate effects in resistance and inductance values di-

rectly based on the configuration of substrate instead of applying discretization, it leads to very

compact models for interconnects.

For modeling even larger scale interconnect systems, EPEEC is enhanced to extract re-

luctance instead of inductance by applying an extended window-based reluctance extraction

algorithm. Furthermore, we propose a reluctance realization algorithm by directly converting

reluctances to circuit elements compatible with general circuit simulators, such as SPICE.

After validating the EPEEC model by comparison with the rigorous full-wave simulator Sonnet®, we use EPEEC to comprehensively study the impacts of frequency and substrate configuration, such as thickness and conductivity, on interconnect models.

The rest of this chapter begins with describing the application of complex image theory to on-chip interconnects above a lossy multi-layer substrate, then presents the EPEEC model based on the derived effective complex distance, and closes with a summary of our work.

5.2 Electro-magnetic Formulation of Substrate Eddy Current and Complex Image Theory

In this section, we explain the generation and the nature of eddy currents in a multi-layer sub-

strate. The effective complex distance can be obtained by preserving the first moment of the

analytic vector potential formulation. Then we discuss the application of complex image theory

to on-chip interconnects above a lossy multi-layer substrate.

5.2.1 Generation of Substrate Eddy Currents

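Eddy currents in the substrate are caused by time-varying magnetic fields. If a time-varying magnetic flux density $\mathbf{B}_f$ is induced by currents in interconnects, an electric field $\mathbf{E}$ is produced in the substrate as

\[ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}_f}{\partial t}. \tag{5.1} \]

The electric field $\mathbf{E}$ can be expressed in terms of the vector magnetic potential $\mathbf{A}$ and the scalar potential $\phi$ by

\[ \mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t} - \nabla\phi. \tag{5.2} \]

This electric field $\mathbf{E}$ in turn establishes currents flowing according to Ohm's law

\[ \mathbf{J} = \sigma\mathbf{E}. \tag{5.3} \]

Substituting Eq. 5.2 into Eq. 5.3 leads to

\[ \mathbf{J} = -\sigma\left(\frac{\partial \mathbf{A}}{\partial t} + \nabla\phi\right). \tag{5.4} \]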

The first term in Eq. 5.4 corresponds to the magnetically induced eddy currents and the second term to the electrically induced currents.

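These induced currents will produce another magnetic field according to Ampere's Law

\[ \nabla\times\mathbf{B} = \mu\left(\mathbf{J} + \frac{\partial\mathbf{D}}{\partial t}\right). \tag{5.5} \]

By using Eq. 5.3 and applying the constitutive equation $\mathbf{D} = \varepsilon\mathbf{E}$, the time-harmonic form of Ampere's Law can be expressed as $\nabla\times\mathbf{B} = \mu(\sigma\mathbf{E} + j\omega\varepsilon\mathbf{E})$. Since at current frequencies of interest (< 20 GHz) $\sigma \gg \omega\varepsilon$, the second term representing the displacement currents is at least three orders of magnitude smaller than the first term and can be safely ignored. Therefore, Ampere's Law in Eq. 5.5 can be simplified as

\[ \nabla\times\mathbf{B} = \mu\mathbf{J}. \tag{5.6} \]

Since the magnetic flux density $\mathbf{B}$ is solenoidal, we have $\nabla\cdot\mathbf{B} = 0$. Taking the curl of Eq. 5.6, substituting Eq. 5.4, and applying the vector identities $\nabla\times(\nabla\times\mathbf{F}) = \nabla(\nabla\cdot\mathbf{F}) - \nabla^2\mathbf{F}$ and $\nabla\times\nabla\phi = 0$, it can be obtained that

\[ \nabla^2\mathbf{B} - \mu\sigma\frac{\partial\mathbf{B}}{\partial t} = 0. \tag{5.7} \]

Eq. 5.7 is referred to as the diffusion equation in terms of the magnetic flux density $\mathbf{B}$.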
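The intermediate steps behind Eq. 5.7 can be spelled out as follows. This is only a sketch: it assumes a homogeneous region in which $\mu$ and $\sigma$ are constants, which the text implies but does not state, and it uses $\mathbf{B} = \nabla\times\mathbf{A}$.

```latex
\begin{align*}
\nabla\times(\nabla\times\mathbf{B}) &= \mu\,\nabla\times\mathbf{J}
  && \text{(curl of Eq. 5.6)} \\
\nabla(\nabla\cdot\mathbf{B}) - \nabla^2\mathbf{B}
  &= -\mu\sigma\left(\frac{\partial}{\partial t}\,\nabla\times\mathbf{A}
     + \nabla\times\nabla\phi\right)
  && \text{(Eq. 5.4 and the first identity)} \\
-\nabla^2\mathbf{B} &= -\mu\sigma\,\frac{\partial\mathbf{B}}{\partial t}
  && (\nabla\cdot\mathbf{B} = 0,\ \nabla\times\nabla\phi = 0,\ \mathbf{B} = \nabla\times\mathbf{A})
\end{align*}
```

Moving both terms to one side gives Eq. 5.7. The scalar potential drops out in the second line, which is the observation the following paragraph relies on.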

From Eq. 5.7, one can see that although the current arising from the electrical potential $\phi$ in Eq. 5.4 could be as large as the current arising from the magnetic vector potential $\mathbf{A}$, its contribution to the magnetic flux density can be ignored by noticing that $\nabla\times\nabla\phi = 0$. Furthermore, since the magnetic flux density $\mathbf{B}$ determines the magnetic flux $\Phi$, and hence directly affects the line parameter $L = \Phi/I$, we do not need to consider the current arising from the electrical potential $\phi$ [70,71]. In this scenario, Eq. 5.4 can be approximated as

\[ \mathbf{J} = -\sigma\frac{\partial\mathbf{A}}{\partial t}. \tag{5.8} \]

Substituting $\mathbf{B} = \nabla\times\mathbf{A}$ into Eq. 5.6 and adopting the Coulomb gauge $\nabla\cdot\mathbf{A} = 0$ leads to

\[ \nabla^2\mathbf{A} = -\mu\mathbf{J}. \tag{5.9} \]

By using Eq. 5.8 and Eq. 5.9, we get

\[ \nabla^2\mathbf{A} - \mu\sigma\frac{\partial\mathbf{A}}{\partial t} = 0. \tag{5.10} \]

Eq. 5.10 is the diffusion equation of the vector potential in a medium subject to a time-varying magnetic field.

5.2.2 Analytic Vector Potential within a Multilayer Substrate

Outside the diffusion/active areas and contact areas, the substrate can be treated as consisting

of uniformly-doped semiconductor-material layers of varying doping densities [61].

Assume that a long current filament is located a distance $h$ above a multilayer substrate. The current density within the filament is denoted by $J_f$. The substrate consists of $n$ layers. Layer $k$ of the substrate has thickness $t_k$, conductivity $\sigma_k$, and permeability $\mu_k$, and is assumed infinite in the transverse direction. The regions above and below the substrate are free space. The configuration is shown in Fig. 5.1.

For frequencies up to a few gigahertz, we can make the magneto-quasi-static assumption. Under this assumption, induced eddy currents within the substrate will be parallel to the filament.

Figure 5.1: A current filament parallel to a multilayer substrate which contains different layers of different thickness, conductivity, and permeability. (Diagram labels: filament $J_f$ at height $h$; layers 1 to $n$ with $t_k$, $\sigma_k$, $\mu_k$; axes $x$, $y$, $z$.)

For a z-direction filament current, only the z-component of $\mathbf{A}$ is nonzero, so that the problem becomes two dimensional. By using Eqs. 5.9 and 5.10, we can obtain the magnetic vector potential diffusion equations in the different regions

\[
\begin{aligned}
\nabla^2 A_0(x,y) &= -\mu_0\,\delta(0, y-h)\,J_f && \text{above substrate},\\
\nabla^2 A_k(x,y) &= j\omega\mu_k\sigma_k A_k(x,y) && \text{within substrate},\\
\nabla^2 A_{n+1}(x,y) &= 0 && \text{below substrate},
\end{aligned}
\tag{5.11}
\]

where $k = 1, \cdots, n$ and $A_k$ denotes the vector potential within substrate layer $k$.

Applying the method of separation of variables and noticing the symmetry of the configuration with respect to the $y$ axis [70,72], it can be shown that the general solution of Eqs. 5.11 is given by

\[ A_k(x,y) = \int_0^\infty \left[ M_k(\tau)e^{\gamma_k y} + N_k(\tau)e^{-\gamma_k y} \right]\cos(\tau x)\,d\tau, \tag{5.12} \]

where

\[ \gamma_k = (\tau^2 + \zeta_k^2)^{1/2}, \qquad \zeta_k = \sqrt{j\omega\mu_k\sigma_k}. \tag{5.13} \]

To solve for the vector potentials in the whole problem space, there are $2(n+2)$ unknown $M_k$'s and $N_k$'s in Eq. 5.12. In order to obtain those coefficients, we need to apply boundary conditions at the different medium interfaces. Since the normal component of the flux density and the tangential component of the field intensity are continuous, we obtain for the boundary between substrate layers $k$ and $k+1$

\[ B_{k,y} = B_{k+1,y}, \qquad \frac{1}{\mu_k}B_{k,x} = \frac{1}{\mu_{k+1}}B_{k+1,x}. \tag{5.14} \]

Since $\mathbf{B} = \nabla\times\mathbf{A}$ and only the z-component of $\mathbf{A}$ is nonzero, by using Eq. 5.12, the $x$ and $y$ components of the magnetic flux density will be

\[
\begin{aligned}
B_{k,x} &= \int_0^\infty \left[ M_k e^{\gamma_k y} - N_k e^{-\gamma_k y} \right]\gamma_k\cos(\tau x)\,d\tau,\\
B_{k,y} &= \int_0^\infty \left[ M_k e^{\gamma_k y} + N_k e^{-\gamma_k y} \right]\tau\sin(\tau x)\,d\tau.
\end{aligned}
\tag{5.15}
\]

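By employing the boundary conditions in Eqs. 5.14, the coefficients of different substrate layers can be shown to have the following relationship [73]

\[
\begin{bmatrix} M_{k+1} \\ N_{k+1} \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
(1+\lambda_k)e^{-\alpha_k} & (1-\lambda_k)e^{-\beta_k} \\
(1-\lambda_k)e^{+\beta_k} & (1+\lambda_k)e^{+\alpha_k}
\end{bmatrix}
\begin{bmatrix} M_k \\ N_k \end{bmatrix},
\]

where

\[
\lambda_k = \frac{\mu_{k+1}}{\mu_k}\cdot\frac{\gamma_k}{\gamma_{k+1}}, \qquad
\alpha_k = (\gamma_{k+1} - \gamma_k)\cdot y_k, \qquad
\beta_k = (\gamma_{k+1} + \gamma_k)\cdot y_k,
\tag{5.16}
\]

and $y_k = \sum_{i=1}^{k} t_i$ are the $y$ coordinates of the different interfaces.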

Furthermore, by matching the magnetic flux generated by a current filament in free space, the coefficient $M_0$ can be obtained as

\[ M_0(\tau) = \frac{\mu_0 I}{2\pi}\cdot\frac{e^{-h\tau}}{\tau}. \tag{5.17} \]

Also noticing that normally there is a ground plane underneath the substrate and that the field must vanish for $y \to -\infty$, we get

\[ N_{n+1} = 0. \tag{5.18} \]

So we have $n+1$ interfaces and hence $2(n+1)$ boundary conditions to uniquely determine the remaining $2(n+1)$ unknown coefficients in Eq. 5.12 by using Eq. 5.16.

Since our purpose is to study the substrate effects on interconnects, we are interested in the vector potential in the region above the substrate ($k = 0$). The solution of the vector potential in this region can be shown to have the following general form

\[ A_0 = \frac{\mu_0 I}{2\pi}\int_0^\infty \left[ \frac{e^{-\tau|y-h|}}{\tau} - \Gamma(\tau)\frac{e^{-\tau(y+h)}}{\tau} \right]\cos(\tau x)\,d\tau. \tag{5.19} \]

$\Gamma(\tau)$ is known after the $M_k$'s and $N_k$'s are obtained using the above method.

5.2.3 Complex Image Theory and Its Application

It is observed that the integral in the analytic solution of $A_0$ in Eq. 5.19 has two terms. The first term can be attributed to the current $J_f$ flowing within the filament. The second term can be attributed to the induced substrate eddy currents [66]. So the vector potential can be written as

\[ A_0(x,y) = A_0^f - A_0^e, \tag{5.20} \]

where

\[ A_0^f = \frac{\mu_0 I}{2\pi}\int_0^\infty \frac{e^{-\tau|y-h|}}{\tau}\cos(\tau x)\,d\tau, \tag{5.21} \]

\[
\begin{aligned}
A_0^e &= \frac{\mu_0 I}{2\pi}\int_0^\infty \Gamma(\tau)\frac{e^{-\tau(y+h)}}{\tau}\cos(\tau x)\,d\tau \\
      &= \frac{\mu_0 I}{2\pi}\int_0^\infty \Gamma(\tau)e^{\tau d}\,\frac{e^{-\tau(y+h+d)}}{\tau}\cos(\tau x)\,d\tau.
\end{aligned}
\tag{5.22}
\]

The similarity between these two terms suggests that eddy currents induced in the substrate may be treated as an image filament current flowing at $y = -(h+d)$ in the opposite direction. This approximation holds when the coefficient $\Gamma(\tau)e^{\tau d}$ is approximated by the constant one. The Taylor expansion of $\Gamma(\tau)e^{\tau d}$ at $\tau = 0$ is given by

\[ \Gamma(\tau)e^{\tau d} = \Gamma(0) + [\Gamma'(0) + \Gamma(0)d]\tau + O(\tau^2). \tag{5.23} \]

Furthermore, by using symbolic mathematics tools, such as Mathcad®, to solve for $\Gamma(\tau)$, one can easily verify that

\[ \Gamma(0) = 1. \tag{5.24} \]

By preserving the first moment in Eq. 5.23, $\Gamma(\tau)e^{\tau d}$ can be approximated by the constant one when

\[ d = -\Gamma'(0). \tag{5.25} \]

Therefore the multilayer substrate can now be substituted by a single image filament below its corresponding physical filament at distance $d + 2h$, which is called the effective complex distance (ECD). It is easy to show that the ECD is uniquely determined by the substrate process parameters and the extraction frequency. One can use Mathcad® to solve for the ECD when the substrate includes many layers.
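As a numerical alternative to a symbolic tool, the procedure of Eqs. 5.16 to 5.25 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: all layers are taken as non-magnetic ($\mu_k = \mu_0$), the interface coordinates are assumed to be measured downward from the substrate surface ($y_0 = 0$, $y_k = -(t_1 + \cdots + t_k)$), and $\Gamma'(0)$ is approximated by a one-sided finite difference rather than derived symbolically. The function names `gamma_coeff` and `effective_complex_distance` are hypothetical.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # free-space permeability (H/m)


def gamma_coeff(tau, omega, thicknesses, sigmas):
    """Coefficient Gamma(tau) of Eq. 5.19 for n non-magnetic layers.

    Regions: 0 = free space above, 1..n = substrate layers, n+1 = free space below.
    """
    # zeta_k^2 = j*omega*mu*sigma_k per region (zero in free space), gamma_k from Eq. 5.13
    zeta_sq = [0.0] + [1j * omega * MU0 * s for s in sigmas] + [0.0]
    gam = [np.sqrt(tau ** 2 + z) for z in zeta_sq]
    # ASSUMPTION: interface k lies at y_k = -(t_1 + ... + t_k), with y_0 = 0
    y = np.concatenate(([0.0], -np.cumsum(thicknesses)))
    total = np.eye(2, dtype=complex)
    for k in range(len(sigmas) + 1):
        lam = gam[k] / gam[k + 1]  # mu_{k+1}/mu_k = 1 under the assumption above
        alpha = (gam[k + 1] - gam[k]) * y[k]
        beta = (gam[k + 1] + gam[k]) * y[k]
        step = 0.5 * np.array([
            [(1 + lam) * np.exp(-alpha), (1 - lam) * np.exp(-beta)],
            [(1 - lam) * np.exp(beta), (1 + lam) * np.exp(alpha)],
        ])
        total = step @ total  # cascade the interface matrices of Eq. 5.16
    # N_{n+1} = 0 (Eq. 5.18) gives N_0 = -(T21/T22) * M_0, hence Gamma = T21/T22
    return total[1, 0] / total[1, 1]


def effective_complex_distance(h, freq, thicknesses, sigmas, rel_step=1e-4):
    """ECD = d + 2h with d = -Gamma'(0) (Eq. 5.25), using finite differences."""
    omega = 2 * np.pi * freq
    dt = rel_step * np.sqrt(omega * MU0 * max(sigmas))  # step scaled to |zeta|
    g1 = gamma_coeff(dt, omega, thicknesses, sigmas)
    g2 = gamma_coeff(2 * dt, omega, thicknesses, sigmas)
    dgamma = (-3.0 + 4.0 * g1 - g2) / (2 * dt)  # one-sided stencil with Gamma(0) = 1
    return -dgamma + 2 * h


if __name__ == "__main__":
    # Two-layer substrate of Section 5.4.1, conductor 5.481 um above it, 10 GHz
    ecd = effective_complex_distance(5.481e-6, 10e9, [20e-6, 100e-6], [100.0, 1e4])
    print(f"ECD = {ecd.real * 1e6:.2f} {ecd.imag * 1e6:+.2f}j um")
```

For a single layer that is much thicker than its skin depth $\delta$, this sketch should approach $2h + (1-j)\delta$, the familiar half-space result, which makes a convenient sanity check before trying multilayer stacks.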

5.3 Eddy-Current-Aware PEEC model: EPEEC

We have shown that the effect of a lossy multilayer substrate can be approximated by image conductors, provided that the currents in those conductors are evenly distributed. However, due to skin and proximity effects at high frequencies, conductor segments have to be discretized into filaments so as to account for the non-uniform current distribution [74], as shown in Fig. 5.2.

Figure 5.2: Eddy-current-aware PEEC model. Each conductor is further discretized to consider the uneven distribution of currents. (Diagram labels: physical conductors, image conductors, separation $d + 2h$.)

In order to calculate the total inductance for a particular filament, it is necessary to combine its physical and image filaments together [75]. After applying complex image theory, the effective complex inductance (ECI) between filaments $i$ and $j$ is given by

\[ \hat{L}^{\mathrm{ECI}}_{ij} = \hat{L}_{ij} - \hat{L}_{ij'}. \tag{5.26} \]

$\hat{L}_{ij}$ is the inductance between the physical filaments $i$ and $j$ and can be calculated by existing closed-form static inductance formulas, such as Hoer's formula [76] and Grover's formula [77]. $\hat{L}_{ij'}$ is the inductance between the physical filament $i$ and the image filament $j'$.

Since the calculation of $\hat{L}_{ij'}$ depends on the ECD, $\hat{L}_{ij'}$ will depend on frequency and substrate parameters. Hoer's formula can be accurately extended to calculate inductances of rectangular filaments separated by complex distances.
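To illustrate how a complex separation enters Eq. 5.26, the sketch below evaluates the effective mutual inductance of two parallel filaments. It deliberately swaps in the classical closed form for thin, equal-length, aligned straight filaments instead of Hoer's formula for rectangular cross-sections, and it assumes that the image of a filament sits directly below it at a vertical distance equal to the ECD. The function names and the numerical ECD value are hypothetical and chosen only for illustration.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # free-space permeability (H/m)


def filament_mutual(length, dist):
    """Mutual inductance of two equal, parallel, aligned thin filaments.

    Classical closed form for straight filaments; `dist` may be complex.
    """
    u = length / dist
    return MU0 * length / (2 * np.pi) * (
        np.log(u + np.sqrt(1 + u * u)) - np.sqrt(1 + 1 / (u * u)) + 1 / u
    )


def effective_mutual(length, spacing, ecd):
    """Eq. 5.26 for two filaments at the same height, horizontal gap `spacing`.

    ASSUMPTION: image j' sits directly below filament j at vertical distance `ecd`.
    """
    l_phys = filament_mutual(length, spacing)
    l_image = filament_mutual(length, np.sqrt(spacing ** 2 + ecd ** 2))
    return l_phys - l_image


if __name__ == "__main__":
    omega = 2 * np.pi * 10e9
    ecd = (60.3 - 50.3j) * 1e-6  # illustrative complex distance, not a thesis result
    l_eff = effective_mutual(1e-3, 10e-6, ecd)
    print("L =", l_eff.real, "H")
    print("R =", -omega * l_eff.imag, "ohm")  # real part of j*omega*L for complex L
```

Because the result is complex, its real part acts as the inductance and its imaginary part contributes a frequency-dependent resistance through $j\omega\hat{L}$, which is how substrate loss shows up without any substrate discretization.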

Notice that although applying complex image theory doubles the computational complexity, it will not increase the model size, since $\hat{L}_{ij'}$ is only used to modify the value of $\hat{L}_{ij}$ to account for the lossy substrate effects.

The filament impedance matrix $\hat{Z}(\omega)$ (see footnote 1) at frequency $\omega/2\pi$ can be expressed as follows

\[ \hat{Z}(\omega) = \hat{R}_{DC} + j\omega\hat{L}^{\mathrm{ECI}}. \tag{5.27} \]

$\hat{L}^{\mathrm{ECI}}$ is the filament inductance matrix containing the effective complex inductances given by Eq. 5.26. $\hat{R}_{DC}$ is a diagonal matrix of the DC resistances of the physical filaments.

5.3.1 EPEEC Interconnect Modeling Algorithm

For a complicated interconnect system, the number of passive elements will be huge if induc-

tance extraction is applied. Moreover, the discretization of conductors further increases the

model size. We will show that complex image theory can be easily combined with reluctance

extraction to generate compact interconnect models.

Most existing reluctance extraction tools are based on window selection algorithms [78,79].

Here we propose an extended window selection algorithm to handle both physical conductors

and their images.

We illustrate the algorithm in Table 5.1 by a simple example shown in Fig. 5.3. If the current aggressor is conductor 1, its neighboring conductors include 3, 4, and 5. Therefore, the image conductors 1', 3', 4', and 5' are also included in the current neighboring group.

By using the extended window algorithm, we limit EPEEC to consider couplings within neighboring conductor groups instead of the whole conductor system, and hence the computational complexity is significantly reduced.

BEGIN
  For each conductor in the interconnect system
    a. Apply a general window algorithm to select its neighboring physical conductors;
    b. Once a physical conductor is selected as a neighboring conductor, its corresponding image is also selected.
END

Table 5.1: Extended Window Selection Algorithm.

Footnote 1: A little hat is used to distinguish the symbols for filaments from those for conductor segments.
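The selection in Table 5.1 can be sketched as follows. Since the text does not fix which "general window algorithm" is used, this example assumes a plain distance window around the aggressor as a stand-in; the function name, the conductor identifiers, and the layout are hypothetical.

```python
import math


def extended_window(centers, aggressor, radius):
    """Return the neighboring group of `aggressor`: physical ids plus image ids.

    `centers` maps a conductor id to its (x, y) centre; an image id is the
    physical id with a trailing apostrophe, e.g. "3'".
    """
    ax, ay = centers[aggressor]
    # Step a: plain distance window (a stand-in for "a general window algorithm")
    physical = [
        cid for cid, (x, y) in centers.items()
        if math.hypot(x - ax, y - ay) <= radius
    ]
    # Step b: every selected physical conductor brings its image along
    images = [f"{cid}'" for cid in physical]
    return physical, images


if __name__ == "__main__":
    # Hypothetical layout (um); not the geometry of Fig. 5.3
    centers = {"1": (0, 0), "2": (40, 0), "3": (5, 0), "4": (0, 6), "5": (-5, 0)}
    print(extended_window(centers, "1", radius=10))
```

The aggressor falls inside its own window, so its image is selected as well, which matches the example in the text where 1' joins 3', 4', and 5'.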

For the neighboring group of conductor $i$, assume that it contains $n$ segments and that the $k$th conductor is discretized into $p_k$ filaments. The total number of filaments within the neighboring conductor group is then $n_f = \sum_{k=1}^{n} p_k$. Let $Z^i_f(\omega) \in \mathbb{C}^{n_f \times n_f}$ denote the filament impedance matrix of this neighboring group with the consideration of substrate effects by using Eq. 5.27, then

\[ Z^i_f(\omega)\cdot I^i_f = V^i_f, \tag{5.28} \]

where $I^i_f, V^i_f \in \mathbb{C}^{n_f}$ are the filament terminal current and voltage vectors, respectively.

Figure 5.3: Extended window selection algorithm to simultaneously consider physical and image conductors. (Diagram labels: physical conductors 1 to 5, image conductors 1' to 5', separation $d + 2h$.)

Physically, a bundle of filaments within the same conductor segment can be treated as parallel branches. Merging parallel elements can be facilitated by using admittance instead of impedance. To directly calculate the admittance of each conductor segment when the current aggressor is conductor $i$, we simultaneously set the voltages along all its $p_i$ filaments to one while setting the others in $V^i_f$ to zero. The physical meaning of the current distribution $I^i_f$ obtained by solving Eq. 5.28 is that the summation of all the filament currents within the aggressor is the aggressor admittance, while the summation of currents within one victim is the coupling admittance between the aggressor and that victim.
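The excitation described above amounts to one linear solve per aggressor followed by per-conductor sums. A minimal sketch with NumPy is given below; the function name, the `owner` bookkeeping array, and the small example matrices are assumptions made for illustration and do not come from the test cases in this chapter.

```python
import numpy as np


def conductor_admittances(z_fil, owner, aggressor):
    """Solve Eq. 5.28 for one aggressor and sum filament currents per conductor.

    z_fil     : (nf, nf) complex filament impedance matrix
    owner     : length-nf sequence, owner[m] = conductor index of filament m
    aggressor : conductor index whose filaments are driven with 1 V
    """
    owner = np.asarray(owner)
    v = (owner == aggressor).astype(complex)  # 1 V on aggressor filaments, 0 elsewhere
    i_fil = np.linalg.solve(z_fil, v)         # filament currents
    # Sum currents conductor by conductor: self term for the aggressor,
    # coupling terms for every victim
    return {int(c): i_fil[owner == c].sum() for c in np.unique(owner)}


if __name__ == "__main__":
    omega = 2 * np.pi * 10e9
    # Hypothetical 3-filament group: conductor 0 has two filaments, conductor 1 has one
    r_dc = np.diag([2.0, 2.0, 1.0])
    l_fil = 1e-12 * np.array([[90, 60, 20], [60, 90, 20], [20, 20, 80]], dtype=complex)
    y = conductor_admittances(r_dc + 1j * omega * l_fil, [0, 0, 1], aggressor=0)
    print(y)
```

Solving the system directly, instead of inverting the matrix, keeps the cost at one factorization per neighboring group, which stays small because the window limits $n_f$.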

Those obtained admittance values are composed of two parts

\[ y_{ij} = g_{ij} + jx_{ij}, \tag{5.29} \]

where $g_{ij}$ is the conductance and $x_{ij}$ is the susceptance. If we model each conductor segment as a serially connected resistance and reluctance, the equivalent resistance $r_{ij}$ and reluctance $k_{ij}$ can be synthesized as

\[ r_{ij} = \frac{g_{ij}}{g_{ij}^2 + x_{ij}^2}, \qquad k_{ij} = \frac{(g_{ij}^2 + x_{ij}^2)}{\omega x_{ij}}. \tag{5.30} \]

The detailed EPEEC interconnect modeling algorithm is summarized in Table 5.2.

5.3.2 SPICE Compatible Reluctance Realization

After constructing the resistance matrix $R$ and the reluctance matrix $K$ using the algorithm in Table 5.2, circuit simulation is required to analyze those models. Unfortunately, traditional circuit analysis tools cannot handle reluctance directly. Although [58] and [79] incorporate the capability to simulate reluctance, significant modifications to traditional analysis tools are inevitable. In this subsection, we present a reluctance realization algorithm to directly convert reluctance to its mathematically and electrically equivalent circuit model, which only contains self inductances and voltage-controlled voltage sources (VCVS) [80].

For a general circuit containing reluctances, the branch equation of self and mutual reluctances is given by

\[ I_i = \sum_{j=1}^{n} K_{ij}V_j = K_{ii}V_i + \sum_{j=1,\, j\neq i}^{n} K_{ij}V_j, \tag{5.31} \]

where $K_{ii}$ is the self reluctance and $K_{ij}$ is the mutual reluctance between $K_{ii}$ and $K_{jj}$. By rearranging the terms in Eq. 5.31, it can be written as

\[ V_i = \frac{1}{K_{ii}}I_i - \sum_{j=1,\, j\neq i}^{n} \frac{K_{ij}}{K_{ii}}V_j. \tag{5.32} \]

Figure 5.4: SPICE compatible model for reluctance. The original reluctance element is substituted by a serial self inductance and VCVSs. (Diagram labels: self reluctances $K_{ii}$ and $K_{jj}$ coupled by $K_{ij}$; inductances $1/K_{ii}$ and $1/K_{jj}$; VCVS gains $K_{ij}/K_{ii}$ and $K_{ij}/K_{jj}$; branch voltages $V_i$ and $V_j$.)

If we take $1/K_{ii}$ as a self inductance, the original voltage drop across the self reluctance $K_{ii}$ can be viewed as the combination of the voltage drops across that inductance and some VCVSs. These serial VCVSs are controlled by the voltages on the other self reluctances which are originally coupled with the reluctance $K_{ii}$. Following Eq. 5.32, the gains of those VCVSs are $-K_{ij}/K_{ii}$.

Therefore, Eq. 5.32 can be used to construct the SPICE compatible model for reluctances shown in Fig. 5.4. The detailed reluctance realization algorithm is presented in Table 5.3. It can be either integrated into an extraction tool or implemented as post-extraction software.
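As a post-extraction step, the realization of Eq. 5.32 can be sketched as a small netlist generator. The example below emits standard SPICE element lines, `L` for inductors and `E` for VCVSs; the function name, the internal node naming scheme `n<i>_<q>`, and the numerical values are hypothetical, and the controlling voltage of each VCVS is taken across the terminals of the coupled branch.

```python
def realize_reluctance(k_matrix, branches):
    """Emit SPICE lines for reluctance branches, following Eq. 5.32.

    k_matrix : square list of lists, K[i][j] in 1/H
    branches : list of (node_plus, node_minus) names, one pair per branch
    """
    lines = []
    for i, (n_plus, n_minus) in enumerate(branches):
        node, q = n_plus, 0
        for j, (c_plus, c_minus) in enumerate(branches):
            if j == i or k_matrix[i][j] == 0:
                continue
            q += 1
            nxt = f"n{i}_{q}"  # hypothetical internal node name
            gain = -k_matrix[i][j] / k_matrix[i][i]
            # VCVS in series, controlled by the voltage across branch j
            lines.append(f"E{i}_{j} {node} {nxt} {c_plus} {c_minus} {gain:g}")
            node = nxt
        # Self inductance 1/K_ii closes the chain to the branch's far node
        lines.append(f"L{i} {node} {n_minus} {1.0 / k_matrix[i][i]:g}")
    return lines


if __name__ == "__main__":
    k = [[2e9, -5e8], [-5e8, 2e9]]  # illustrative values only
    for line in realize_reluctance(k, [("a1", "b1"), ("a2", "b2")]):
        print(line)
```

Each branch expands into one inductor plus one VCVS per nonzero mutual term, so the sparsity of the windowed reluctance matrix carries over directly to the netlist size.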

5.4 Experimental Results

Extensive experimental results are reported to show the efficiency and accuracy of our new interconnect modeling approach EPEEC. All tests are run on a 1.4 GHz Pentium IV machine with 768 MB of memory.

5.4.1 EPEEC Model Validation

To validate the new modeling approach and to illustrate its accuracy, we first compare inductance and resistance values computed by complex image theory using Eq. 5.26 with FastHenry [74] and a more rigorous full-wave electromagnetic analysis tool, Sonnet®.

The experimental objects are two parallel conductor segments in a power/ground (P/G) network in metal layer 6. They are made of copper with conductivity $5.8 \times 10^7$ S/m. Both of them are 90 µm long, 1.2 µm thick, and 26 µm wide. They are separated by 60 µm. The substrate is composed of two layers. The upper layer has conductivity $\sigma_1 = 100$ S/m while the lower layer has conductivity $\sigma_2 = 10000$ S/m. The thickness of the upper layer is 20 µm and that of the lower layer is 100 µm. The top area of the substrate is 1 cm × 1 cm. The distance between the substrate surface and the bottom of these conductors is 5.481 µm. Underneath the substrate, there is a ground plane. The detailed test configuration is shown in Fig. 5.5.

The self inductance of one wire is calculated and compared. Up to 20 GHz, EPEEC gives inductance values that are very close to the full-wave simulation results (within 1.5% error) and shows over 100X speedup compared to FastHenry and Sonnet®.

5.4.2 Substrate Effects

As shown in Fig. 5.6, interconnect models are essentially frequency dependent. Besides frequency, many other factors may also affect inductance and resistance values, such as conductor-substrate distance, substrate conductivity, and substrate thickness. The next set of experiments investigates how those factors can impact parasitic values.

Figure 5.5: Test configuration: two parallel copper interconnects above a two-layer substrate (length unit: µm).

In order to minimize the skin and proximity effects, we select two thin signal lines in metal layer 6. Both of them are 90 µm long, 1.2 µm thick, and 0.6 µm wide. They are separated by 1.2 µm. The substrate has the same configuration as in the above test. Without considering the substrate, i.e. in free space (PEEC), the self inductance and resistance are 91.95 pH and 2.16 Ω respectively.

First, we discuss the substrate effect under different frequencies and with different conductor-substrate distances. Figs. 5.7 and 5.8 show that the substrate effect becomes more evident at higher frequencies and when conductors are closer to the substrate. The increased inductive and ohmic substrate losses are expressed by smaller inductance and larger resistance values. At 10 GHz and with conductor and substrate separated by 10 µm, the inductance value becomes 85.14 pH, which shows an 8% deviation from the value calculated in free space.

Figure 5.6: Self inductance comparison by using three different extraction tools: FastHenry, Sonnet®, and EPEEC. (Plot of self inductance (pH) versus frequency (GHz).)

Second, we show the impact of substrate conductivities of different layers. For a multilayer

substrate in real design, the upper layer is usually less conductive in order to facilitate the

functionality of the on-chip analog circuitry. Low conductivity prevents the generation of large

eddy currents in the upper layer. However, since low conductivity medium has large skin depth,

the electromagnetic field can easily penetrate through the upper layer to reach lower layers and

hence lower layers may have more significant effects on interconnect values.

Figure 5.7: Self inductance decreases as frequency increases and conductor-substrate distance decreases. (Surface plot of self inductance (pH) versus frequency (GHz) and distance (µm).)

We set the conductor-substrate distance to 5.48 µm at 10 GHz. To fairly compare the two layers, they are both set to 50 µm thick. From Fig. 5.9, one can see that the upper layer has a larger impact than the lower layer when the two layers have the same conductivity. However, if the upper layer conductivity is small, the lower layer effect also cannot be ignored. When $\sigma_1 = 200$ S/m, the upper layer has a skin depth of 355.88 µm, which is much larger than its thickness. In this scenario, if $\sigma_2 = 1000$ S/m, the inductance value will be 91.45 pH, while changing $\sigma_2$ to 10000 S/m gives the inductance value 87.05 pH, which is 5.1% different from the previous value.
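The skin depth quoted above follows from the standard expression $\delta = \sqrt{2/(\omega\mu\sigma)}$. The short check below assumes a non-magnetic layer ($\mu = \mu_0$); the function name is hypothetical.

```python
import math

MU0 = 4e-7 * math.pi  # free-space permeability (H/m)


def skin_depth(freq_hz, sigma, mu=MU0):
    """Skin depth sqrt(2 / (omega * mu * sigma)) in metres."""
    return math.sqrt(2.0 / (2.0 * math.pi * freq_hz * mu * sigma))


if __name__ == "__main__":
    delta = skin_depth(10e9, 200.0)  # upper layer, sigma1 = 200 S/m at 10 GHz
    print(f"{delta * 1e6:.2f} um")
```

With $\sigma_1 = 200$ S/m at 10 GHz this evaluates to about 355.88 µm, roughly seven times the 50 µm layer thickness used in this experiment, which is why the field reaches the lower layer largely unattenuated.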

Therefore, although the upper layer normally has low conductivity, it determines to what extent the lower layers affect interconnects. In the case that the upper layer thickness is smaller than its skin depth, one cannot simply discard the effects from the lower layers. To prove this, we set the upper layer and the lower layer conductivity to 100 S/m and 10000 S/m respectively, and then we gradually increase the upper layer thickness to observe the effect on line parameters.

From Fig. 5.10, one can see that at a specific frequency, when the upper layer thickness grows beyond its skin depth, increasing its thickness will not have further effects on interconnects. In this experiment, since the upper layer has low conductivity, when the interaction between interconnects and the lower substrate layer is blocked by a thick upper layer, the overall substrate effect becomes insignificant.

Figure 5.8: Resistance increases as frequency increases and conductor-substrate distance decreases. (Surface plot of resistance (Ω) versus frequency (GHz) and distance (µm).)

Figure 5.9: With the same conductivity, the upper layer substrate will have larger effect than the lower layer. However, the lower layer cannot be ignored when the thickness of the upper layer is less than its skin depth. (Surface plot of self inductance (pH) versus upper layer conductivity (S/m) and lower layer conductivity (S/m).)

5.4.3 Inductance vs. Reluctance

The next set of experiments is run to show the computational complexity and model size of EPEEC compared to PEEC. The test conductor system includes 604 conductor segments in a P/G network located within metal layers 7 and 6. The substrate configuration is the same as in the previous tests.

Figure 5.10: Self inductance saturates when the thickness of the upper layer grows over its skin depth. (Surface plot of self inductance (pH) versus frequency (GHz) and upper layer thickness (µm).)

As shown in Table 5.4, PEEC and EPEEC-L (see footnote 2) have identical model size, while the extraction time of EPEEC-L is roughly doubled since we need to calculate inductances for both physical and image conductors in Eq. 5.26. However, the model size and extraction time of EPEEC-R are greatly reduced. For larger interconnect systems, the improvement will be even more significant.

Footnote 2: EPEEC-L means we apply complex image theory while extracting inductance. EPEEC-R is obtained by extracting reluctance using the algorithm given in Table 5.2.

Since the substrate affects the values of passive elements in the EPEEC model, it impacts the transient responses, which are critical for timing and signal integrity analysis. To compare the responses at different frequencies by using the PEEC, EPEEC-L, and EPEEC-R models, we randomly select one node in the above P/G network. Since the PEEC model does not consider the substrate, it only depends on the geometries of conductors and is frequency independent. However, at high frequencies, ignoring the substrate will lead to significant errors in the transient response. As shown in Fig. 5.11, at 20 GHz, the waveforms of PEEC and EPEEC-L exhibit about 10% difference, which is intolerable for the present interconnect modeling accuracy requirement. In contrast, the reluctance model EPEEC-R demonstrates a much smaller model size while maintaining less than 1.5% error compared to EPEEC-L.

Figure 5.11: Waveforms of transient responses by using different interconnect models: PEEC, EPEEC-L, and EPEEC-R. (Plot of voltage (V) versus time (ns).)

INPUT:  An interconnect system including $n$ conductor segments;
        Extraction frequency $f$;
        Substrate parameters $\mu_k$ and $\sigma_k$.
OUTPUT: Resistance matrix $R$; Reluctance matrix $K$.
BEGIN
  I.  Discretize all conductor segments according to their geometries and the extraction frequency $f$.
  II. For each conductor $i$ in the interconnect system, do the following:
      a. Search its neighboring conductors $\Upsilon_i$ by adopting the extended window algorithm;
      b. Calculate the filament impedance matrix $Z^i_f$ with the consideration of multilayer substrate effects by using Eq. 5.27;
      c. Set the entries in the voltage vector $V^i_f$ corresponding to filaments in conductor $i$ to one and the others to zero;
      d. Obtain the filament current distribution $I^i_f$ by solving Eq. 5.28;
      e. The self admittance of conductor $i$ equals the sum of filament currents within conductor $i$; the summation of filament currents in conductor $j$ is the coupling between conductors $i$ and $j$;
      f. Synthesize the admittance into serial resistance and reluctance by applying Eq. 5.30;
      g. Stamp those values into the parasitic matrices $R$ and $K$ respectively.
END

Table 5.2: EPEEC Interconnect Modeling Algorithm.

BEGIN
  For each reluctance $K_{ii}$ between nodes $n_i$ and $n_j$ in a given circuit
    a. $q = 0$;
    b. Let $n_i^q = n_i$;
    c. For each self-reluctance $K_{jj}$ that has mutual reluctance $K_{ij}$ with self-reluctance $K_{ii}$:
         Connect one VCVS controlled by $V_j$ with gain $-K_{ij}/K_{ii}$ between $n_i^q$ and $n_i^{q+1}$;
         $q = q + 1$.
    d. Connect inductance $1/K_{ii}$ between $n_i^q$ and $n_j$.
END

Table 5.3: Reluctance Realization Algorithm.

Model      Extraction Time   Number of Passive Elements
PEEC       38.162 s          92,639
EPEEC-L    91.547 s          92,639
EPEEC-R    4.094 s           2,794

Table 5.4: Extraction Time and Model Size Comparison.

Chapter 6

Conclusion

Moore's law has described the growth of the semiconductor industry for more than 35 years. The aggressive scaling of integrated circuits relies on an integration of the inter-layer dielectric and metal layers. At the same time, the industry trend of integrating higher levels of circuit functionality on one chip and process-induced variations, which directly impact the geometry of on-chip interconnects, have made the structure and hence the modeling of VLSI interconnect more and more complicated.

This thesis presents some progress in the area of interconnect parasitic extraction and interconnect process variation modeling. First, this thesis presents ICCAP, a fast 3-D capacitance extraction algorithm. ICCAP proposes a novel technique for sparsifying and reordering the potential coefficient matrix. The sparse transformation is performed by simply switching the basis from leaf panels to a new set of panels, so that cost-efficient preconditioners can be easily constructed, which greatly speeds up iterative matrix solvers.

To take the process variation into consideration, this thesis presents a fast mask image simulation algorithm, LithSim, to model the interconnect geometry variation introduced in the optical lithography process. Then an efficient methodology, StatCap, for generating explicit statistical representations of parasitic capacitances is proposed. StatCap applies principal factor analysis to reduce the number of random variables while preserving the dominant global/local factors that induce the conductor surface fluctuation due to process variations. The obtained quadratic form can not only be used to directly generate the parasitic capacitance probability distribution to locate design corners, but is also fully compatible with statistical model order reduction and statistical timing analysis tools.

Finally, to model the inductive effects, we propose a new frequency-dependent interconnect model, EPEEC, which considers the lossy substrate by using complex image theory. EPEEC is reluctance-based and is obtained by combining complex image theory with an extended window-based reluctance extraction algorithm. Extensive simulation results demonstrate that EPEEC has extremely high accuracy and results in a significantly smaller model size.

We hope that by transferring the proposed algorithms into the realm of production, these building blocks serve the goal of design for manufacturability in state-of-the-art VLSI circuits and can improve the fabrication yield and circuit efficiency in the long term.

Bibliography

[1] S. Balakrishnan, J. H. Park, H. Kim, Y.-M. Lee, and C. C.-P. Chen. Linear time hierarchical capacitance extraction without multipole expansion. International Conference on Computer Design, pages 98–103, Sept 2001.

[2] M. W. Beattie and L. T. Pileggi. Error bounds for capacitance extraction via window techniques. IEEE Trans. CAD, 18:311–321, March 1999.

[3] X. Cai, K. Nabors, and J. White. Efficient galerkin techniques for multipole-accelerated capacitance extraction of 3-d structures with multiple dielectrics. Advanced Research in VLSI, pages 200–211, March 1995.

[4] W. Hong, W.-K. Sun, Z.-H. Zhu, H. Ji, B. Song, and W.-M. Dai. A novel dimension-reduction technique for the capacitance extraction of 3-d vlsi interconnects. IEEE Transactions on Microwave Theory and Techniques, 46:1037–1044, Aug 1998.

[5] T. Lu, Z. Wang, and W. Yu. Hierarchical block boundary-element method (hbbem): a fast field solver for 3-d capacitance extraction. IEEE Transactions on Microwave Theory and Techniques, 52:10–19, Jan 2004.

[6] Y. Yanhong and P. Banerjee. A parallel implementation of a fast multipole based 3-d capacitance extraction program on distributed memory multicomputers. Proceedings. 14th International Parallel and Distributed Processing Symposium, pages 323–330, May 2000.

[7] W. Yu and Z. Wang. Enhanced qmm-bem solver for three-dimensional multiple-dielectric capacitance extraction within the finite domain. IEEE Transactions on Microwave Theory and Techniques, 52:560–566, Feb 2004.

[8] Z. Zhu and W. Hong. A generalized algorithm for the capacitance extraction of 3d vlsi interconnects. IEEE Transactions on Microwave Theory and Techniques, 47:2027–2030, Oct 1999.

[9] B. Krauter, Xia Yu, A. Dengi, and L.T. Pileggi. A sparse image method for bem capacitance extraction. Proc. DAC, pages 357–362, June 1996.

[10] T. Sometani, “Image method for a dielectric plate and a point charge,” IOP, 2000.

[11] E. Weber, Electromagnetic Fields. John Wiley & Sons, 1950.

[12] A. Balanis, Advanced Engineering Electromagnetics. John Wiley & Sons, 1989.

[13] M. Beattie and L. Pileggi. Electromagnetic parasitic extraction via a multipole method with hierarchical refinement. Proc. ICCAD, pages 437–444, 1999.

[14] K. Nabors and J. White. Fastcap: a multipole accelerated 3-d capacitance extraction program. IEEE Trans. on CAD, pages 1447–1459, 1991.

[15] J. Tausch and J. White. A multiscale method for fast capacitance extraction. Proc. DAC, pages 537–542, 1999.

[16] W. Shi, J. Liu, N. Kakani, and T. Yu. A fast hierarchical algorithm for 3-d capacitance extraction. IEEE Trans. on CAD, pages 330–336, 2002.

[17] S. Yan, V. Sarin, and W. Shi. Sparse transformations and preconditioners for hierarchical 3-d capacitance extraction with multiple dielectrics. Proc. DAC, pages 788–793, 2004.

[18] J. R. Phillips and J. White. A precorrected fft method for capacitance extraction of complicated 3-d structures. IEEE Trans. CAD, pages 1059–1072, 1997.

[19] S. Kapur and D. E. Long. Ies3: A fast integral equation solver for efficient 3-dimensional extraction. Proc. ICCAD, pages 448–455, 1997.

[20] P. Wrschka, J. Hernandez, G. S. Oehrlein, and J. King. Chemical mechanical planarization of copper damascene structures. Journal of The Electrochemical Society, pages 706–712, 2000.

[21] Peng Li, F. Liu, Xin Li, L. T. Pileggi, and S. R. Nassif. Modeling interconnect variability using efficient parametric model order reduction. Design Automation and Test in Europe, pages 958–963, 2005.

[22] E. Chiprout and T. Nguyen. Survey of model reduction techniques for analysis of package and interconnect models of high-speed designs. IEEE 6th Topical Meeting on Electrical Performance of Electronic Packaging, pages 251–254, 1997.

[23] B. N. Sheehan. Branch merge reduction of rlcm networks. Proc. ICCAD, pages 658–664, 2003.

[24] Hongliang Chang and S. S. Sapatnekar. Statistical timing analysis considering spatial correlations using a single pert-like traversal. Proc. ICCAD, pages 621–625, 2003.

[25] L. Daniel, C. S. Ong, S. C. Low, K. H. Lee, and J. White. A multiparameter moment matching

model reduction approach for generating geometrically parameterized interconnect performance

models. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23(5):678–

693, May 2004.

[26] X. Li, J. Le, P. Gopalakrishnan, and L. T. Pileggi. Asymptotic probability extraction for non-normal

distributions of circuit. pages 2–9, 2004.

[27] Zhenhai Zhu, Alper Demir, and Jacob White. A stochastic integral equation method for modeling

the rough surface effect on interconnect capacitance.Proc. ICCAD, pages 887–891, 2004.

[28] R. L Gorsuch.Factor Analysis. Hillsdale, NJ, 1974.

[29] B. V. Gnedenko.Theory of Probability. Gordon and Breach Science Publishers, 1997.

[30] A. M. Mathai and Serge B. Provost. Quadratic Forms in Random Variables: Theory and Applications. New York: Marcel Dekker, 1992.

[31] M. Born and E. Wolf. Principles of Optics. New York: Pergamon, 1980.

[32] P. Ghosh, C. shin Kang, M. Sanie, and D. Pinto. New DFM approach abstracts AltPSM lithography requirements for sub-100nm IC design domains. Proceedings. Fourth International Symposium on Quality Electronic Design, pages 131–137, March 2003.

[33] J. Gomes and L. Velho. Image Processing for Computer Graphics. Springer, 1996.

[34] E. Hecht. Optics. Addison-Wesley, 1998.

[35] A. B. Kahng and Y. C. Pati. Subwavelength lithography and its potential impact on design and EDA. Design Automation Conference, pages 799–804, June 1999.

[36] L. R. Harriott, “Limits of lithography,” Proceedings of the IEEE, vol. 89, pp. 366–374, March 2001.

[37] G. Pugh, J. Canning, and B. Roman, “Impact of high resolution lithography on IC mask design,” Custom Integrated Circuits Conference, pp. 149–153, May 1998.

[38] T. Brunner, “Pushing the limits of lithography for IC production,” Electron Devices Meeting, pp. 9–13, Dec. 1997.

[39] M. Sasago, “Lithography solutions for sub-0.1 µm generations,” VLSI Technology, pp. 6–9, June 1998.

[40] L. V. den Hove, A. M. Goethals, K. Ronse, M. V. Bavel, and G. Vandenberghe, “Lithography for sub-90nm applications,” Electron Devices Meeting, pp. 3–8, Dec. 2002.

[41] W. W. Flack and G. E. Flores, “Lithographic manufacturing techniques for wafer scale integration,” Wafer Scale Integration, pp. 4–13, Jan. 1992.

[42] D. R. Huston and W. Sauter, “Mask stretching for next generation lithography masks,” IEEE Transactions on Semiconductor Manufacturing, vol. 14, pp. 214–217, Aug. 2001.

[43] M. D. Levenson, N. S. Viswanathan, and R. A. Simpson, “Improving resolution in photolithography with a phase-shifting mask,” IEEE Transactions on Electron Devices, pp. 1828–1836, Dec. 1982.

[44] Y. Liu and A. Zakhor, “Binary and phase shifting mask design for optical lithography,” IEEE Transactions on Semiconductor Manufacturing, vol. 5, pp. 138–152, May 1992.

[45] B. J. Lin, “Phase-shifting masks gain an edge,” Circuits and Devices Magazine, vol. 9, pp. 28–35, March 1993.

[46] Y. Liu, A. Zakhor, and M. A. Zuniga, “Computer-aided phase shift mask design with reduced complexity,” IEEE Transactions on Semiconductor Manufacturing, vol. 9, pp. 170–181, May 1996.

[47] Z. Li and H. Nakagawa, “Performance-driven OPC for mask cost reduction,” Proceedings of the 41st SICE Annual Conference, vol. 2, pp. 917–920, 2002.

[48] D. Lee and A. R. Neureuther. SPLAT v5.0 User’s Guide. University of California Press, Mar 1995.

[49] Y. C. Pati, A. A. Ghazanfarian, and R. F. Pease. Exploiting structure in fast aerial image computation for integrated circuit patterns. IEEE Transactions on Semiconductor Manufacturing, 10:62–74, Feb 1997.

[50] F. Schellenberg. A little light magic. IEEE Spectrum, 40:34–39, Sep 2003.

[51] A. K.-K. Wong. Resolution Enhancement Techniques in Optical Lithography. SPIE Press, 2001.

[52] R. Panda, S. Sundareswaran, and D. Blaauw, “On the interaction of power distribution network with substrate,” International Symposium on Low Power Electronics and Design, pp. 388–393, August 2001.

[53] Z. He, M. Celik, and L. T. Pileggi, “SPIE: Sparse partial inductance extraction,” Proceedings of Design Automation Conference, pp. 137–140, June 1997.

[54] M. W. Beattie and L. T. Pileggi, “Inductance 101: Modeling and extraction,” Proceedings of Design Automation Conference, pp. 323–328, June 2001.

[55] K. Gala, D. Blaauw, J. Wang, V. Zolotov, and M. Zhao, “Inductance 101: Analysis and design issues,” Proceedings of Design Automation Conference, pp. 329–334, June 2001.

[56] A. E. Ruehli, “Inductance calculations in a complex integrated circuit environment,” IBM Journal of Research and Development, September 1972.

[57] A. Devgan, H. Ji, and W. Dai, “How to efficiently capture on-chip inductance effects: Introducing a new circuit element K,” IEEE/ACM International Conference on Computer Aided Design, pp. 150–155, November 2000.

[58] H. Ji, A. Devgan, and W. Dai, “KSIM: A stable and efficient RKC simulator for capturing on-chip inductance effect,” Proceedings of Asia and South Pacific Design Automation Conference, pp. 379–384, January 2001.

[59] L. M. Silveira and N. Vargas, “Characterizing substrate coupling in deep-submicron designs,” IEEE Design and Test of Computers, vol. 19, pp. 4–15, March 2002.

[60] R. Gharpurey and R. G. Meyer, “Modeling and analysis of substrate coupling in integrated circuits,” IEEE Journal of Solid-State Circuits, vol. 31, pp. 344–353, March 1996.

[61] B. R. Stanisic, N. K. Verghese, R. A. Rutenbar, L. R. Carley, and D. J. Allstot, “Addressing substrate coupling in mixed-mode IC’s: Simulation and power distribution synthesis,” IEEE Journal of Solid-State Circuits, vol. 29, pp. 226–238, March 1994.

[62] Y. Massoud and J. White, “Simulation and modeling of the effect of substrate conductivity on coupling inductance,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 286–291, June 2002.

[63] M. Liu, T. Yu, and W.-M. Dai, “Fast 3-D inductance extraction in lossy multi-layer substrate,” IEEE/ACM International Conference on Computer Aided Design, pp. 424–429, November 2001.

[64] H. Ymeri, B. Nauwelaers, K. Maex, S. Vandenberghe, and D. D. Roest, “New analytic expressions for mutual inductance and resistance of coupled interconnects on lossy silicon substrate,” Digest of Silicon Monolithic Integrated Circuits in RF Systems, pp. 192–200, September 2001.

[65] T.-H. Chen, C. Luk, H. Kim, and C. C.-P. Chen, “SuPREME: Substrate and power-delivery reluctance-enhanced macromodel evaluation,” International Conference on Computer Aided Design, pp. 786–792, November 2003.

[66] P. R. Bannister, “Applications of complex image theory,” Radio Science, vol. 21, no. 4, pp. 605–616, August 1986.

[67] R. Jiang and C. C.-P. Chen, “ESPRIT: A compact reluctance based interconnect model considering lossy substrate eddy current,” IEEE MTT-S International Microwave Symposium Digest, vol. 3, pp. 1385–1388, June 2004.

[68] A. Weisshaar, H. Lan, and A. Luoh, “Accurate closed-form expressions for the frequency-dependent line parameters of coupled on-chip interconnects on lossy silicon substrate,” IEEE Transactions on Advanced Packaging, vol. 25, pp. 288–296, May 2002.

[69] D. Melendy and A. Weisshaar, “A new scalable model for spiral inductors on lossy silicon substrate,” IEEE MTT-S International Microwave Symposium Digest, vol. 2, pp. 1007–1010, June 2003.

[70] J. A. Tegopoulos and E. E. Kriezis, Eddy Currents in Linear Conducting Media. Elsevier Publications, 1985.

[71] R. L. Stoll, The Analysis of Eddy Currents. Oxford, U.K.: Clarendon.

[72] M. N. O. Sadiku, Numerical Techniques in Electromagnetics. CRC Publications, 2001.

[73] A. M. Niknejad and R. G. Meyer, “Analysis of eddy-current losses over conductive substrates with applications to monolithic inductors and transformers,” IEEE Transactions on Microwave Theory and Techniques, vol. 49, pp. 166–176, January 2001.

[74] M. Kamon, M. J. Tsuk, and J. K. White, “FastHenry: A multipole-accelerated 3-D inductance extraction program,” IEEE Transactions on Microwave Theory and Techniques, vol. 42, pp. 1750–1758, September 1994.

[75] A. Weisshaar and H. Lan, “Accurate closed-form expressions for the frequency-dependent line parameters of on-chip interconnects on lossy silicon substrate,” IEEE MTT-S International Microwave Symposium Digest, vol. 3, pp. 1753–1756, May 2001.

[76] C. Hoer and C. Love, “Exact inductance equations for rectangular conductors with applications to more complicated geometries,” J. Res. Nat. Bureau of Standards, April 1965.

[77] F. W. Grover, Inductance Calculations: Working Formulas and Tables. Dover Publications, 1946.

[78] G. Zhong, C.-K. Koh, V. Balakrishnan, and K. Roy, “An adaptive window-based susceptance extraction and its efficient implementation,” Proceedings of Design Automation Conference, pp. 728–731, June 2003.

[79] T.-H. Chen, C. Luk, H. Kim, and C. C.-P. Chen, “INDUCTWISE: Inductance-wise interconnect simulator and extractor,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, pp. 884–894, July 2003.

[80] R. Jiang and C. C.-P. Chen, “SCORE: SPICE compatible reluctance extraction,” Proceedings of Design Automation and Test in Europe Conference and Exhibition, vol. 2, pp. 948–953, February 2004.