Three-Dimensional Interconnect Modeling for Nano-Scale
VLSI Technologies
by
Rong Jiang
Dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
(Electrical Engineering)
at the
UNIVERSITY OF WISCONSIN-MADISON
2006
Abstract
Designing high-performance very large scale integration (VLSI) circuits has become more challenging
than ever due to deep sub-micron effects and accelerating time-to-market cycles. With the increasing
dominance of interconnect delay and strong coupling effects, a small change in the design can cause
new timing violations and result in design iterations. At the same time, the industry trend of
integrating higher levels of circuit functionality on one chip and the widespread growth of wireless
communication have triggered the proliferation of mixed analog-digital systems. The digital and
analog components share a common lossy substrate, which provides an alternative path for the
current to reach different devices and leads to more significant electromagnetic coupling.
Furthermore, the ever-increasing complexity of VLSI designs and integrated circuit (IC) process
technologies increases the mismatch between a circuit fabricated on the wafer and the one designed
in the layout tool. Process-induced variations can make the circuit performance deviate from the
design specification, and timing convergence is becoming harder to achieve.
Therefore, to perform fast circuit analysis and optimization, efficient extraction of compact
yet accurate lumped circuit models of on-chip interconnect has become an extremely crucial
part in modern VLSI design. The central theme of this thesis is the fast capacitance extrac-
tion with the consideration of process variations and the interconnect inductive effect modeling
subject to multi-layer lossy substrate effects. The main contributions of the work are as follows:
• ICCAP, a linear-time hierarchical three-dimensional (3D) capacitance extraction algorithm.
ICCAP proposes a novel method to sparsify and reorder the dense linear system associated with
boundary element method (BEM) capacitance extraction, so that the new sparse system can be
solved efficiently by preconditioned Krylov subspace iterative methods, such as the generalized
minimal residual method (GMRES) or the preconditioned conjugate gradient method (PCG).
• STATCAP, a statistical capacitance extraction algorithm considering process induced
variations. By utilizing the efficiency of ICCAP, STATCAP develops a systematic way to
model the surface fluctuation of interconnect geometries and generate explicit quadratic
form representation for parasitic capacitances. The quadratic expression can be directly
integrated into statistical timing analysis and statistical model order reduction to perform
further analysis.
• LITHSIM, a fast aerial image simulation for modeling process variations introduced in the
lithography process. By exploiting the regular structures inherent in IC mask patterns,
LITHSIM avoids the sampling process in the two-dimensional (2D) discrete Fourier transformation
or discrete convolution and eliminates aliasing error by providing a closed-form analytic
formula to directly generate the two-dimensional mask image.
• EPEEC, a compact interconnect inductive effect modeling algorithm considering lossy
substrate. Based on complex image theory, EPEEC extends the traditional partial element
equivalent circuit (PEEC) model to simultaneously take multi-layer substrate eddy current
losses and frequency-dependent effects into consideration. To accommodate even larger scale
on-chip interconnect networks, EPEEC develops a new SPICE-compatible reluctance extraction
algorithm by applying sparsification in the inverse inductance domain with an extended
window algorithm.
These validated interconnect parasitic extraction and modeling algorithms can be easily
integrated into general design tools. We hope that by transferring the proposed algorithms into
production, these building blocks will serve the goal of design for manufacturability in
state-of-the-art VLSI circuits and improve fabrication yield and circuit efficiency in
the long term.
Acknowledgements
First and foremost, I would like to express my deepest gratitude and appreciation to my
research advisor, Professor Charlie Chung-Ping Chen, the real professor in my life, for his
super-excellent guidance and tremendous support, and for the opportunities he has created for me
during my graduate study and research at the University of Wisconsin, Madison. His vision and
leadership in the semiconductor computer-aided design industry have been inspiring to both my
research work and career development. I sincerely thank him for his consistent supervision
and enlightenment in every detail of my research and education at the University of Wisconsin,
Madison.
I am thankful to Professor Franco Cerrina, Professor Michael J. Schulte, Professor Kewal
K. Saluja, Professor Parameswaran Ramanathan, Professor Yu Hen Hu, and Professor Shi Jin
for reviewing my dissertation and serving as committee members in my preliminary exams and
defense. Their insightful inputs to this work and expertise in the field of semiconductor and
mathematics have provided me strong support throughout this process.
I would like to sincerely thank my former advisor, Professor Zhiquan Wang at Nanjing
University of Science & Technology, for his mentoring, encouragement, support, and consistent
help. His guidance will be an indelible part of my life.
I would like to thank Chin-Chi Teng, Pinhong Chen, Eddy Pramono, Yu Zheng and Jin
Zhang for their help and support and sharing their knowledge and expertise during my work at
Cadence Design Systems.
Special thanks to Professor Janet Meiling Wang at the University of Arizona, Tucson, my
colleagues Wenyin Fu and Yi-Hao Chang at National Taiwan University, and Mr. Vince Lin from
SpringSoft for their tremendous help during my research at the University of Wisconsin,
Madison. I also deeply thank all the past and present members of the University of Wisconsin,
Madison VLSI-EDA group, Yu-Min Lee, Tsung-Hao Chen, Ting-Yuan Wang, Jeng-Liang Tsai,
and Sanghamitra Roy, for their best friendship, help, and support. I would like to thank all my
Chinese and international friends from different parts of the small world. They made my life in
the United States colorful and enjoyable.
I would like to thank my mom and dad and other members in my big family for their love
during my years in graduate school. Their care always provides the warmest support in my life
and work, wherever I am.
Most importantly, I would like to thank my dear wife, Yi Zhou, for her companionship and
love during the last few years. Together we have managed to get lots of meaningful things done
and overcome many difficulties. I deeply thank her for her love, understanding, and consistent
support. Without her love and encouragement, this thesis wouldn’t be possible. I look forward
to enjoying a better and better life with her for the rest of my life.
This work was partially funded by National Science Foundation under grants CCR-0093309
& CCR-0204468 and National Science Council of Taiwan, R.O.C. under grant NSC 92-2218-
E-002-030 and by the following participating companies: Intel, TSMC, UMC, Faraday, and
SpringSoft.
Contents
Abstract
Acknowledgements
1 Introduction
1.1 Motivation
1.2 Capacitance Extraction
1.3 Inductance Extraction
1.4 Thesis Organization
2 Linear Time Hierarchical Capacitance Extraction – ICCAP
2.1 Preliminaries
2.1.1 Capacitance Extraction
2.1.2 Boundary Element Method
2.1.3 Hierarchical Capacitance Algorithms
2.2 ICCAP Algorithm
2.2.1 New Basis Panels Generation
2.2.2 Direct Formulation of J′ in Linear Time
2.2.3 Extracting E from J′
2.2.4 Solving P′q′ = v′ for Uniform- and Multiple-dielectric Media
2.2.5 Potential Coefficient Matrix Reordering
2.2.6 Complexity Analysis
2.3 Practical Implementation
2.3.1 Potential Estimation
2.3.2 Panel Refinement Scheme
2.3.3 Direct Construction of P′
2.3.4 Direct Construction of V′
2.4 Experimental Results
3 Statistical Capacitance Extraction – STATCAP
3.1 Preliminaries
3.1.1 Process Variations
3.2 Statistic Capacitance Extraction
3.2.1 Variational Capacitance Approximation
3.2.2 Process Variation Modeling
3.2.3 Random Variable Reduction
3.2.4 Potential Coefficient Approximation
3.2.5 Distribution of Parasitic Capacitance
3.3 Experimental Results
4 Fast Analytic Lithography Simulation – LITHSIM
4.1 Preliminaries
4.1.1 Simplified Projection System Model
4.1.2 General Lithography System Model
4.2 LithSim Algorithm
4.2.1 Rectangular Pupil
4.2.2 Circular Pupil
4.2.3 LithSim Simulation Flow
4.3 Experimental Results
5 Efficient Inductive Effect Extraction with Lossy Substrate – EPEEC
5.1 Introduction
5.2 Electro-magnetic Formulation of Substrate Eddy Current and Complex Image Theory
5.2.1 Generation of Substrate Eddy Currents
5.2.2 Analytic Vector Potential within A Multilayer Substrate
5.2.3 Complex Image Theory and Its Application
5.3 Eddy-Current-Aware PEEC model: EPEEC
5.3.1 EPEEC Interconnect Modeling Algorithm
5.3.2 SPICE Compatible Reluctance Realization
5.4 Experimental Results
5.4.1 EPEEC Model Validation
5.4.2 Substrate Effects
5.4.3 Inductance vs. Reluctance
6 Conclusion
List of Figures
1.1 Wire dimension trends in advanced VLSI technologies.
2.1 Capacitance extraction.
2.2 BEM capacitance algorithms: FastCap.
2.3 BEM capacitance algorithms: HiCap and PHiCap.
2.4 Different bases have different structure matrices and potential coefficient matrices with different densities.
2.5 Fill-ins introduced by a link between non-leaf panels.
2.6 The elementary operation of switching basis panels is equivalent to performing a congruence transformation on P.
2.7 Repeatedly moving basis panels upward is equivalent to applying consecutive congruence transformations on the potential coefficient matrix without explicit matrix manipulations.
2.8 Comparison of non-zero entries in H and P′.
2.9 Efficient construction of the new structure matrix J′.
2.10 Comparison of non-zero entries in J and J′.
2.11 Extraction flowchart of ICCAP and PHiCap.
2.12 ICCAP capacitance extraction flow.
2.13 Centroid of triangular panel.
2.14 Centroid of quadrilateral panel.
2.15 High level link and basic link.
2.16 Direct construction of the sparse potential coefficient matrix P′.
2.17 Direct construction of the new right hand side v′.
2.18 Density of the new potential coefficient matrix P′.
2.19 Preconditioners from incomplete LU factorization with different reordering schemes (RelativeResidue = 0.01).
3.1 Process variations due to (a) chemical-mechanical planarization, (b) optical diffraction, and (c) chemical etching. (Picture courtesy of TSMC, Hsin-Chu, Taiwan.)
3.2 Process variation modeling with correlated statistical position perturbations on leaf panels.
3.3 Random variable transformation.
3.4 An efficient algorithm for constructing the random variable transformation matrix R. The function InsertEntry(R, i, j, value) fills value into the entry (i, j) of R.
3.5 First and second order capacitance models and their comparisons with Monte Carlo method for the bus 2 × 2 benchmark (σ = 20%).
3.6 Second order parasitic capacitance modeling with different number of factors and the comparison with Monte Carlo method for bus 2 × 2 benchmark.
4.1 General optical lithography process: (1) Photoresist coating (2) Exposure (3) Development.
4.2 Subwavelength gap between IC feature size and light wavelength. (Picture courtesy of Numerical Technologies, Inc. and Synopsys, Inc.)
4.3 Generic exposure system in optical projection lithography.
4.4 Shifting the photomask spectrum is equivalent to shifting the pupil function.
4.5 Transmission cross-coefficient (TCC).
4.6 Mask decomposition.
4.7 Inverse Fourier transformation of a rectangular pupil.
4.8 εi(x, y) of a 1 µm × 1 µm slit.
4.9 Waveform of sine integral function.
4.10 Windowing method to reduce computational cost.
4.11 LithSim Optical Lithography Simulation Flow.
4.12 Irradiance calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.
4.13 Errors in irradiance matrices calculated by using discrete Fourier transformation, discrete convolution and LithSim compared to continuous convolution.
4.14 Images (contours) calculated by using discrete Fourier transformation, discrete convolution, LithSim, and continuous convolution.
5.1 A current filament parallel to a multilayer substrate which contains different layers of different thickness, conductivity, and permeability.
5.2 Eddy-current-aware PEEC model. Each conductor is further discretized to consider the uneven distribution of currents.
5.3 Extended window selection algorithm to simultaneously consider physical and image conductors.
5.4 SPICE compatible model for reluctance. The original reluctance element is substituted by serial self inductance and VCVSs.
5.5 Test configuration: two parallel copper interconnects above a two-layer substrate (Length unit: µm).
5.6 Self inductance comparison by using three different extraction tools: FastHenry, Sonnet®, and EPEEC.
5.7 Self inductance decreases as frequency increases and conductor-substrate distance decreases.
5.8 Resistance increases as frequency increases and conductor-substrate distance decreases.
5.9 With the same conductivity, the upper layer substrate will have larger effect than the lower layer. However, the lower layer cannot be ignored when the thickness of the upper layer is less than its skin depth.
5.10 Self inductance saturates when the thickness of the upper layer grows over its skin depth.
5.11 Waveforms of transient responses by using different interconnect models: PEEC, EPEEC-L, and EPEEC-R.
List of Tables
2.1 Algorithm of directly constructing J′.
2.2 Hierarchical panel refinement scheme.
2.3 Refinement of two panels.
2.4 Self coupling insertion.
2.5 Simulation results comparison.
2.6 Comparison with HiCap for some large benchmarks.
3.1 Simulation runtime comparison for bus crossing benchmark. (1) Monte Carlo (M.C.); (2) Quadratic Model (QuadMod).
4.1 Extraction time and error comparison.
5.1 Extended Window Selection Algorithm.
5.2 EPEEC Interconnect Modeling Algorithm.
5.3 Reluctance Realization Algorithm.
5.4 Extraction Time and Model Size Comparison.
Chapter 1
Introduction
1.1 Motivation
The semiconductor industry has experienced unprecedented growth over the last forty
years. As integrated circuit processing technology marches relentlessly down through deep
sub-micron feature sizes, interconnect effects, rather than active device characteristics, have
moved to the forefront as the limits on chip performance, such as system delay and signal
integrity. Although device and wire dimensions are decreasing, the size of the chip is increasing.
This implies that the number of interconnects as well as their lengths are increasing with each new
generation of advanced logic and memory chips. Nowadays, interconnect delay can account for
more than 50% of the total path delay. Based on these observations, analysis and optimization
of the interconnect performance of very large scale integration (VLSI) or ultra large
scale integration (ULSI) designs becomes an indispensable component of the global effort to
advance Moore’s Law even further.
Figure 1.1: Wire dimension trends in advanced VLSI technologies.
To characterize the interconnect effects in timing analysis, efficient extraction of on-chip
parasitics, such as the resistance, capacitance, and inductance associated with complex interconnect
structures, has become a crucial issue for establishing compact yet accurate interconnect circuit
models. Resistance is determined mainly by the geometry of the line itself and does not depend
on the distribution of the wires in its surroundings. In contrast, capacitance and
inductance are strongly affected by the geometry and distribution of nearby conductors. For
example, increasing the number of metal layers and changing the aspect ratio of metal lines
reduce the effect of interconnect capacitance to a certain extent. The upper metal layers have
lower capacitance to the ground because of the shielding effect of lower metal layers. Also,
the lines in lower metal layers are narrower and taller, i.e., their vertical height is greater than
their horizontal width. With less width, the capacitance to the ground is decreased. Due to this
dependence on the interconnect geometry, the extraction of capacitance and inductance is much
harder than the extraction of resistance.
Capacitance extraction and inductance extraction are crucial not only for timing analysis
but also for signal integrity analysis. For large, high-performance circuits, functional noise
failures have become a significant design and verification issue. Due to the non-uniform scaling of
interconnects, the width and spacing of wires decrease more rapidly than the thickness of wires
with each process shrink. Cross-coupling capacitance between wires is therefore becoming an
increasingly dominant fraction of total wire capacitance, causing an increase in cross-coupled
noise effects. Furthermore, with the employment of hierarchical metal wiring levels and the
recent introduction of copper wiring (because its resistivity is approximately half that of
aluminum wiring), on-chip inductance modeling has also become indispensable for clocks
and the fastest signal interconnects.
Consequently, accurate and efficient estimation of on-chip capacitances and inductances in
complicated three dimensional interconnects is becoming increasingly important for determin-
ing the final circuit speeds or functionality in the ultra deep sub-micron design (UDSM) of
integrated circuits.
1.2 Capacitance Extraction
At different stages of the VLSI design cycle, capacitance extraction needs to be performed
pre- and post-routing and pre- and post-layout, with different accuracy requirements.
To extract parasitic capacitances from a given design, the following steps need to be performed:
1. Define the technology process and material data. This includes mask layers and di-
electrics, their thickness, conductivity, and permittivity constants.
2. Use the technology data as input to a 2D or 3D field solver to obtain capacitance coeffi-
cients.
3. Generate a rule file for a full-chip capacitance extractor using the obtained capacitance
coefficients.
4. Run the capacitance extractor using the generated rule file.
The above extraction flow is widely adopted by most industrial extraction tools, such as Cadence
Encounter Fire & Ice, Magma QuickCap, Mentor Graphics Calibre xRC, and Synopsys Star-RC.
The most expensive step in the above extraction flow is establishing the rule file, which is
also called the capacitance look-up table (LUT), by using 2D or 3D field solvers.
Although many numerical methods can be used to solve the capacitance extraction problem
[1–8], the boundary element method (BEM) has been adopted as the main approach for 3D
capacitance calculation due to its capability to handle complex interconnect structures. However,
BEM yields an extremely dense linear system, and hence direct matrix solving methods,
such as Gaussian elimination, require $O(n^3)$ operations and greatly limit the tractable problem
size.
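The dense-system cost described above can be made concrete with a minimal sketch. It is not the thesis's solver: it assumes a single square plate in free space, collocation at panel centers, a point-charge approximation for panel-to-panel terms, and the known closed-form center potential of a uniformly charged square for the self term. The function names (`build_potential_matrix`, `gaussian_solve`, `plate_capacitance`) are hypothetical.

```python
import math

EPS0 = 8.854187817e-12  # vacuum permittivity in F/m


def build_potential_matrix(side, n):
    """Dense potential coefficient matrix P for a square plate cut into n*n panels.

    Assumptions (illustrative only): off-diagonal terms treat each panel as a
    point charge at its center; the diagonal uses the center potential of a
    uniformly charged square panel of side a, ln(1 + sqrt(2)) / (pi * eps0 * a).
    """
    a = side / n
    centers = [((i + 0.5) * a, (j + 0.5) * a) for i in range(n) for j in range(n)]
    m = len(centers)
    self_term = math.log(1.0 + math.sqrt(2.0)) / (math.pi * EPS0 * a)
    P = [[0.0] * m for _ in range(m)]
    for r in range(m):
        for c in range(m):
            if r == c:
                P[r][c] = self_term
            else:
                dx = centers[r][0] - centers[c][0]
                dy = centers[r][1] - centers[c][1]
                P[r][c] = 1.0 / (4.0 * math.pi * EPS0 * math.hypot(dx, dy))
    return P


def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (O(n^3))."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def plate_capacitance(side=1.0, n=4):
    """Capacitance of the plate: total charge when every panel is held at 1 V."""
    P = build_potential_matrix(side, n)
    q = gaussian_solve(P, [1.0] * (n * n))
    return sum(q)


if __name__ == "__main__":
    print(plate_capacitance(1.0, 4))
```

Every entry of `P` is non-zero because every panel influences every other panel, which is why the storage grows as the square and the direct solve as the cube of the panel count.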
Many fast capacitance extraction algorithms have been proposed in the literature to solve
the dense linear system, such as [9–19]. FastCap [14] is based on the fast multipole method (FMM)
for accelerating the dense matrix-vector multiplications required by iterative matrix solvers.
Other multipole-accelerated BEM algorithms include Multi-scale [15] and [13]. HiCap [16] is
also an FMM algorithm with kernel-independent hierarchical panel refinement. Normally,
iterative solvers require $O(n^2)$ operations per iteration since the potential coefficient matrix
has $n^2$ entries. Other well-known algorithms include the precorrected fast Fourier
transformation (FFT) method [18] and the singular value decomposition (SVD) method [19], which
have $O(n \log n)$ complexity and $O(n)$ memory requirements.
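To illustrate where the per-iteration cost comes from, the following sketch runs an unpreconditioned conjugate gradient iteration on a dense symmetric positive definite system and counts the dense matrix-vector products. It is not FastCap, HiCap, or any of the cited algorithms; the matrix from `toy_kernel_matrix` is a hypothetical stand-in with entries 1/(1 + |i - j|), chosen only because it is dense and positive definite.

```python
def toy_kernel_matrix(n):
    """Hypothetical dense SPD stand-in for a potential coefficient matrix."""
    return [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def matvec(P, x):
    """Dense matrix-vector product: n^2 multiply-adds, the dominant cost."""
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in P]


def conjugate_gradient(P, v, tol=1e-10, max_iter=1000):
    """Solve P q = v for symmetric positive definite P.

    Returns the solution and the number of dense matrix-vector products used.
    """
    q = [0.0] * len(v)
    r = v[:]            # residual v - P q for the initial guess q = 0
    d = r[:]            # search direction
    rs = sum(ri * ri for ri in r)
    products = 0
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Pd = matvec(P, d)
        products += 1
        alpha = rs / sum(di * pdi for di, pdi in zip(d, Pd))
        q = [qi + alpha * di for qi, di in zip(q, d)]
        r = [ri - alpha * pdi for ri, pdi in zip(r, Pd)]
        rs_new = sum(ri * ri for ri in r)
        d = [ri + (rs_new / rs) * di for ri, di in zip(r, d)]
        rs = rs_new
    return q, products


if __name__ == "__main__":
    P = toy_kernel_matrix(20)
    q, products = conjugate_gradient(P, [1.0] * 20)
    print(products)
```

Each iteration performs exactly one call to `matvec`, so accelerating or sparsifying that product is what lowers the overall cost of an iterative solve.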
Recently, PHiCap [17] proposed constructing cost-efficient preconditioners by applying an
orthogonal sparsification transformation. Although the iteration number is greatly reduced, the
orthogonal matrix generation still requires $O(n \log n)$ operations and hence becomes the bottleneck
of the entire algorithm. Furthermore, the transformation matrix needs extra storage space and
makes the memory budget even tighter for large-scale design applications.
The capacitance extraction problem becomes even more complicated as the semiconductor
industry advances to the 65 nm technology node. Due to the ever-increasing complexity of VLSI
designs and IC process technologies, the mismatch between a circuit fabricated on the wafer and
the one designed in the layout tool grows ever larger. Therefore, characterizing and modeling
process variations of interconnect geometry has become an integral part of the analysis and
optimization of modern VLSI designs. Process-induced variations in the device and interconnect
structures pose a significant challenge to parasitic modeling and signal integrity analysis.
To determine the extent of such effects, the distribution of various electrical parameters, such as
interconnect resistances and capacitances, due to variations in the manufacturing process must
be determined.
Our work on capacitance extraction focuses on the development of an efficient BEM
capacitance algorithm to solve the 3D capacitance extraction problem and then extends it to consider
process variations. The main contributions in this area are as follows:
1. A novel algorithm, ICCAP, which provides a completely different perspective for generating
sparsified and reordered potential coefficient matrices, is presented. ICCAP reveals
that the intrinsic reason the linear system arising from BEM is dense is the
selection of leaf panel charges as the basis. Therefore, ICCAP presents a linear-time
basis panel selection algorithm (BPSA) to choose a new basis. Mathematically, selecting a
different basis is equivalent to performing consecutive congruence transformations to
sparsify the original dense system, although no explicit matrix computations are required.
Furthermore, ICCAP proposes a cost-free Level-Oriented Reordering (LOR) method to
generate reordered potential matrices, so that preconditioners contain even fewer fill-ins
than those obtained by explicitly applying minimum degree reordering (MMD). Experimental
results show that ICCAP is faster and consumes less memory than all previous algorithms,
including FastCap [14], HiCap [16], and PHiCap [17].
2. To efficiently evaluate process variation effects on parasitic capacitance, this work
proposes a comprehensive statistical capacitance extraction algorithm, STATCAP, to develop
an explicit quadratic form representation for parasitic capacitance in terms of dominant
process variation sources. The quadratic model can be easily extended to even
higher order to achieve higher accuracy. Also, STATCAP proposes a systematic way to
model interconnect surface fluctuation due to process variations and introduces principal
factor analysis to reduce the large number of random variables used to model the surface
fluctuation. STATCAP then solves for the capacitance quadratic representation by applying
random variable matching and taking advantage of the efficiency of ICCAP.
3. To study the effects of process variations during the lithography process on the geometry
of fabricated interconnects, we propose an analytic closed-form formula, LITHSIM, to
directly generate mask images by exploiting the regular structure in VLSI designs. LITHSIM
avoids the Fourier and inverse Fourier transformations adopted by general aerial image
simulators to achieve significant speedup. Also, due to its analytic formulation, LITHSIM
eliminates the aliasing error introduced in the sampling process.
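The change-of-basis idea in the first contribution can be sketched with explicit matrices. This is only an illustration of the algebra, not the basis panel selection algorithm itself, which the text says avoids explicit matrix computations. If the leaf charges are written as q = J q′, the system P q = v becomes P′ q′ = v′ with P′ = JᵀPJ and v′ = Jᵀv. The 4-panel matrix with entries 1/(1 + |i - j|) and the Haar-like basis below are hypothetical choices made for the example.

```python
def toy_potential_matrix():
    """Hypothetical dense 4x4 potential matrix for four panels in a row."""
    return [[1.0 / (1 + abs(i - j)) for j in range(4)] for i in range(4)]


def haar_like_basis():
    """Columns: all panels, left pair minus right pair, and two local differences."""
    return [
        [1.0, 1.0, 1.0, 0.0],
        [1.0, 1.0, -1.0, 0.0],
        [1.0, -1.0, 0.0, 1.0],
        [1.0, -1.0, 0.0, -1.0],
    ]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def congruence(P, J):
    """Congruence transformation P' = J^T P J (keeps symmetry)."""
    return matmul(matmul(transpose(J), P), J)


if __name__ == "__main__":
    for row in congruence(toy_potential_matrix(), haar_like_basis()):
        print(["%.4f" % entry for entry in row])
```

In this toy case the coupling between the two local difference functions is P′[2][3] = 1/3 - 1/4 - 1/2 + 1/3 = -1/12, already smaller in magnitude than any entry of the original matrix. Dropping or never forming such weak couplings is the intuition behind sparsifying by basis selection.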
1.3 Inductance Extraction
Parasitic on-chip inductance is growing as another design concern as VLSI technology
marches toward ultra-deep sub-micron and the operating frequency
approaches the gigahertz range. The inductive coupling effect becomes more important because of
higher frequency signal content, denser geometries, and reductions of both resistance and
capacitance by copper wiring and low-k dielectrics. The inductance effect is present not only in
IC packages but also in on-chip interconnects such as power grids, clock nets, and bus structures.
It causes signal overshoot, undershoot, and oscillations, and aggravates crosstalk and power-grid
noise. All of these seriously impact on-chip signal integrity. The importance and difficulty of
on-chip inductance extraction and analysis have been addressed in [54] and [55].
One major problem of inductance modeling is the long-range coupling effect and the
uncertainty of return paths. Inductance is defined for a closed current loop, but the return path is
difficult to predict before simulation. Fortunately, the PEEC method has been widely adopted
to deal with this issue [56]. However, since PEEC assumes that each conductor segment has
a current return path at infinity, inductive couplings exist among all conductor segments,
so extremely dense partial inductance matrices are usually generated. For this reason, the
reluctance-based method [57, 58] was proposed by Hao Ji et al. to alleviate this problem.
Since reluctance has a higher degree of locality, similar to capacitance, only a small number of
neighbors need to be considered. Consequently, the reluctance matrix for circuit simulation is
very sparse compared to the partial inductance matrix.
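The locality argument can be checked numerically on a small example. The sketch below builds a partial inductance matrix for a few identical parallel wires and inverts it to get the reluctance matrix K. It is not the reluctance method of [57, 58] and applies no windowing. The self and mutual terms use widely quoted closed-form approximations for a rectangular bar and for two equal parallel filaments, and the wire dimensions are made-up illustrative values.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in H/m


def self_inductance(length, width, thickness):
    """Approximate partial self inductance of a rectangular bar (assumed formula)."""
    wt = width + thickness
    return MU0 * length / (2 * math.pi) * (
        math.log(2 * length / wt) + 0.5 + 0.2235 * wt / length)


def mutual_inductance(length, distance):
    """Partial mutual inductance of two equal parallel filaments (assumed formula)."""
    u = length / distance
    return MU0 * length / (2 * math.pi) * (
        math.asinh(u) - math.sqrt(1 + 1 / (u * u)) + 1 / u)


def partial_inductance_matrix(n, length, width, thickness, pitch):
    """Dense matrix: every wire couples to every other wire."""
    return [[self_inductance(length, width, thickness) if i == j
             else mutual_inductance(length, abs(i - j) * pitch)
             for j in range(n)] for i in range(n)]


def invert(A):
    """Gauss-Jordan inverse with partial pivoting; K = L^{-1} is the reluctance matrix."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        pk = M[k][k]
        M[k] = [x / pk for x in M[k]]
        for r in range(n):
            if r != k:
                f = M[r][k]
                M[r] = [x - f * y for x, y in zip(M[r], M[k])]
    return [row[n:] for row in M]


if __name__ == "__main__":
    # Hypothetical geometry: 1000 um long, 1 um x 1 um wires on a 2 um pitch.
    L = partial_inductance_matrix(5, 1000e-6, 1e-6, 1e-6, 2e-6)
    K = invert(L)
    print([abs(x / L[0][0]) for x in L[0]])  # slow decay with distance
    print([abs(x / K[0][0]) for x in K[0]])  # relative coupling in the inverse
```

Comparing the two printed rows shows how coupling to distant wires, relative to the diagonal, changes between the inductance and reluctance representations for this toy geometry.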
Moreover, the traditional PEEC approach does not take substrate effects into consideration.
With continuous advances in radio frequency (RF) mixed-signal VLSI technology, the creation
of eddy currents in lossy multi-layer substrates has made the
already complicated interconnect analysis and modeling issue more challenging. Although
several previous works have been proposed to account for substrate losses, such as [59–65], most
of these approaches are based on traditional electromagnetic methods and use the numerical
finite difference method to discretize the entire substrate, and hence are often computationally
prohibitive for today’s VLSI geometries. With rising clock frequencies and reduced
substrate resistivity, a large volume of silicon bulk needs to be spatially discretized into very
small cells to capture the substrate effects accurately. Therefore, the resulting equivalent circuit
models are prohibitively large, since inductive couplings now exist among all conductor
segments and substrate cells.
Motivated by these limitations, we propose an accurate and efficient interconnect modeling
approach, EPEEC (Eddy-current-aware Partial Element Equivalent Circuit). Based on
complex image theory, EPEEC extends the traditional PEEC model to simultaneously take
multi-layer substrate eddy current losses and frequency-dependent effects into consideration. To
accommodate even larger on-chip interconnect networks, EPEEC develops a new SPICE-compatible
reluctance extraction algorithm by applying sparsification in the inverse inductance
domain with an extended window algorithm. Compared with several industry-standard inductance
and full-wave solvers, such as FastHenry and Sonnet®, EPEEC achieves accuracy within 1.5%
while providing over 100X speedup.
1.4 Thesis Organization
This work presents an integrated framework for solving on-chip interconnect and package parasitic
capacitance and inductance extraction problems, with and without the consideration of process
variations.
We begin in Chapter 2 by introducing the background and the problem definition of capacitance
extraction. The most recent multipole and hierarchical algorithms based on BEM are
reviewed. The main idea and detailed implementation of ICCAP are presented. We prove
mathematically that ICCAP is much more efficient than existing algorithms and is a linear algorithm
in terms of execution time and memory consumption. Extensive experimental
results are presented to demonstrate these properties of ICCAP.
Chapter 3 presents the STATCAP algorithm, which extends ICCAP to consider process variations.
STATCAP is the first to introduce random variable matching and random variable reduction
techniques to the capacitance extraction literature. STATCAP also proposes a general
framework to model interconnect geometry deviations due to process variations and to generate
high-accuracy quadratic representations of parasitic capacitances.
Chapter 4 is devoted to LithSim, an analytic mask image simulation algorithm that models the
process variations introduced in the lithography process. LithSim presents a closed-form formula to
calculate the mask image directly, without sampling the mask, and hence efficiently eliminates the
discretization aliasing error.
Chapter 5 presents EPEEC, which combines the complex image method with window-based
reluctance extraction to generate a compact, accurate, and SPICE-compatible inductance model.
EPEEC extends complex image theory to handle multi-layer substrates and develops a reluctance-based
extraction algorithm to consider inductive and ohmic losses due to induced eddy currents
in a multi-layer substrate. Furthermore, EPEEC is made SPICE-compatible by a reluctance
realization algorithm that converts one reluctance element to a series self inductance and
VCVSs. Extensive experiments demonstrate that EPEEC has high accuracy and can generate
very compact interconnect models.
Chapter 6 provides concluding remarks for this thesis and discusses future work in the area of
parasitic extraction and lithography process variation modeling.
Chapter 2
Linear Time Hierarchical Capacitance
Extraction – ICCAP
2.1 Preliminaries
2.1.1 Capacitance Extraction
The purpose of capacitance extraction is to calculate the capacitance matrix of a conductor
system containing $m + 1$ conductors.
Suppose each conductor $i$ is at a different potential $V_i$ and the $(m + 1)$th conductor is the
reference conductor at zero potential. The total charge $Q_i$ on the $i$th conductor is then the sum
of contributions from all other conductors. The contribution from the $j$th conductor to
the charge on the $i$th conductor is equal to the product of the coupling capacitance $C_{ij}$ and the
potential difference $(V_i - V_j)$ between the $i$th conductor and the $j$th conductor:
\[
\begin{aligned}
Q_1 &= C_{10}V_1 + C_{12}(V_1 - V_2) + \cdots + C_{1m}(V_1 - V_m) \\
Q_2 &= C_{21}(V_2 - V_1) + C_{20}V_2 + \cdots + C_{2m}(V_2 - V_m) \\
&\;\;\vdots \\
Q_m &= C_{m1}(V_m - V_1) + C_{m2}(V_m - V_2) + \cdots + C_{m0}V_m
\end{aligned}
\]
Figure 2.1: Capacitance extraction.
Therefore, the capacitances between the $m$ non-reference conductors can be represented by the
capacitance matrix $C \in \mathbb{R}^{m \times m}$, with
\[ CV = Q, \tag{2.1} \]
where $V \in \mathbb{R}^m$ and $Q \in \mathbb{R}^m$ are the conductor potential and surface charge vectors, respectively.
To determine the $j$th column of the capacitance matrix, the surface charge on each conductor is
computed by raising the potential on the $j$th conductor to one while grounding the other conductors;
$C_{ij}$ is then equal to the surface charge on the $i$th conductor. The procedure is repeated $m$ times
to compute all columns of $C$.
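The column-by-column procedure can be illustrated with a minimal Python sketch. It assumes the coupling capacitances are already known, which is of course the quantity a real extractor has to compute; the function names and the two-conductor values are hypothetical and serve only to show how raising one potential to one and grounding the rest yields one column of $C$.

```python
def conductor_charges(coupling, to_ground, V):
    """Charges Q_i from the expansion that precedes Eq. 2.1.

    coupling[i][j] is the coupling capacitance C_ij between conductors i and j,
    to_ground[i] is C_i0 (capacitance to the reference conductor),
    V[i] is the potential of conductor i.
    """
    m = len(V)
    Q = []
    for i in range(m):
        q = to_ground[i] * V[i]
        for j in range(m):
            if j != i:
                q += coupling[i][j] * (V[i] - V[j])
        Q.append(q)
    return Q


def capacitance_matrix(coupling, to_ground):
    """Build C column by column: set V_j = 1, ground the rest, read off Q."""
    m = len(to_ground)
    C = [[0.0] * m for _ in range(m)]
    for j in range(m):
        V = [1.0 if k == j else 0.0 for k in range(m)]
        Q = conductor_charges(coupling, to_ground, V)
        for i in range(m):
            C[i][j] = Q[i]
    return C


# Hypothetical two-conductor system (arbitrary units).
coupling = [[0.0, 0.5],
            [0.5, 0.0]]
to_ground = [1.0, 2.0]
C = capacitance_matrix(coupling, to_ground)
```

In this toy case the diagonal entries collect all capacitances attached to a conductor, while the off-diagonal entries are the negated coupling capacitances.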
2.1.2 Boundary Element Method
BEM capacitance extraction is equivalent to solving the first-kind integral equation
\[ \psi(x) = \int_{\text{surface}} G(x, x')\,\sigma(x')\,da' \tag{2.2} \]
to find the conductor charge distribution $\sigma$ given the conductor potentials $\psi$. $G(x, x')$ is the
Green's function, which takes different forms for a uniform dielectric and for multiple dielectrics.
To solve the integral equation in Eq. 2.2 numerically, the surfaces of the $m$ conductors are
discretized into much smaller panels, and the surface charge on each of the finest panels (leaf
panels) is assumed to be uniform. The potential at the center of the $i$th panel is then the sum of
the contributions from the charge distributions on all $n$ leaf panels,
\[ v_i = \sum_{j=1}^{n} \frac{q_j}{a_j} \int_{\text{panel } j} G(x_i, x')\,da', \tag{2.3} \]
where $q_j$ and $a_j$ are the charge and the area of panel $j$.
Applying Eq. 2.3 to all $n$ leaf panels leads to a dense linear system
\[ Pq = v, \tag{2.4} \]
where
\[ P_{ij} = \frac{1}{a_j} \int_{\text{panel } j} G(x_i, x')\,da'. \tag{2.5} \]
$P \in \mathbb{R}^{n \times n}$ is referred to as the potential coefficient matrix, and $q, v \in \mathbb{R}^n$ are the panel charge and
potential vectors, respectively.
Then, to compute the $j$th column of the capacitance matrix, Eq. 2.4 must be solved for $q$,
given a vector $v$ where $v_k = 1$ if panel $k$ is on the $j$th conductor and $v_k = 0$ otherwise [14].
The entry $C_{ij}$ of the capacitance matrix is then computed by summing all the panel charges on the $i$th
conductor,
\[ C_{ij} = \sum_{k \in \text{conductor } i} q_k. \tag{2.6} \]
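The flow of Eqs. 2.4 to 2.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the potential coefficient matrix below is made up rather than integrated from a Green's function, the dense Gaussian elimination stands in for the iterative solvers discussed later, and the names `solve`, `extract_capacitance`, and `owner` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def extract_capacitance(P, owner, m):
    """owner[k] is the conductor index of leaf panel k; returns the m x m matrix C."""
    C = [[0.0] * m for _ in range(m)]
    for j in range(m):
        # Unit potential on the panels of conductor j, zero elsewhere.
        v = [1.0 if owner[k] == j else 0.0 for k in range(len(owner))]
        q = solve(P, v)                      # Eq. 2.4
        for k, qk in enumerate(q):
            C[owner[k]][j] += qk             # Eq. 2.6
    return C


# Hypothetical potential coefficients for 4 leaf panels on 2 conductors.
P = [[4.0, 1.0, 0.5, 0.25],
     [1.0, 4.0, 0.25, 0.5],
     [0.5, 0.25, 4.0, 1.0],
     [0.25, 0.5, 1.0, 4.0]]
owner = [0, 0, 1, 1]
C = extract_capacitance(P, owner, 2)
```

The cubic cost of the dense solve in this sketch is exactly what becomes intractable as the number of panels grows.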
2.1.3 Hierarchical Capacitance Algorithms
The main obstacle to solving for $q$ is that the coefficient matrix in Eq. 2.4 is very dense, and
direct linear system solvers, such as Gaussian elimination or Cholesky decomposition, become
computationally intractable once the number of panels exceeds several hundred. Therefore,
multipole-accelerated [14, 15] and hierarchical algorithms [16, 17] have been proposed to address
this problem.
FastCap [14] accelerates matrix-vector multiplications in iterative matrix solvers by the
multipole and local expansions shown in Fig. 2.2. Charge points within an inner circle can be
replaced by a single charge equal to their sum if the distance between the evaluation points and the
center of the circle is much larger than its radius $R$. Likewise, the potentials at evaluation points within a
small circle induced by faraway charge points are roughly the same as the potential evaluated
at the center.
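The first of these approximations can be checked numerically with a small Python sketch. It keeps only the lowest-order term (a single lumped charge), whereas FastCap uses higher-order multipole and local expansions; the charge positions, the $1/r$ kernel with the constant factor dropped, and all names are assumptions chosen for illustration.

```python
import math

def exact_potential(charges, point):
    """Sum of q / r over all point charges (constant prefactor dropped)."""
    return sum(q / math.dist(pos, point) for pos, q in charges)

def lumped_potential(charges, center, point):
    """Replace the cluster by one charge equal to the sum, placed at the center."""
    total = sum(q for _, q in charges)
    return total / math.dist(center, point)

# Hypothetical cluster of unit charges inside a circle of radius R = 1 around the origin.
charges = [((0.5, 0.0, 0.0), 1.0), ((-0.3, 0.4, 0.0), 1.0),
           ((0.0, -0.5, 0.2), 1.0), ((0.1, 0.1, -0.6), 1.0)]
center = (0.0, 0.0, 0.0)
far_point = (50.0, 0.0, 0.0)     # distance much larger than R
near_point = (1.5, 0.0, 0.0)     # distance comparable to R

def relative_error(point):
    exact = exact_potential(charges, point)
    return abs(exact - lumped_potential(charges, center, point)) / exact

far_rel = relative_error(far_point)
near_rel = relative_error(near_point)
```

The lumped charge is accurate at the faraway point and noticeably worse close to the cluster, which is why the approximation is only applied to well-separated groups.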
Figure 2.2: BEM capacitance algorithms: FastCap.
HiCap [16] and PHiCap [17] are fast multipole algorithms with hierarchical panel refinement.
Hierarchical panel discretization can be represented by a multiple-tree structure as shown
in Fig. 2.3. The root panel of each tree corresponds to a conductor surface or a dielectric
interface. If the estimated potential coefficient between two panels is larger than a threshold
value, the panels are further divided into smaller panels. Otherwise, a link recording the potential
coefficient is created between the two panels.
PHiCap [17] proposes the use of a link matrix $H \in \mathbb{R}^{N \times N}$ and a structure matrix $J \in \mathbb{R}^{N \times n}$
to represent the hierarchical refinement, where $N$ is the number of all panels and $n$ is the number
of leaf panels. An example of $H$ and $J$ for the multiple-tree structure is also shown in Fig. 2.3.
Each row of the structure matrix $J$ corresponds to a panel, either leaf or non-leaf, and each
column corresponds to a leaf panel. The $(i, j)$ entry of $J$ is 1 if panel $i$ contains the leaf panel $j$,
and 0 otherwise [17]. For any two panels with no link between them, the corresponding entry
in $H$ is zero. Otherwise, for panels $i$ and $j$, the corresponding entry can be calculated by Eq.
2.5.
Figure 2.3: BEM capacitance algorithms: HiCap and PHiCap.
Since in every elementary tree the parent panel charge is the sum of the charges on its two child
panels, all panel charges can be represented by the charges on leaf panels,
\[ q_N = Jq, \tag{2.7} \]
where $q_N \in \mathbb{R}^N$ is the vector of all panel charges.
Let $v_N \in \mathbb{R}^N$ denote the vector of potentials induced by links on individual panels,
\[ v_N = Hq_N. \tag{2.8} \]
Since the potential on a parent panel distributes to its two child panels, the leaf panel potential
vector $v \in \mathbb{R}^n$ is equal to
\[ v = J^T v_N. \tag{2.9} \]
By using Eqs. 2.7, 2.8, and 2.9, the potential coefficient matrix can be formulated as
\[ P = J^T H J. \tag{2.10} \]
Therefore, FastCap, HiCap, and PHiCap all develop $P$ based on surface potentials and
charges on leaf panels. We will show that this is the intrinsic reason why the linear system
in Eq. 2.4 is dense.
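A small Python sketch can make Eq. 2.10 concrete. It assumes a single seven-panel tree (panel 1 with children 2 and 3, panel 2 with children 4 and 5, panel 4 with children 6 and 7) and made-up link values; neither the tree nor the numbers are taken from Fig. 2.3. Only seven links are stored in $H$, yet the product $J^T H J$ has no zero entry.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# Hypothetical tree: panel 1 -> (2, 3), panel 2 -> (4, 5), panel 4 -> (6, 7).
# Leaf panels (columns of J), in this order: 6, 7, 5, 3.
leaves_under = {1: {6, 7, 5, 3}, 2: {6, 7, 5}, 3: {3}, 4: {6, 7},
                5: {5}, 6: {6}, 7: {7}}
leaf_order = [6, 7, 5, 3]
panels = [1, 2, 3, 4, 5, 6, 7]

# Structure matrix J (N x n): J[i][j] = 1 if panel i contains leaf panel j.
J = [[1.0 if leaf in leaves_under[p] else 0.0 for leaf in leaf_order]
     for p in panels]

# Link matrix H (N x N) with made-up potential coefficients.
links = {(6, 6): 4.0, (7, 7): 4.0, (5, 5): 3.0, (3, 3): 2.0,   # self links
         (6, 7): 1.0,                                           # leaf-to-leaf link
         (4, 5): 0.5, (2, 3): 0.25}                             # links on upper-level panels
H = [[0.0] * 7 for _ in range(7)]
for (a, b), value in links.items():
    H[a - 1][b - 1] = value
    H[b - 1][a - 1] = value

P = matmul(matmul(transpose(J), H), J)       # Eq. 2.10
nonzeros = sum(1 for row in P for x in row if x != 0.0)
```

Each upper-level link is copied into every pair of leaf panels beneath its two end panels, which is the source of the density.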
2.2 ICCAP Algorithm
To facilitate the following discussion, we first introduce the definitions of basis charges and basis
panels.
Definition 1 Let $S$ denote the variable space composed of the charges on all leaf and non-leaf
panels,
\[ S = \{\, q_i \mid q_i \text{ is the surface charge on panel } i,\ 1 \le i \le N \,\}. \]
If each panel charge in $S$ can be represented by a unique linear combination of the charges on $n$
panels, the charges on those panels are basis charges and those $n$ panels are the corresponding basis
panels.
For a given tree structure, there are many possible bases besides the leaf panel charges. For
example, for the multiple-tree structure in Fig. 2.3, Fig. 2.4 shows another basis, which
includes two non-leaf panels $c$ and $e$. The corresponding structure matrix $J'$ of the new basis is
also shown in Fig. 2.4.
Figure 2.4: Different bases have different structure matrices and potential coefficient matrices
with different densities.
Since each basis has its own structure matrix $J'$, the related potential coefficient
matrix $P' = J'^T H J'$ has a different density. For example, $P'$ for the new basis in Fig.
2.4 contains several zeros, while $P$ for the old basis has none. It is therefore desirable
to choose a basis whose potential coefficient matrix is sparse.
Before presenting the method for choosing a new basis, we first show that the leaf panel charges
form the worst basis, in the sense that the corresponding potential coefficient matrix is the densest one.
To prove this, it is necessary to clarify how links between non-leaf panels are filled into the
potential coefficient matrix. As shown in Fig. 2.5, panel $i$ is a non-leaf panel that contains $k$
underlying leaf panels, so the charge on panel $i$ is $q_i = \sum_{n=1}^{k} q^i_n$. Similarly, the charge
on another non-leaf panel $j$ is $q_j = \sum_{n=1}^{l} q^j_n$, where $l$ is the number of leaf panels under panel
$j$.
Assume there is a link $P_{ij}$ between panels $i$ and $j$. The potential induced by link $P_{ij}$ on
panel $i$ is $P_{ij} q_j = P_{ij} \sum_{n=1}^{l} q^j_n$, and it distributes to all $k$ leaf panels under panel $i$.
Similarly, the $l$ leaf panels under panel $j$ gather the potential produced by $P_{ij}$ on panel $j$, which
is $P_{ij} q_i = P_{ij} \sum_{n=1}^{k} q^i_n$. So $P_{ij}$ creates $2kl$ fill-ins in the potential coefficient matrix, with the
pattern shown in Fig. 2.5.
Figure 2.5: Fill-ins introduced by a link between non-leaf panels.
Since leaf panels interact with each other through links between themselves or their upper-level
parent panels, every entry in $P$ is non-zero, and hence the total number of fill-ins is $n^2$.
Consequently, if we take all leaf panel charges as the basis, the corresponding potential coefficient
matrix will be the densest one.
2.2.1 New Basis Panels Generation
Our basis panel selection algorithm (BPSA) is based on repeatedly performing an elementary
operation to generate a new basis.
Theorem 1 Assume the structure matrix and the potential coefficient matrix corresponding to a
possible basis are $J$ and $P$, respectively. If the current basis contains two panels $j$ and $k$ that
are child panels in the same elementary tree, then eliminating either one of them (say $k$)
and adding their parent panel $i$ to the basis generates another set of basis panels.
The new structure matrix $J'$ corresponding to the new basis can be obtained by
\[ J'_j = J_j - J_k, \qquad J'_i = J_k, \]
where $J_i$ denotes the column of $J$ corresponding to panel $i$. The new potential coefficient
matrix $P'$ can be obtained by
\[ P' = E^T P E, \]
where $E$ is an elementary transformation matrix.
Without loss of generality, we use an example to illustrate this important operation.
As shown in Fig. 2.6(a), leaf panels 6 and 7 are contained in the same elementary tree. Their
parent is panel 4. The right-hand side of the figure shows the corresponding structure matrix $J$ when all leaf
panels are selected as the basis.
Now we apply the elementary operation and move one basis panel from panel 7 to its parent
panel 4, as shown in Fig. 2.6(b). This movement results in a new basis, since all panel
charges can still be represented by charges on the new basis panels. The structure matrix $J'$ is
shown on the right-hand side of Fig. 2.6(b).
The column corresponding to panel 4 in $J'$ is identical to the column corresponding to
panel 7 in $J$, since the upper-level panels that originally gathered the charge on panel 7 still collect the
charge on panel 4 after the elementary operation. So the column of panel 4 in $J'$ "inherits" the
column of panel 7 in $J$.
Figure 2.6: The elementary operation of switching basis panels is equivalent to performing a
congruence transformation on $P$ (panels (a) through (d): successive choices of basis panels and the corresponding structure matrices).
In contrast, the column corresponding to panel 6 is changed in $J'$: since the charge on
panel 4 is the sum of the charges on panels 6 and 7, upper-level panels now only need to
gather the charge on panel 4. Panel 6 stays in the new basis because the charge on panel 7 can
be obtained only when the charge on panel 6 is known. So the changed column corresponding
to panel 6 in $J'$ is
\[ J'_6 = J_6 - J_7. \tag{2.11} \]
Furthermore, Eq. 2.11 can be written in matrix form as
\[ J' = JE, \tag{2.12} \]
where $E$ is an elementary transformation matrix that equals the identity except in the rows and columns of panels 6 and 7,
\[
E = \begin{pmatrix} \ddots & & & \\ & 1 & 0 & \\ & -1 & 1 & \\ & & & \ddots \end{pmatrix}
\begin{matrix} \\ \text{panel 6} \\ \text{panel 7} \\ \\ \end{matrix}
\tag{2.13}
\]
Consequently, by using Eqs. 2.12 and 2.13, the relation between the new potential coefficient
matrix $P'$ and $P$ can be written as
\[ P' = J'^T H J' = (JE)^T H (JE) = E^T P E. \tag{2.14} \]
So $P'$ is obtained by a congruence transformation on $P$.
Based on Eqs. 2.13 and 2.14, it is important to notice that this transformation only changes
the column and row related to panel 6: $P'$ is obtained by subtracting the column and row of
panel 7 from the column and row of panel 6. We have shown in Section 2.2 that links on upper-level
panels introduce identical fill-ins in the columns and rows of panels 6 and 7. So the subtraction
cancels out identical terms and creates many zeros in $P'$.
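The cancellation can be reproduced on a toy example. The Python sketch below applies the elementary transformation of Eq. 2.13 to a made-up $4 \times 4$ leaf-basis matrix ordered as panels (6, 7, 5, 3), in which panels 6 and 7 see identical entries toward panels 5 and 3, as they would if those entries came from shared upper-level links. The numerical values and helper names are assumptions for illustration only.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# Made-up leaf-basis potential coefficient matrix, basis order (6, 7, 5, 3).
# Rows/columns of panels 6 and 7 share the entries 0.5 and 0.25 contributed
# by links on their upper-level panels.
P = [[4.0, 1.0, 0.5, 0.25],
     [1.0, 4.0, 0.5, 0.25],
     [0.5, 0.5, 3.0, 0.25],
     [0.25, 0.25, 0.25, 2.0]]

# Eq. 2.13: identity plus a -1 at (row of panel 7, column of panel 6).
E = [[1.0, 0.0, 0.0, 0.0],
     [-1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

# Eq. 2.14: congruence transformation; the new basis order is (6, 4, 5, 3).
P_new = matmul(matmul(transpose(E), P), E)

zeros_before = sum(1 for row in P for x in row if x == 0.0)
zeros_after = sum(1 for row in P_new for x in row if x == 0.0)
```

Only the row and column of panel 6 change, and the entries that were identical for panels 6 and 7 become zero.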
The elementary operation of moving basis panels upward can be executed repeatedly. As
shown in Fig. 2.7(a), after moving basis panel 7 to panel 4, the elementary tree including panels
2, 4, and 5 now has two basis panels (panel 4 and panel 5). So we can eliminate panel 5 (or
panel 4) and add its parent panel 2. This operation cancels out identical terms in the column
and row of panel 4, which inherited the column and row of panel 7 in the previous step.
Figure 2.7: Continuing to move basis panels upward is equivalent to applying consecutive congruence
transformations on the potential coefficient matrix without explicit matrix manipulations.
Notice that the subtractions are only performed on the column and row related to panel 4 in
$P$. The column and row of panel 6 are not affected, and hence zeros created in the previous
step are preserved. Similarly, after this step, we can move panel 3 to panel 1 and again eliminate
identical terms in the row and column of panel 2.
Successively applying the elementary operation is equivalent to implicitly applying consecutive
congruence transformations on the potential coefficient matrix with the transformation matrix
\[ E = E_1 E_2 E_3 \cdots \]
In each step, many zeros are created by eliminating identical terms in the original potential
coefficient matrix, and previously created zero entries are not destroyed by later steps.
Assume we start from the basis including all leaf panels and apply the elementary
operation to push basis panels consecutively from bottom to top. At the end, the resulting basis
only includes root panels and left-hand side (LHS) panels. This process is equivalent to
consecutively applying congruence transformations to cancel out duplicated terms introduced by
the same link. So in the new potential coefficient matrix $P'$, the number of non-zeros is comparable with
the total number of links in the multiple-tree structure, which has been proven to be $O(n)$ [16].
This property has also been observed in experiments, as shown in Fig. 2.8.
Figure 2.8: Comparison of non-zero entries in $H$ and $P'$ (number of non-zeros versus matrix dimension on logarithmic axes; the plot is annotated with slope < 1).
Theorem 2 The basis that includes all root panels and all left-hand side panels leads to a sparse
potential coefficient matrix containing $O(n)$ non-zero entries.
The selection of basis panels is not unique, since in each elementary operation we can
eliminate either the right-hand side (RHS) panel or the LHS panel. However, the construction of $J'$ is
simplified by choosing the basis in Theorem 2.
2.2.2 Direct Formulation of J ′ in Linear Time
One way to construct $J'$ is based on Theorem 1. One can first generate the structure matrix $J$
corresponding to the basis containing the leaf panels and then apply the elementary operation to
push the basis upward, updating $J$ in each operation according to Theorem 1. Since
the basis of $n$ leaf panels is switched to another set of $n$ panels, at most $n$ column subtractions
are performed. The disadvantage, however, is that $J$ must be constructed first. We therefore propose a
second method that constructs $J'$ directly.
Lemma 1 In the column $J'_i$ corresponding to a basis panel $i$, the entry $J'_{ji}$ is 1 if panel $i$
contains the right-hand side panel $j$. If panel $i$ is not a root panel, then the entry $J'_{ji}$ is $-1$ if
the parent of panel $i$ contains the right-hand side panel $j$.
Lemma 1 can be illustrated by the small example in Fig. 2.9. Panel 2 is an LHS panel and
has been included in the new basis. Panels 5 and 7 are its underlying RHS panels, and hence the
corresponding entries in $J'$ are filled with 1. The parent of panel 2 contains RHS panel 3, so
the corresponding entry in $J'$ is $-1$. A detailed implementation of Lemma 1 is presented in
Table 2.1.
Figure 2.9: Efficient construction of the new structure matrix $J'$.
Theorem 3 The new structure matrix $J'$ corresponding to the new basis in Theorem 2 has $O(n)$
entries.
Assume a complete tree structure with $n$ leaf nodes and $m = \lg n$ levels, where the root node is in
level 0. In level $i$, there are $2^{i-1}$ LHS panels. Each LHS panel introduces $2m - 2i + 1$ fill-ins.
So the total number of fill-ins in $J'$ is given by $m + \sum_{i=1}^{m} 2^{i-1}(2m - 2i + 1) = 3n + \lg n - 3$.
So the number of non-zeros in $J'$ is $O(n)$. This property has been observed in practice, as shown in Fig. 2.10.
Similarly, we can prove that the original structure matrix $J$ contains $O(n \lg n)$ non-zeros.
That is the reason why PHiCap [17] has $O(n \log n)$ runtime and memory consumption.
Figure 2.10: Comparison of non-zero entries in $J$ and $J'$ (number of non-zeros versus number of leaf panels).
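The linear growth can be explored with a short Python sketch. It is not the implementation of Table 2.1: it assumes a complete binary tree in heap numbering (panel 1 is the root, panel $p$ has LHS child $2p$ and RHS child $2p + 1$) and derives each row of $J'$ from the relation that an RHS panel charge equals its parent charge minus its LHS sibling charge. The function names are hypothetical, and the exact count depends on which rows are included; here every panel contributes a row.

```python
def new_structure_rows(levels):
    """Rows of J' for a complete binary tree in heap numbering.

    The basis is the root plus all LHS panels (Theorem 2). Each row is a
    dict {basis panel: coefficient} expressing one panel charge.
    """
    n_panels = 2 ** (levels + 1) - 1
    rows = {1: {1: 1}}
    for p in range(2, n_panels + 1):
        if p % 2 == 0:
            # LHS panel: it is a basis panel itself.
            rows[p] = {p: 1}
        else:
            # RHS panel: parent charge minus LHS sibling charge.
            row = dict(rows[p // 2])
            row[p - 1] = row.get(p - 1, 0) - 1
            rows[p] = row
    return rows

def count_nonzeros(levels):
    """Return (number of leaves, non-zeros in J, non-zeros in J')."""
    rows = new_structure_rows(levels)
    nnz_new = sum(len(r) for r in rows.values())
    n_leaves = 2 ** levels
    nnz_old = n_leaves * (levels + 1)    # each leaf lies under levels + 1 panels
    return n_leaves, nnz_old, nnz_new

rows = new_structure_rows(2)             # seven panels: root 1, leaves 4 to 7
table = [count_nonzeros(m) for m in range(1, 11)]
```

For this tree shape the count for $J$ grows like $n \lg n$, while the count for $J'$ stays within a constant multiple of $n$.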
2.2.3 Extracting E from J′
We have shown that the new potential coefficient matrix $P'$ is obtained by applying congruence
transformations on the original matrix $P$. By substituting $P' = E^T P E$ into $P'q' = v'$, we get
\[ E^T P E q' = v'. \tag{2.15} \]
We also know that the original system in Eq. 2.4 is given by $Pq = v$. So these two equations
can be satisfied by setting
\[ v' = E^T v, \tag{2.16} \]
\[ q = Eq'. \tag{2.17} \]
From $q = Eq'$, we can see that $E$ is the coefficient matrix when leaf panel charges are
represented by charges on the new basis panels. Since all panel charges can be expressed as $q_N =
J'q'$, $E$ is contained in the $J'$ matrix and hence can be obtained directly.
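The consistency of Eqs. 2.15 to 2.17 with the original system can be seen in one line, assuming only that $E$ is invertible (it is a product of elementary transformation matrices):

```latex
\[
Pq = v,\quad q = Eq'
\;\Longrightarrow\; P E q' = v
\;\Longrightarrow\; E^T P E\, q' = E^T v
\;\Longrightarrow\; P' q' = v' .
\]
```

So solving the sparse system for $q'$ and mapping back through $q = Eq'$ yields the same leaf panel charges as solving the dense system directly.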
2.2.4 Solving P′q′ = v′ for Uniform- and Multiple-dielectric Media
ICCAP provides a general sparsification technique that does not depend on a specific matrix
solver. For a uniform dielectric, we can adopt incomplete Cholesky decomposition followed by
the preconditioned conjugate gradient (PCG) method. For multiple-dielectric media, the sparse
linear system $P'q' = v'$ is nonsymmetric. In this scenario, the preconditioner is computed from
an incomplete LU factorization, and the preconditioned GMRES method is used to solve the system.
Since the new basis includes all root panels, after solving for $q'$, the root panel charges are already
contained in $q'$ and hence no additional matrix operations are required.
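For the uniform-dielectric case, the structure of a preconditioned conjugate gradient solve can be sketched as follows. Two simplifications are assumed here and should not be read as the ICCAP implementation: the preconditioner is the diagonal of the matrix (Jacobi) rather than an incomplete Cholesky factor, and the matrix is stored densely. The $4 \times 4$ system is a made-up symmetric positive definite example.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def pcg(A, b, tol=1e-12, max_iter=200):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    The preconditioner is the diagonal of A (Jacobi), a simple stand-in for
    the incomplete Cholesky preconditioner mentioned in the text.
    """
    inv_diag = [1.0 / A[i][i] for i in range(len(b))]
    x = [0.0] * len(b)
    r = b[:]                                   # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Made-up sparse, symmetric positive definite system P' q' = v'.
A = [[6.0, -3.0, 0.0, 0.0],
     [-3.0, 4.0, 0.5, 0.25],
     [0.0, 0.5, 3.0, 0.25],
     [0.0, 0.25, 0.25, 2.0]]
b = [0.0, 1.0, 1.0, 1.0]
x = pcg(A, b)
residual = max(abs(bi - yi) for bi, yi in zip(b, matvec(A, x)))
```

The only matrix operation inside the loop is a matrix-vector product, so the cost per iteration is proportional to the number of non-zeros when a sparse storage format is used.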
2.2.5 Potential Coefficient Matrix Reordering
The distribution of non-zeros in $P'$ affects the number of fill-ins in the preconditioners produced
by incomplete Cholesky or LU factorization. Although $P'$ is sparse, directly applying minimum
degree reordering (MMD) may still be expensive for large-scale designs. So we
propose a heuristic, cost-free reordering method called Level-Oriented Reordering (LOR).
According to the new basis generation process, it is reasonable to expect that the columns and
rows related to lower-level basis panels contain more zeros than those of upper-level basis panels, since
the fill-ins introduced by links on their upper-level panels can mostly be eliminated. So the basic
idea of LOR is to assign larger indices to basis panels in upper levels, so that the dense part
lies in the lower right-hand corner of $P'$.
LOR can easily be done during the panel refinement process by using a stack-like
data structure. When a panel is divided into two smaller ones, the two children are pushed
onto the top of the stack so that lower-level panels finally get smaller indices. With
this simple reordering scheme, LOR can lead to even fewer fill-ins in preconditioners than MMD,
as will be shown in the experimental section.
2.2.6 Complexity Analysis
The extraction flowchart of ICCAP and its comparison with PHiCap [17] are presented in Fig.
2.11. The first step of ICCAP, selecting $n$ basis panels based on Theorem 2, can be done by
scanning all $N = 2n - 1$ panels to determine which are root and LHS panels, and hence takes
$O(n)$ time. The second step, constructing $J'$, is equivalent to inserting $O(n)$ non-zeros into $J'$ and
hence is also $O(n)$. $E$ is contained in $J'$ and does not require extra time. $H$ has been proved
to contain $O(n)$ non-zeros [16], so the construction of $P' = J'^T H J'$ can also be done in
$O(n)$.
2.3 Practical Implementation
In this section, we discuss the detailed implementation of ICCAP. First we discuss how
to estimate potential coefficients between panels of various shapes and the hierarchical panel
refinement scheme used in ICCAP. Furthermore, Fig. 2.11 presents the primitive
extraction flowchart of ICCAP and its comparison with PHiCap. In practice, the extraction flow
can be greatly simplified by observing that the sparse potential coefficient matrix $P'$
and the right-hand side $v'$ can be constructed directly, without using the link matrix $H$ and the
new structure matrix $J'$. Thus we not only save memory but also avoid many matrix-matrix
and matrix-vector multiplications.
Figure 2.11: Extraction flowchart of ICCAP and PHiCap.
The simplified extraction flow is shown in Fig. 2.12. We discuss the detailed extraction
flow in the following sections.
Figure 2.12: ICCAP capacitance extraction flow.
2.3.1 Potential Estimation
The self potential coefficient of a panel can be approximated by 3.5 divided by the area of
that panel, and the coupling potential coefficient between two panels is equal to the inverse of
the distance between the centroids of the two panels. In this section, we present how to
efficiently calculate the area and centroid of a quadrilateral or triangular panel.
The centroid of a body is its center of mass, the point at which it would be
stable, or balance, under the influence of gravity. The same point is also often called
the center of gravity, the geocenter, or the barycenter.
There are three common "centers of gravity" that are studied in mathematics, science, and engineering.
The most common in mathematics is the center of point masses located at the vertices of a polygon. A
second approach is to treat the area of the polygon as a sheet of uniform density. The
third, and least common, approach is to represent the sides of the polygon as wire rods of uniform
density. The three centers of gravity are usually different points in non-symmetric
polygons. The first approach is the one we adopt to calculate the centroid, or center of
gravity, in ICCAP.
Figure 2.13: Centroid of triangular panel.
For a triangular panel, the center of balance of the uniform sheet coincides with that of point masses
at the vertices, and it is almost universally referred to as the centroid of the triangle. The centroid
of a triangle is the point at the intersection of the three medians of the triangle. A basic
property of the centroid is that it divides each median in a 2:1 ratio: the part of the
median nearest the vertex is always twice as long as the part near the midpoint of the side. If the
coordinates of the vertices are known, then the coordinates of the centroid are the averages of
the coordinates of the vertices. If we call the three vertices $A = (x_1, y_1, z_1)$, $B = (x_2, y_2, z_2)$,
and $C = (x_3, y_3, z_3)$, then the coordinates $(x_c, y_c, z_c)$ of the centroid are
\[ x_c = \frac{x_1 + x_2 + x_3}{3}, \quad y_c = \frac{y_1 + y_2 + y_3}{3}, \quad z_c = \frac{z_1 + z_2 + z_3}{3}. \tag{2.18} \]
In a quadrilateral, the line joining the midpoints of two opposite sides is called a bimedian.
The centroid of masses located at the vertices of a quadrilateral is the intersection of the
bimedians of the quadrilateral. Another property of the quadrilateral's centroid is that it is also the
midpoint of the segment joining the midpoints of the diagonals. Therefore, the centroid of a
quadrilateral panel is
\[ x_c = \frac{x_1 + x_2 + x_3 + x_4}{4}, \quad y_c = \frac{y_1 + y_2 + y_3 + y_4}{4}, \quad z_c = \frac{z_1 + z_2 + z_3 + z_4}{4}. \tag{2.19} \]
Figure 2.14: Centroid of quadrilateral panel.
Calculating the area of a triangle is an elementary problem encountered in many different
situations. Various approaches exist, depending on what is known about the triangle.
An important result in plane geometry is Heron's formula. Given the lengths
of the sides $a$, $b$, and $c$ and the semi-perimeter
\[ s = \frac{1}{2}(a + b + c) \tag{2.20} \]
of a triangle, Heron's formula gives the area $A$ of the triangle as
\[ A = \sqrt{s(s - a)(s - b)(s - c)}. \tag{2.21} \]
The area of a quadrilateral panel is then equal to the sum of the areas of two non-overlapping
triangles.
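The geometric quantities of Eqs. 2.18 to 2.21 translate directly into code. The Python sketch below is a minimal illustration with hypothetical function names; it assumes quadrilateral vertices are listed in order around a convex panel, and it includes the centroid-distance estimate of the coupling coefficient described at the start of this section.

```python
import math

def centroid(vertices):
    """Average of the vertex coordinates (Eqs. 2.18 and 2.19)."""
    k = len(vertices)
    return tuple(sum(v[axis] for v in vertices) / k for axis in range(3))

def triangle_area(a, b, c):
    """Heron's formula (Eqs. 2.20 and 2.21), starting from the three vertices."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    s = 0.5 * (la + lb + lc)
    # Clamp at zero to guard against round-off for nearly degenerate triangles.
    return math.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0))

def quadrilateral_area(p1, p2, p3, p4):
    """Split along the diagonal p1-p3 into two non-overlapping triangles.

    Assumes the vertices are given in order around a convex quadrilateral.
    """
    return triangle_area(p1, p2, p3) + triangle_area(p1, p3, p4)

def coupling_estimate(panel_a, panel_b):
    """Inverse of the distance between the two panel centroids."""
    return 1.0 / math.dist(centroid(panel_a), centroid(panel_b))

tri = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
quad = [(0.0, 0.0, 2.0), (2.0, 0.0, 2.0), (2.0, 2.0, 2.0), (0.0, 2.0, 2.0)]
tri_area = triangle_area(*tri)
quad_area = quadrilateral_area(*quad)
estimate = coupling_estimate(tri, quad)
```

Working from vertex coordinates keeps the routine independent of panel orientation, since only edge lengths and coordinate averages are used.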
2.3.2 Panel Refinement Scheme
As shown in Fig. 2.12, the first step of ICCAP is the hierarchical panel refinement. This process
is more complicated than it sounds and hence deserves an in-depth explanation.
Panels are hierarchically discretized based on the couplings between different panels. In
function RefineScheme, Refine is called to discretize root panel $i$ and root panel $j$. It is
important to notice that Refine is only applied to different root panels. So after this process,
we still need to consider self couplings, which is done by function SelfLinkInsert. In
function Refine, Peps and Lengthguard are parameters that can be specified on the command
line. The detailed implementations of those functions are presented in Tables 2.2, 2.3, and 2.4.
2.3.3 Direct Construction of P′
Before presenting the direct construction of $P'$, we introduce two definitions that will facilitate
our discussion.
Definition 2 If a link is between two leaf panels, it is called a basic link; otherwise, it is
called a high-level link.
Definition 3 If a link is between two basis panels, it is called a type I link; if it is between
one basis panel and one non-basis panel, it is called a type II link; otherwise, if it is between
two non-basis panels, it is called a type III link.
The type of a given link depends on the current selection of the basis. For example, in Fig. 2.15,
when the leaf panels are selected as the basis, the link between panel $a$ and panel $b$ is a high-level link and also a type III link, since both panel
$a$ and panel $b$ are non-leaf panels.
High-level links are essentially approximations of basic links. For example, in Fig. 2.15, the
link between panel $a$ and panel $b$ is an approximation of the four links between panels $c$, $d$ and
panels $e$, $f$. Therefore, when leaf panels are chosen as the basis, the link between panel $a$
and panel $b$ is inserted into $P$ multiple times.
Figure 2.15: High level link and basic link.
Our goal now is to find out how the different types of links are inserted into $P'$ when we use
our new basis. In the new linear system $P'q' = v'$ obtained by selecting the new basis, $q'$ is the
vector of charges on the new basis panels and $v'$ is the vector of potentials induced on those new basis
panels. Type I links are inserted into $P'$ once. For example, the link $P_1$ shown in
Fig. 2.16(a) is inserted into $P'_{ab}$. Type II and type III links need to be inserted into multiple
places. For example, in Fig. 2.16(b), the potential on panel $c$ induced by the link $P_2$ is equal to
$P_2 q_f = P_2(q_b - q_e)$. Therefore, $P_2$ is inserted into $P'_{cb}$, while $-P_2$ is inserted into $P'_{ce}$.
Similarly, the type III link in Fig. 2.16(c) is inserted multiple times. Thus we can construct
$P'$ directly, without constructing $H$ and $J'$ and without the matrix multiplication $P' = J'^T H J'$.
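The link-by-link insertion can be sketched in Python as follows. The sketch assumes two small trees, $a \to (c, d)$ and $b \to (e, f)$, with basis panels $a$, $b$, $c$, $e$, so that $q_d = q_a - q_c$ and $q_f = q_b - q_e$; the dictionary-based sparse storage, the function name `insert_link`, and the link values are hypothetical choices, not the data structures used in ICCAP.

```python
# Expansion of every panel charge in basis charges (basis: roots a, b and
# LHS panels c, e). RHS panels follow from "parent = sum of children".
expansion = {
    "a": {"a": 1.0}, "b": {"b": 1.0}, "c": {"c": 1.0}, "e": {"e": 1.0},
    "d": {"a": 1.0, "c": -1.0},     # q_d = q_a - q_c
    "f": {"b": 1.0, "e": -1.0},     # q_f = q_b - q_e
}

def insert_link(P_new, x, y, value):
    """Add the link between panels x and y to the sparse matrix P_new.

    P_new is a dict of dicts indexed by basis panel names. A link between
    two different panels is symmetric, so both directions are inserted.
    """
    pairs = [(x, y)] if x == y else [(x, y), (y, x)]
    for s, t in pairs:               # potential on panel s induced by charge on t
        for bs, cs in expansion[s].items():
            for bt, ct in expansion[t].items():
                row = P_new.setdefault(bs, {})
                row[bt] = row.get(bt, 0.0) + cs * ct * value

type_one, type_two, type_three = {}, {}, {}
insert_link(type_one, "a", "b", 0.25)    # both panels are basis panels
insert_link(type_two, "c", "f", 0.5)     # basis panel c, non-basis panel f
insert_link(type_three, "d", "f", 0.125) # both panels are non-basis panels
```

Each link touches only the basis panels that appear in the expansions of its two end panels, so $H$ and $J'$ never have to be formed explicitly.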
Figure 2.16: Direct construction of the sparse potential coefficient matrix $P'$ ((a) type I link, (b) type II link, (c) type III link).
2.3.4 Direct Construction of v′
In Section 2.2.3, we have shown that the new right-hand side $v'$ can be obtained by $v' = E^T v$.
We have also shown that $q = Eq'$ and that $E$ is contained in the new structure matrix $J'$: $E$ consists of the
rows of $J'$ corresponding to leaf panels.
Figure 2.17: Direct construction of the new right hand side $v'$. (Panel tree: root panel 1; second-level panels 2 and 3; leaf panels 4, 5, 6, 7.)
Let us further study the algorithm for constructing $J'$, which is presented in Theorem 3. As
shown in Fig. 2.17, according to Theorem 3, the basis panel 2 affects the two rows corresponding
to leaf panels 5 and 7. In the row of leaf panel 5, $J'_{52}$ is $1$, while in the row of leaf panel 7,
$J'_{72}$ is $-1$. Therefore, one can see that for every LHS panel $j$,
$$\sum_{i=1}^{n} E_{ij} = 0. \qquad (2.22)$$
However, for every root panel $k$,
$$\sum_{i=1}^{n} E_{ik} = 1. \qquad (2.23)$$
Therefore, we conclude that when the $i$th column of the capacitance matrix is being calculated,
only the entries of the new right hand side $v'$ that correspond to root panels belonging to conductor
$i$ need to be set to $1$; all other entries are $0$.
Therefore, we can directly construct the new potential coefficient matrix $P'$ and the right
hand side $v'$ without any extra effort. Furthermore, after solving for the new basis panel charge
vector $q'$, the root panel charges are already included in $q'$ and hence no further steps are
required. The final ICCAP extraction flow has been presented in Fig. 2.12.
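The column-sum properties in Eq. 2.22 and Eq. 2.23 can be checked on the seven-panel tree of Fig. 2.17 with a short Python sketch of the procedure in Table 2.1. The dictionaries `CHILDREN` and `PARENT`, the function names, and the choice of basis (the root panel plus the left-hand-side panels 2, 4 and 6) are assumptions made for illustration only.

```python
# Hypothetical encoding of the tree in Fig. 2.17: panel -> (left, right) children.
CHILDREN = {1: (2, 3), 2: (4, 5), 3: (6, 7)}
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}


def generate_new_j(new_basis):
    """Return J' as {(panel, basis_column): +1 or -1}, following Table 2.1."""
    entries = {}
    for col, basis_panel in enumerate(new_basis):
        # +1 for the basis panel and for every panel on its right-most path.
        p = basis_panel
        entries[(p, col)] = 1
        while p in CHILDREN:
            p = CHILDREN[p][1]
            entries[(p, col)] = 1
        # A non-root basis panel enters the right-most path of its parent with -1.
        # (Assumes non-root basis panels are left children, so no +1 is overwritten.)
        if basis_panel in PARENT:
            p = PARENT[basis_panel]
            while p in CHILDREN:
                p = CHILDREN[p][1]
                entries[(p, col)] = -1
    return entries


def leaf_column_sums(entries, num_columns, leaves):
    """Column sums of E, i.e. of the rows of J' that belong to leaf panels."""
    return [sum(entries.get((leaf, col), 0) for leaf in leaves)
            for col in range(num_columns)]


basis = [1, 2, 4, 6]      # assumed basis: root panel 1 and LHS panels 2, 4, 6
leaves = [4, 5, 6, 7]
J_new = generate_new_j(basis)
sums = leaf_column_sums(J_new, len(basis), leaves)
```

Under these assumptions the column of the root panel sums to 1 over the leaf rows and every other column sums to 0, so only root-panel entries of $v'$ can be non-zero.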
2.4 Experimental Results
ICCAP is implemented in C++ and Matlab. All experiments are executed on a Sun
Blade 2500 with two 1.28-GHz UltraSPARC IIIi processors, 8 GB of RAM, and Solaris 9. The
main test examples are $k \times k$ bus crossing conductors for $k = 2$ to $16$, generated by busgen in
the FastCap release package [14].
The density of the new potential coefficient matrix $P'$ related to the new basis is plotted. The
density is defined as the total number of non-zeros in $P'$ divided by its dimension. As shown
in Fig. 2.18, once the number of leaf panels exceeds one thousand, $P'$ is very sparse and the
density of $P'$ falls well below $10\%$.
We also test the Level-Oriented Reordering (LOR) method embedded in the panel refinement
process by using the bus $4 \times 4$ benchmark. Without LOR, the original lower and upper
triangular factors from incomplete LU factorization contain 29017 and 24546 non-zeros, respectively.
By adopting LOR, the number of fill-ins is dramatically reduced by 30%. The result is
comparable with directly applying MMD, which in this case results in 22129 and 19633 fill-ins
in $L$ and $U$.
Figure 2.18: Density of the new potential coefficient matrix $P'$. (Horizontal axis: number of leaf panels, 0 to 3x10^3; vertical axis: density of $P'$ in %, 0 to 50; B-spline curve.)
Table 2.5 compares the performance of three algorithms: FastCap [14] with expansion order
2, HiCap [16], and the new algorithm, ICCAP. The convergence tolerance is set to 0.01, and the
error is calculated with respect to FastCap (-o2). Iteration is the average number of iterations
per conductor. ICCAP is the fastest of the three algorithms. Compared with FastCap,
ICCAP is 30 to 40 times faster and uses much less memory. Compared with HiCap, for the
bus $12 \times 12$ benchmark, ICCAP exhibits a nearly 10 times speedup. HiCap represents $P$ as a
block matrix instead of forming it explicitly, and hence the real storage of $P$ is $O(n)$. All of
$H$, $J$, and $P'$ in ICCAP contain $O(n)$ non-zeros, so the memory consumptions of ICCAP
and HiCap are of the same order. The actual accuracy and memory consumption of HiCap and
ICCAP depend on the refinement parameters. When the number of leaf panels is roughly the
same, HiCap and ICCAP have comparable accuracy.
Figure 2.19: Preconditioners from incomplete LU factorization with different reordering
schemes (RelativeResidue = 0.01). (Sparsity patterns of $P'$, $L$ and $U$, axes 0 to 1500, with $P'$: 108228 non-zeros in every case. Without reordering: L: 29017, U: 24546. Reordered by LOR: L: 21761, U: 18783. Reordered by MMD: L: 22129, U: 19633.)
We do not have access to PHiCap [17] and cannot compare with it directly. Published
results show that PHiCap is 2 to 3 times faster than HiCap for the benchmarks in Table 2.5.
Based on the comparison with HiCap, we can expect ICCAP to be faster than PHiCap as well.
Notice also that for the test cases in Table 2.5, ICCAP normally converges in fewer than 2 iterations
while PHiCap needs about 3 iterations. The main disadvantage of PHiCap is its memory
consumption due to the explicit formulation of the transformation matrix, whereas ICCAP directly
formulates the sparse matrix $P'$. In addition, [17] shows that PHiCap has lower accuracy than HiCap.
ICCAP can therefore be superior to PHiCap in terms of memory and accuracy.
We also use ICCAP and HiCap to test larger files containing more conductors. The results are
shown in Table 2.6. For these test files, ICCAP can converge within three iterations and shows
a 7 to 8 times speedup compared with HiCap.
GenerateNewJ(NewBasis)
  for ( i=0; i<NewBasis.size(); i++ ) {
    // +1 entries: the basis panel and the panels on its right-most path
    p = NewBasis[i];
    InsertEntryNewJ (p, i, 1);
    while ( panel[p] is a non-leaf panel ) {
      p = panel[p].GetRight();
      InsertEntryNewJ (p, i, 1);
    }
    // -1 entries: the right-most path below the parent of a non-root basis panel
    p = NewBasis[i];
    if ( ! panel[p] is a root panel ) {
      p = panel[p].GetParent();
      while ( panel[p] is a non-leaf panel ) {
        p = panel[p].GetRight();
        InsertEntryNewJ (p, i, -1);
      }
    }
  }
Table 2.1: Algorithm for directly constructing $J'$.
RefineScheme(OriginalPanelNum)
  // refine every unordered pair of original panels
  for ( i=0; i<OriginalPanelNum; i++ )
    for ( j=i+1; j<OriginalPanelNum; j++ )
      Refine(i,j);
Table 2.2: Hierarchical panel refinement scheme.
Refine(Panel Ai, Panel Aj)
  Pij = PotentialEstimate(Ai, Aj);
  Ri = longest side of panel Ai;
  Rj = longest side of panel Aj;
  // record the link when both products fall below Peps,
  // or when neither panel is longer than Lengthguard
  if ( (Pij * Ri < Peps && Pij * Rj < Peps) ||
       (max(Ri, Rj) <= Lengthguard) )
    RecordLink(i, j, Pij);
  else if ( Ri > Rj ) {
    // otherwise subdivide the larger panel and recurse on its children
    Subdivide(Ai);
    Refine(Ai.left, Aj);
    Refine(Ai.right, Aj);
  } else {
    Subdivide(Aj);
    Refine(Aj.left, Ai);
    Refine(Aj.right, Ai);
  }
Table 2.3: Refinement of two panels.
SelfLinkInsert()
  for (i=0; i<Basis.size(); i++)
    for (j=i; j<Basis.size(); j++)
      // only pairs of basis panels that share the same root panel
      if (panel[Basis[i]].root == panel[Basis[j]].root) {
        Pij = PotentialEstimate(Basis[i], Basis[j]);
        RecordLink(Basis[i], Basis[j], Pij);
      }
Table 2.4: Self coupling insertion.
Benchmark     Algorithm  Time (s)  Iteration  Memory (MB)  Error   Panels
4 × 4 Bus     FastCap    8.03      18.63      26.27        –       2736
              HiCap      0.77      8.7        0.99         0.72%   2176
              ICCAP      0.39      1.12       0.581        0.76%   2112
6 × 6 Bus     FastCap    35.55     14.4       65.19        –       5832
              HiCap      3.19      14.5       1.85         1.42%   3168
              ICCAP      0.7       1.08       1.54         1.50%   3168
8 × 8 Bus     FastCap    67.4      12         114.5        –       10080
              HiCap      14.64     13.4       5.03         1.63%   8448
              ICCAP      2.84      1.43       3.58         1.91%   8320
12 × 12 Bus   FastCap    357.99    18.1       297.8        –       22032
              HiCap      76.53     15.1       12.72        1.08%   12864
              ICCAP      7.21      1.41       11.87        1.18%   12480
Table 2.5: Simulation results comparison.
Cond Num    36                48                68
Algorithm   HiCap    ICCAP    HiCap    ICCAP    HiCap    ICCAP
Time        159.03   21.8     427.37   53.6     1932.8   164.7
Iteration   15.8     2.58     18.6     3.26     23.2     3.15
Memory      14.6     13.4     24.5     20.1     47.3     37.3
Panels      13440    12876    19040    18156    33040    31356
Table 2.6: Comparison with HiCap for some large benchmarks.
Chapter 3
Statistical Capacitance Extraction – STATCAP
3.1 Preliminaries
Due to the ever-increasing complexity of VLSI designs and IC process technologies, the mismatch
between a circuit fabricated on the wafer and the one designed in the layout tool grows
ever larger. Therefore, characterizing and modeling process variations of interconnect geometry
has become an integral part of the analysis and optimization of modern VLSI designs. In this
chapter, we present a systematic methodology to develop a closed form capacitance model,
which accurately captures the nonlinear relationship between parasitic capacitances and dominant
global/local process variation parameters. The explicit capacitance representation applies
orthogonal principal factor analysis to greatly reduce the number of random variables associated
with modeling conductor surface fluctuations while preserving the dominant sources
of variation. Consequently, the variational capacitance model can be efficiently utilized by
statistical model order reduction and timing analysis tools. Experimental results demonstrate
that the proposed method exhibits over $100\times$ speedup compared with Monte Carlo simulation
while having the advantage of generating explicit variational parasitic capacitance models of
high order accuracy.
3.1.1 Process Variations
As VLSI circuits have entered deep sub-micron dimensions, the increasing complexity of VLSI
designs and IC process technologies increases the mismatch between design and manufacturing.
Process induced variations in the device and interconnect structures pose a significant
challenge to parasitic modeling and signal integrity analysis. To determine the extent of such
effects, the distribution of various electrical parameters, such as interconnect resistances and
capacitances, due to variations in the manufacturing process must be determined. Once this
distribution, which is also called the design envelope, is known, the design corners can then be
identified.
During the modern Damascene process, the dielectric is usually patterned by reactive ion
etching (RIE), followed by liner and metal (Cu) deposition. Then chemical-mechanical
planarization (CMP) is applied to remove excess metal and provide global planarization.
During RIE, the ideally rectangular trenches etched in the dielectric, and hence the later deposited metals
and liners, may become trapezoidal due to the aspect ratio dependent etching (ARDE) effect. During
the CMP overpolishing process, regions of high metal pattern density tend to erode faster and
hence show higher metal and dielectric removal rates than regions of low metal pattern density
[20]. The non-uniform metal removal rates across the wafer can lead to varying metal line
thickness for interconnects located in the same metal layer. Also, during pattern transfer in the
lithography process, photomask geometries may be distorted by nonlinear effects caused
by optical diffraction and resist processing, so that the tips and corners of interconnects
become rounded.
Figure 3.1: Process variations due to (a) chemical-mechanical planarization, (b) optical diffraction,
and (c) chemical etching. (Picture courtesy of TSMC, Hsin-Chu, Taiwan.) (Labels in the figure: metal layers M5 to M7, eroded dielectric, high pattern density, low pattern density.)
Therefore, for deep submicron technologies, a combination of device physics, die location
dependence, optical proximity effects, and micro-loading in etching and deposition may lead to
heterogeneous and non-monotonic relationships among the process random variables. Also, parasitic
capacitance does not change monotonically or linearly with those random parameters,
which have varying effects on interconnect geometries depending on local characteristics
of the layout and uncertainties in fabrication. Since all these process variations are random in
nature, statistical parasitic capacitance models that can capture those complicated
nonlinear relationships become indispensable.
Furthermore, capacitance extraction with process variations is never the final goal. Capacitance
variation analysis needs to provide a model fully compatible with statistical model
order reduction and statistical timing analysis tools, most of which require representing parasitic
capacitances as functions of some common random variables [21–25]. A recent study also
shows that the first order canonical model is not sufficient to represent the nonlinear
dependency of parasitic capacitances on many variation sources [26]. To the best of our knowledge,
although many efficient 3D capacitance extraction algorithms [9,13–19] have been proposed in
the literature and there has been some pioneering work [26, 27] on capacitance extraction with
the consideration of process variations, no algorithm can efficiently supply
an explicit statistical capacitance model with high order accuracy.
3.2 Statistical Capacitance Extraction
The following four issues will be discussed in this section for modeling parasitic capacitance
variations: (1) how to efficiently solve the system equations associated with the variational
capacitance model; (2) how to mathematically model the surface fluctuation due to process
variations; (3) how to reduce the large number of random variables used to model the surface
fluctuation; (4) how to obtain the probability density function without using time consuming
Monte Carlo simulation.
3.2.1 Variational Capacitance Approximation
Assume for now that process variations induce some perturbations in the nominal potential
coefficient $P_{kl}$ between panel $k$ and panel $l$, and that the variational potential coefficient $\tilde{P}_{kl}$ can be
represented in terms of the nominal value $P_{kl}$ and $k$ normal random variables $\delta = [\delta_1\ \delta_2\ \cdots\ \delta_k]^T$
as
$$\tilde{P}_{kl} = P_{kl} + \sum_{i} \Delta P^{i}_{kl}\,\delta_i + \sum_{i,j} \Delta P^{ij}_{kl}\,\delta_i\delta_j + \text{h.o.t.} \qquad (3.1)$$
How to represent $\tilde{P}_{kl}$ in such a form will be presented in the following sections.
The expression of $\tilde{P}_{kl}$ in terms of $\delta$ can be extended to higher orders. If only the first three terms
are used, Eq. 3.1 is the quadratic form of the potential coefficient $\tilde{P}_{kl}$. The second term represents
the canonical linear model while the third term captures the nonlinear relationship between $\tilde{P}_{kl}$
and $\delta$. In the rest of this chapter, our discussion will be based on the quadratic form, since
the presented derivation can easily be extended to higher order approximations.
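Since each entry of the variational link matrix $\tilde{H}$ has the form shown in Eq. 3.1, the entire
$\tilde{H}$ can also be expressed in a quadratic form as follows:
$$\tilde{H} = H + \sum_{i} \Delta H^{i}\delta_i + \sum_{i,j} \Delta H^{ij}\delta_i\delta_j, \qquad (3.2)$$
where $H$, $\Delta H^{i}$, $\Delta H^{ij} \in \mathbb{R}^{N\times N}$ are constant coefficient matrices.
By using Eq. 2.10, the variational potential coefficient matrix $\tilde{P}$ can also be represented in
terms of the nominal $P$ and $\delta$:
$$\tilde{P} = J^T H J + \sum_{i} J^T \Delta H^{i} J\,\delta_i + \sum_{i,j} J^T \Delta H^{ij} J\,\delta_i\delta_j
= P + \underbrace{\sum_{i} \Delta P^{i}\delta_i + \sum_{i,j} \Delta P^{ij}\delta_i\delta_j}_{\Delta P}, \qquad (3.3)$$
where $\Delta P^{i} = J^T \Delta H^{i} J$ and $\Delta P^{ij} = J^T \Delta H^{ij} J$. $P$ is the potential coefficient matrix without
considering the process variations, and $\Delta P$, which is the summation of the second and third
terms in Eq. 3.3, represents the variational part of $\tilde{P}$.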
Let $\tilde{q}$ denote the variational charge distribution vector. Our goal is then to express $\tilde{q}$ in a
quadratic form, such that
$$\tilde{q} = q + \underbrace{\sum_{i} \Delta q^{i}\delta_i + \sum_{i,j} \Delta q^{ij}\delta_i\delta_j}_{\Delta q}, \qquad (3.4)$$
where $q$, $\Delta q^{i}$, $\Delta q^{ij} \in \mathbb{R}^{n\times 1}$. From Eq. 3.4, it is clear that the quadratic expressions of self and
coupling capacitances can be easily obtained.
From Eq. 3.3 and Eq. 3.4, the variational linear system can then be represented as
$$(P + \Delta P)(q + \Delta q) = v. \qquad (3.5)$$
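Substituting the nominal equation in Eq. 2.4 into Eq. 3.5 and applying the Taylor expansion, $\Delta q$
can be expressed as
$$\begin{aligned}
\Delta q &= -(I + P^{-1}\Delta P)^{-1} P^{-1}\Delta P\, q \\
&= \underbrace{-P^{-1}\Delta P\, q}_{\Delta q_1} + \underbrace{P^{-1}\Delta P\, P^{-1}\Delta P\, q}_{\Delta q_2} + \cdots \\
&= Aq + A^2 q + \cdots = \sum_{i=1}^{\infty} A^{i} q,
\end{aligned} \qquad (3.6)$$
where $A = -P^{-1}\Delta P$.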
Theorem 4 The variational charge distribution vector $\Delta q$ can be represented as $\Delta q = \sum_{i=1}^{\infty} A^{i} q$,
where $A = -P^{-1}\Delta P$. The Taylor expansion series of $\Delta q$ converges under the condition
$\| P^{-1}\Delta P \|_p < 1$. The high order terms can therefore be calculated iteratively by using the following
equation
$$P\,\Delta q_{i+1} = -\Delta P\,\Delta q_i. \qquad (3.7)$$
Since in practice the perturbation matrix $\Delta P$ is normally much smaller than the nominal potential
coefficient matrix $P$, the convergence condition is almost always satisfied. Let the
quadratic form representation of the first term on the right hand side of Eq. 3.6, $\Delta q_1$, be
$$\Delta q_1 = \sum_{i} \Delta q_1^{i}\delta_i + \sum_{i,j} \Delta q_1^{ij}\delta_i\delta_j. \qquad (3.8)$$
By using Eq. 3.3 and $P\Delta q_1 = -\Delta P q$, we get
$$P\,\Delta q_1^{i} = -\Delta P^{i} q, \qquad P\,\Delta q_1^{ij} = -\Delta P^{ij} q. \qquad (3.9)$$
Therefore, the quadratic expression of $\Delta q_1$ can be calculated by solving $(k + k^2)$ linear systems.
Since $P$ is sparse, each linear system in Eq. 3.9 can be efficiently solved by preconditioned
iterative methods with $O(n)$ complexity. So the total complexity of solving for $\Delta q_1$ is $O((k^2 + k)n)$.
Usually, the number of random variables, $k$, is much smaller than the total number of leaf panels
$n$.
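The second term $\Delta q_2 = \sum_{i,j} \Delta q_2^{ij}\delta_i\delta_j$ in Eq. 3.6 can be obtained by using $\Delta q_1$:
$$P\,\Delta q_2 = -\Delta P\,\Delta q_1. \qquad (3.10)$$
Let the right hand side vector in Eq. 3.10 be $\hat{q}_1 = \Delta P\,\Delta q_1$. Then the quadratic approximation
of $\hat{q}_1$ can be expressed as
$$\hat{q}_1 = \sum_{i,j} \Delta P^{i}\,\Delta q_1^{j}\,\delta_i\delta_j + \text{h.o.t.} \qquad (3.11)$$
Therefore, the coefficient vectors of $\Delta q_2$ can be obtained by
$$P\,\Delta q_2^{ij} = -\Delta P^{i}\,\Delta q_1^{j}. \qquad (3.12)$$
So the quadratic expression of $\Delta q_2$ requires solving $k^2$ linear systems and hence the
complexity is $O(k^2 n)$.
Therefore, by using the quadratic expressions of $\Delta q_1$ and $\Delta q_2$, the quadratic expression of
$\Delta q$ is obtained by
$$\Delta q^{i} = \Delta q_1^{i}, \qquad \Delta q^{ij} = \Delta q_1^{ij} + \Delta q_2^{ij}. \qquad (3.13)$$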
So the total computational complexity of calculating the quadratic form of $\Delta q$ is $O(k^2 n)$.
Also, one may notice that the first order terms are generated only by $\Delta q_1$ while the second
order terms are generated by $\Delta q_1$ and $\Delta q_2$. Therefore, for the quadratic form approximation,
when $i > 2$, $\Delta q_i$ does not contain first or second order terms, and hence can be safely
truncated. In the following subsections, we will present how to express the variational potential
coefficients in terms of $k$ normal random variables.
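The recursion in Eq. 3.7 can be tried on a tiny system. The following Python sketch is an illustration only: it uses a 2 x 2 matrix with made-up numbers, a Cramer's-rule solver (`solve2`) in place of the preconditioned iterative solver, and a single fixed realization of $\Delta P$ rather than the coefficient-wise quadratic expansion. All names are hypothetical.

```python
def solve2(M, b):
    """Solve a 2x2 linear system M x = b by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def matvec(M, x):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]


def delta_q_series(P, dP, q, terms):
    """Sum the first `terms` corrections using P dq_{i+1} = -dP dq_i (Eq. 3.7)."""
    total = [0.0, 0.0]
    prev = q                      # the recursion starts from the nominal charge q
    for _ in range(terms):
        rhs = [-r for r in matvec(dP, prev)]
        prev = solve2(P, rhs)     # only the nominal matrix P is ever solved
        total = [t + d for t, d in zip(total, prev)]
    return total


# Illustrative numbers only (not taken from the thesis).
P = [[2.0, 0.5], [0.5, 1.0]]
dP = [[0.10, 0.02], [0.02, 0.05]]
v = [1.0, 0.0]

q = solve2(P, v)
P_pert = [[P[i][j] + dP[i][j] for j in range(2)] for i in range(2)]
exact = [a - b for a, b in zip(solve2(P_pert, v), q)]

err1 = max(abs(e - s) for e, s in zip(exact, delta_q_series(P, dP, q, 1)))
err2 = max(abs(e - s) for e, s in zip(exact, delta_q_series(P, dP, q, 2)))
```

Here `err1` and `err2` are the errors after keeping one and two terms of the series. Because the perturbation is small relative to $P$, the two-term sum is noticeably closer to the exact $\Delta q$, which is the behaviour the truncation argument above relies on.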
3.2.2 Process Variation Modeling
After the hierarchical panel discretization process, the positions of the finest panels, the
leaf panels, may vary due to process variations. The surface fluctuation of a conductor
can be described as a statistical perturbation of each nominal smooth leaf panel surface along
its normal direction, as shown in Fig. 3.2.
Figure 3.2: Process variation modeling with correlated statistical position perturbations on leaf
panels. (Labels in the figure: nominal smooth surface, rough surface, perturbations $n_i$ and $n_j$, correlation between $n_j$ and $n_i$.)
Although leaf panel position variations may not be truly random, they can often be accurately
modeled by assuming an appropriate spatial correlation [27]. We denote the leaf panel
position variations by a random vector $\Delta n$, where the $i$th element of $\Delta n$, $\Delta n_i$, is the
random perturbation on leaf panel $i$. For simplicity, one can assume that the expectation of
$\Delta n$ is $\mu(\Delta n) = 0$.
Obviously, the larger the distance between two leaf panels, the weaker their correlation will
be. This spatial relationship can be accurately modeled by using the Gaussian correlation function
[27]. For two leaf panels $i$ and $j$, the correlation between them is determined by
$$\Gamma_{ij} = e^{-\|x_i - x_j\|^2/\eta^2}, \qquad (3.14)$$
where $e$ is Euler's number and $\eta$ is a user-specified correlation length. $x_i$ and $x_j$ are the centers of
leaf panels $i$ and $j$, respectively. Then the correlation matrix can be written as
$$\Gamma(\Delta n) = (\Gamma_{ij})_{n\times n}. \qquad (3.15)$$
Many small terms in $\Gamma(\Delta n)$ can be truncated to make it sparse if the corresponding two leaf
panels are far enough apart. Also, if the standard deviation on leaf panel $i$ is assumed to be $\sigma_i$,
then the variance-covariance matrix $\Sigma$ of $\Delta n$ can be obtained as
$$\Sigma(\Delta n) = (\Gamma_{ij}\sigma_i\sigma_j)_{n\times n}. \qquad (3.16)$$
Therefore, the surface fluctuation can be modeled by the random vector $\Delta n$ with mean $\mu(\Delta n) = 0$
and the variance-covariance matrix $\Sigma(\Delta n)$ given in Eq. 3.16.
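Eq. 3.14 to Eq. 3.16 translate directly into code. The Python sketch below builds the correlation and variance-covariance matrices for three made-up panel centers; the function names and all numerical values are assumptions for illustration, and `sigma` is interpreted as a list of per-panel standard deviations.

```python
import math


def gaussian_correlation(centers, eta):
    """Gamma_ij = exp(-||x_i - x_j||^2 / eta^2) for panel centers x_i (Eq. 3.14)."""
    n = len(centers)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
            gamma[i][j] = math.exp(-dist2 / eta ** 2)
    return gamma


def covariance(gamma, sigma):
    """Sigma_ij = Gamma_ij * sigma_i * sigma_j (Eq. 3.16)."""
    n = len(sigma)
    return [[gamma[i][j] * sigma[i] * sigma[j] for j in range(n)] for i in range(n)]


# Three hypothetical leaf panel centers on a line, one unit apart.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
gamma = gaussian_correlation(centers, eta=1.0)
cov = covariance(gamma, sigma=[0.1, 0.1, 0.2])
```

The correlation decays from 1 on the diagonal to $e^{-1}$ for neighbouring panels and $e^{-4}$ for panels two units apart, which illustrates why distant entries can be truncated to obtain a sparse matrix.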
3.2.3 Random Variable Reduction
Although the process variations can be modeled as position perturbations on leaf panels, the
number of random variables can easily exceed several thousand, and this may greatly limit the
size of the problem that can be analyzed.
The position perturbations of leaf panels may be caused by many unobservable variation
sources, either global or local. However, some of them may have significant effects on the
conductor surface fluctuation while others may not, and hence the non-significant factors can
be safely neglected in our modeling process. In multivariate statistics, the dominant
unobservable variation sources can be determined by principal factor analysis (PFA) [28] based
on either the correlation matrix $\Gamma(\Delta n)$ in Eq. 3.15 or the variance-covariance matrix $\Sigma(\Delta n)$ in
Eq. 3.16.
The random vector $\Delta n$ representing the perturbations on leaf panels is observable,
and has $n$ components with the mean vector $\mu(\Delta n) = 0$ and the variance-covariance matrix
$\Sigma(\Delta n)$ given in Eq. 3.16. Principal factor analysis postulates that $\Delta n$ is linearly dependent
upon $k$ ($k \ll n$) unobservable random variables $\delta$, called common factors. Those $k$ common
factors are used to model the unknown and unobservable dominant process variation sources
that inherently induce the perturbations on leaf panels.
Furthermore, orthogonal principal factor analysis (OPFA), also referred to as the principal
component model, assumes that
$$\mu(\delta) = 0, \qquad \Sigma(\delta) = I. \qquad (3.17)$$
The goal of orthogonal principal factor analysis is to find a loading matrix $L \in \mathbb{R}^{n\times k}$, such that
$$\underset{(n\times 1)}{\Delta n} = \underset{(n\times k)}{L} \times \underset{(k\times 1)}{\delta}. \qquad (3.18)$$
From the OPFA model in Eq. 3.18 and by using Eq. 3.17, one can easily obtain
$$\Sigma(\Delta n) = L\,\Sigma(\delta)\,L' = LL'. \qquad (3.19)$$
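Let $\Sigma(\Delta n)$ have eigenvalue-eigenvector pairs $(\lambda_i, e_i)$ with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Then the
eigen-decomposition of $\Sigma(\Delta n)$ is given by
$$\Sigma(\Delta n) = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_n e_n e_n'
= \begin{bmatrix} \sqrt{\lambda_1}e_1 & \sqrt{\lambda_2}e_2 & \cdots & \sqrt{\lambda_n}e_n \end{bmatrix}
\begin{bmatrix} \sqrt{\lambda_1}e_1' \\ \sqrt{\lambda_2}e_2' \\ \vdots \\ \sqrt{\lambda_n}e_n' \end{bmatrix}. \qquad (3.20)$$
So if the loading matrix is equal to $L = [\sqrt{\lambda_1}e_1\ \cdots\ \sqrt{\lambda_n}e_n]$, then we obtain $\Sigma(\Delta n) = LL'$
as in Eq. 3.19.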
However, in this case, principal factor analysis is not particularly useful, since it employs
as many common factors as there are random variables and does not lead to any approximation
of $\Sigma(\Delta n)$, although the correlations among the entries of $\Delta n$ have been decoupled. We prefer
models that explain the variance-covariance matrix $\Sigma(\Delta n)$ in terms of just a few common
factors.
When the last $(n-k)$ eigenvalues are small, one can neglect the contribution of
$\lambda_{k+1}e_{k+1}e_{k+1}' + \cdots + \lambda_n e_n e_n'$ to $\Sigma(\Delta n)$ in Eq. 3.20. So if one lets
$$L = [\sqrt{\lambda_1}e_1\ \ \sqrt{\lambda_2}e_2\ \ \cdots\ \ \sqrt{\lambda_k}e_k], \qquad (3.21)$$
then neglecting this contribution leads to the approximation
$$\Sigma(\Delta n) \approx \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k' = LL'. \qquad (3.22)$$
Furthermore, OPFA provides an easy way to determine how many common factors
are necessary to achieve a user-specified accuracy. Since the $i$th factor corresponds
to the $i$th eigenvalue as shown in Eq. 3.20 and $\sum_{i=1}^{n} \lambda_i = \mathrm{tr}(\Sigma(\Delta n))$, the contribution of the
$i$th factor to $\Sigma(\Delta n)$ can be estimated by
$$c_i = \begin{cases}
\dfrac{\lambda_i}{\mathrm{tr}(\Sigma(\Delta n))} & \text{factor analysis using } \Sigma(\Delta n) \\[2ex]
\dfrac{\lambda_i}{n} & \text{factor analysis using } \Gamma(\Delta n)
\end{cases} \qquad (3.23)$$
So if $\sum_{i=1}^{k} c_i$ for the $k$ largest eigenvalues exceeds a user-specified value that depends on the
accuracy requirement, the resulting $k$ factors are used to approximate $\Delta n$.
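The selection rule based on Eq. 3.23 amounts to accumulating normalized eigenvalues until a threshold is reached. A minimal Python sketch follows; the function name, the threshold value and the eigenvalue spectrum are hypothetical, and the eigenvalues are assumed to have been computed beforehand.

```python
def choose_num_factors(eigenvalues, threshold):
    """Smallest k whose leading eigenvalues explain at least `threshold` of the trace."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)            # the sum of eigenvalues equals the trace
    cumulative = 0.0
    for k, lam in enumerate(ordered, start=1):
        cumulative += lam / total   # contribution c_i of the i-th factor
        if cumulative >= threshold:
            return k
    return len(ordered)


# Hypothetical spectrum: contributions are 0.5, 0.25, 0.125, 0.0625, 0.0625.
k = choose_num_factors([4.0, 2.0, 1.0, 0.5, 0.5], threshold=0.85)
```

For a correlation matrix the trace equals $n$, so the same function covers both cases of Eq. 3.23.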
3.2.4 Potential Coefficient Approximation
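For a pair of panels $k$ and $l$ without process variations, the potential coefficient between
them is evaluated by Eq. 2.3. If panels $k$ and $l$ have variations $\Delta n_k$ and $\Delta n_l$ along their
normal directions, then the variational potential coefficient $\tilde{P}_{kl}$ is a function of $\Delta n_k$ and $\Delta n_l$,
$\tilde{P}_{kl} = f(x_k, x_l, \Delta n_k, \Delta n_l)$. By expanding $\tilde{P}_{kl}$ into a Taylor series in $\Delta n_k$ and $\Delta n_l$, one
obtains
$$\tilde{P}_{kl} = P_{kl} + a_{kl}\,\Delta\hat{n} + \Delta\hat{n}' A_{kl}\,\Delta\hat{n} + \text{h.o.t.}, \qquad (3.24)$$
where $a_{kl}$ is a $1 \times 2$ vector and $A_{kl}$ is a $2 \times 2$ matrix. $\Delta\hat{n} = [\Delta n_k\ \Delta n_l]^T$ is a random vector
containing $\Delta n_k$ and $\Delta n_l$.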
During the hierarchical panel refinement process, the recorded links may or may not be
created between two leaf panels, as we have shown in Fig. 2.5. So $\Delta\hat{n}$ could contain the
variations on some non-leaf panels. Since our process variation model and principal factor analysis
are formulated in terms of variations on leaf panels, it is necessary to represent $\Delta\hat{n}$ in terms of
$\Delta n$.
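Without loss of generality, we assume that the position variations of leaf panels are along
their normal direction. Then if two panels $i$ and $j$ have variations $\Delta n_i$ and $\Delta n_j$, the variation
on their parent panel $k$ will be $\Delta n_k = \frac{1}{2}(\Delta n_i + \Delta n_j)$. So all panel variations $\Delta\bar{n}$, either leaf
or non-leaf, can be expressed in terms of the variations on the underlying leaf panels
$$\Delta\bar{n} = R\,\Delta n, \qquad (3.25)$$
where $R \in \mathbb{R}^{N\times n}$ is a provably sparse matrix. For example, for the small tree structure shown
on the right hand side of Fig. 3.3, panels 1, 2, and 4 are leaf panels. Panel 3 is the parent of
panels 1 and 2, and hence $\Delta n_3 = \frac{1}{2}(\Delta n_1 + \Delta n_2)$. Panel 5 is the parent of panels 3 and 4,
and hence $\Delta n_5 = \frac{1}{2}(\Delta n_3 + \Delta n_4) = \frac{1}{4}(\Delta n_1 + \Delta n_2) + \frac{1}{2}\Delta n_4$. The detailed algorithm for
constructing the random variable transformation matrix $R$ is presented in Fig. 3.4.
$$\begin{bmatrix} \Delta n_1 \\ \Delta n_2 \\ \Delta n_3 \\ \Delta n_4 \\ \Delta n_5 \end{bmatrix}
= \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 1 \\ 1/4 & 1/4 & 1/2 \end{bmatrix}}_{R}
\begin{bmatrix} \Delta n_1 \\ \Delta n_2 \\ \Delta n_4 \end{bmatrix}$$
Figure 3.3: Random variable transformation. (Left: the matrix equation above; right: the panel tree with root panel 5, its children 3 and 4, and panels 1 and 2 under panel 3.)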
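Therefore, by using Eq. 3.25, $\Delta\hat{n}$ can be expressed in terms of $\Delta n$ as
$$\Delta\hat{n} = \begin{bmatrix} R_k \\ R_l \end{bmatrix} \Delta n, \qquad (3.26)$$
where $R_k$ and $R_l$ are the $k$th and the $l$th rows of the transformation matrix $R$. The
variational potential coefficient between panels $k$ and $l$, $\tilde{P}_{kl}$, can then be written in terms of $\Delta n$ as
$$\tilde{P}_{kl} = P_{kl} + \hat{a}_{kl}\,\Delta n + (\Delta n)' \hat{A}_{kl}\,\Delta n, \qquad (3.27)$$
where
$$\hat{a}_{kl} = a_{kl} \begin{bmatrix} R_k \\ R_l \end{bmatrix}, \qquad (3.28)$$
and
$$\hat{A}_{kl} = \begin{bmatrix} R_k \\ R_l \end{bmatrix}' A_{kl} \begin{bmatrix} R_k \\ R_l \end{bmatrix}. \qquad (3.29)$$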
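Furthermore, since the leaf panel variations can be represented using $k$ common factors, the
variational potential coefficient between panels $k$ and $l$, $\tilde{P}_{kl}$, can be further represented in terms
of the $k$ dominant common factors
$$\tilde{P}_{kl} = P_{kl} + \tilde{a}_{kl}\,\delta + \delta' \tilde{A}_{kl}\,\delta, \qquad (3.30)$$
where
$$\tilde{a}_{kl} = \hat{a}_{kl} L, \qquad (3.31)$$
$$\tilde{A}_{kl} = L' \hat{A}_{kl} L. \qquad (3.32)$$
The $i$th element of the vector $\tilde{a}_{kl}$ is equal to $\Delta P^{i}_{kl}$, while $\Delta P^{ij}_{kl}$ is equal to $2(\tilde{A}_{kl})_{ij}$ if $i \ne j$ and
$(\tilde{A}_{kl})_{ij}$ if $i = j$. So the method presented in Section 3.2.1 can be used to solve for $\Delta q$.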
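The change of variables behind Eq. 3.31 and Eq. 3.32 can be verified numerically: evaluating the leaf-level quadratic form at $\Delta n = L\delta$ must give the same value as evaluating the factor-level form at $\delta$. The Python sketch below does this with made-up coefficients for $n = 3$ leaf panels and $k = 2$ factors; every name and number is an assumption for illustration.

```python
def project_linear(a, L):
    """Row vector a (length n) times loading matrix L (n x k), as in Eq. 3.31."""
    k = len(L[0])
    return [sum(a[r] * L[r][c] for r in range(len(a))) for c in range(k)]


def project_quadratic(A, L):
    """L' A L for an n x n matrix A and an n x k matrix L, as in Eq. 3.32."""
    n, k = len(L), len(L[0])
    AL = [[sum(A[r][m] * L[m][c] for m in range(n)) for c in range(k)] for r in range(n)]
    return [[sum(L[m][r] * AL[m][c] for m in range(n)) for c in range(k)] for r in range(k)]


def quadratic_form(const, a, A, x):
    """Evaluate const + a x + x' A x."""
    lin = sum(ai * xi for ai, xi in zip(a, x))
    quad = sum(x[i] * A[i][j] * x[j] for i in range(len(x)) for j in range(len(x)))
    return const + lin + quad


# Illustrative data: n = 3 leaf panels, k = 2 common factors.
L = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
a_hat = [1.0, -2.0, 0.5]
A_hat = [[0.5, 0.0, 0.25], [0.0, 1.0, 0.0], [0.25, 0.0, 2.0]]
delta = [0.5, -1.0]

dn = [sum(L[r][c] * delta[c] for c in range(2)) for r in range(3)]   # dn = L delta
direct = quadratic_form(3.0, a_hat, A_hat, dn)
reduced = quadratic_form(3.0, project_linear(a_hat, L), project_quadratic(A_hat, L), delta)
```

The two values `direct` and `reduced` agree, so the reduced coefficients can be computed once per link and reused for any realization of $\delta$.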
3.2.5 Distribution of Parasitic Capacitance
After obtaining the quadratic expression of a parasitic capacitance, Monte Carlo simulation can
be applied to determine the corresponding probability density function (PDF). However, in
this section, we will present a way to directly calculate the PDF of a parasitic capacitance given
its quadratic form.
To compute the PDF of the parasitic capacitance, we first need to calculate its characteristic
function. For a random variable $X$, its characteristic function is defined as
$$C_X(\xi) = E(e^{j\xi X}) = \int_{-\infty}^{+\infty} e^{j\xi x} f_X(x)\,dx, \qquad (3.33)$$
where $f_X(x)$ is the probability density function (PDF) of $X$.
Since the characteristic function is actually an inverse Fourier transform of the PDF, the
PDF of the random variable $X$ can easily be computed if its characteristic function is known:
$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-j\xi x} C_X(\xi)\,d\xi. \qquad (3.34)$$
The formal proof of this conclusion can be found in [29].
For a parasitic capacitance defined in the quadratic form
$$\tilde{C} = C + a\delta + \delta' A \delta, \qquad (3.35)$$
where $\delta \sim N(0, \Sigma)$, its exact characteristic function can be analytically computed by [30]
$$C_{\tilde{C}}(\xi) = |\Omega|^{-\frac{1}{2}} \exp\left\{ j\xi m - \tfrac{1}{2}\xi^2 a' \Sigma^{\frac{1}{2}} \Omega^{-1} \Sigma^{\frac{1}{2}} a \right\}, \qquad (3.36)$$
where $|\Omega|$ is the determinant of the matrix $\Omega = I - 2j\xi\,\Sigma^{\frac{1}{2}} A\, \Sigma^{\frac{1}{2}}$. Once we obtain $C_{\tilde{C}}(\xi)$, the PDF,
and then the cumulative distribution function (CDF), can be computed from Eq. 3.34.
Clearly, one eigenvalue decomposition (for computing $\Sigma^{\frac{1}{2}}$) and one
Fourier transformation are required in order to analytically compute the distribution of a parasitic
capacitance. Since our principal factor vector is $\delta \sim N(0, I)$, the $\Omega$ matrix can be simplified to
$\Omega = I - 2j\xi A$, so that $C_{\tilde{C}}(\xi) = |\Omega|^{-\frac{1}{2}} \exp\left\{ j\xi m - \tfrac{1}{2}\xi^2 a' \Omega^{-1} a \right\}$ and the eigenvalue
decomposition can be eliminated.
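For a single common factor ($k = 1$) the simplified characteristic function reduces to scalars and can be evaluated with Python's `cmath`. The sketch below assumes that $m$ is the constant term of the quadratic form, which is what the Gaussian integral yields for $\delta \sim N(0, 1)$; the function names and coefficients are hypothetical. As a sanity check, the mean recovered from the derivative of the characteristic function at zero should equal $m + A$, because $E(\delta^2) = 1$.

```python
import cmath


def char_fn_single_factor(xi, m, a, A):
    """Characteristic function of C = m + a*d + A*d**2 with d ~ N(0, 1).

    Scalar (k = 1) case of Eq. 3.36 with Sigma = I, where Omega = 1 - 2j*xi*A.
    """
    omega = 1 - 2j * xi * A
    return omega ** -0.5 * cmath.exp(1j * xi * m - 0.5 * xi ** 2 * a * a / omega)


def mean_from_char_fn(m, a, A, h=1e-4):
    """E[C] = -j * C'(0), estimated with a central difference of step h."""
    slope = (char_fn_single_factor(h, m, a, A)
             - char_fn_single_factor(-h, m, a, A)) / (2 * h)
    return (-1j * slope).real


# Illustrative coefficients (not from the thesis).
m, a, A = 2.0, 0.5, 0.25
mean = mean_from_char_fn(m, a, A)
```

With $A = 0$ the function reduces to the familiar Gaussian characteristic function $\exp(j\xi m - \xi^2 a^2/2)$. Obtaining the PDF would additionally require a numerical evaluation of the integral in Eq. 3.34, which is omitted here.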
3.3 Experimental Results
The proposed capacitance variability modeling approach has been implemented in C/C++.
All experiments are executed on a 1.4 GHz Pentium(R) 4 machine with 1 GB of RAM.
Monte Carlo simulation with 10,000 runs is used for comparison.
First, for the $2 \times 2$ bus crossing problem, the probability density functions (PDFs) obtained from
the canonical linear model and the quadratic model are shown in Fig. 3.5 and compared with
that from Monte Carlo simulation. There is a significant accuracy improvement
from using the second order quadratic model instead of the canonical model. The accuracy
improvement of the quadratic model is mostly in the region of the probability distribution corresponding
to larger capacitance values, which is actually more critical for circuit performance
and timing analysis. The canonical model tends to underestimate the possible capacitance
value in the high probability region. This underestimation will, in reality, result in optimistic
designs and excessive chip failures. This example clearly shows the necessity of the quadratic
model in today's technology, where process variation can no longer be ignored.
In the second experiment, the CDFs and PDFs of the second order quadratic models with
different numbers of dominant factors are compared. Without PFA, the number of
random variables is equal to the total number of leaf panels, which is 1126 for bus $2 \times 2$. In
practice, the number of dominant factors that need to be preserved is determined by the
Gaussian correlation length in Eq. 3.14. The choice of Gaussian correlation length depends on
the detailed processing techniques and the local layout characteristics. For different regions
and different panel orientations, we can assign different correlation lengths. In this test, the CDF
and PDF from PFA with only ten factors are very close to those from Monte Carlo simulation,
so that more than ninety percent of the random variables (1126 reduced to ten) have been eliminated by PFA. In this case,
the error in the CDF compared with Monte Carlo is less than $3\%$. Furthermore, as the number
of factors increases, the CDFs and PDFs from the quadratic models quickly converge to those
from Monte Carlo simulation.
In Table 3.1, the run times of the Monte Carlo method and the quadratic model with 10 dominant
factors are compared for different bus crossing benchmarks. It is clear that the quadratic model
exhibits over $100\times$ speedup compared with Monte Carlo simulation. Statistical distribution-related
parameters, such as the mean value, standard deviation, and skewness, are normally within
$3\%$ error. Combined with the results from the previous experiments, we can safely conclude that,
currently, the second order approximation is accurate enough for variational parasitic capacitance
modeling.
Procedure ConstructR
Input:  (a) Vector Panel contains the indexes of all panels;
        (b) Vector Basis contains the indexes of leaf panels.
Output: $R \in \mathbb{R}^{N\times n}$, such that $\Delta\bar{n} = R \times \Delta n$.
  n = Basis.size();
  for i = 1 ... n do
    X = Basis[i];
    InsertEntry(R, X, i, 1);
    value = 1/2;
    while Panel[X].parent != NULL do
      X = Panel[X].GetParent();
      InsertEntry(R, X, i, value);
      value = 1/2 × value;
    end while
  end for
Figure 3.4: An efficient algorithm for constructing the random variable transformation matrix
$R$. The function InsertEntry(R, i, j, value) fills value into the entry $(i, j)$ of $R$.
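The procedure of Fig. 3.4 can be exercised on the five-panel tree of Fig. 3.3. The Python sketch below is a hedged illustration: the `PARENT` dictionary, the 1-based panel numbering mapped to 0-based rows, and the use of `Fraction` for exact weights are choices made here, not details of the original implementation.

```python
from fractions import Fraction

# Hypothetical encoding of the tree in Fig. 3.3: panel -> parent (panel 5 is the root).
PARENT = {1: 3, 2: 3, 3: 5, 4: 5}


def construct_r(num_panels, leaves, parent):
    """Build R (num_panels x len(leaves)) so that all panel variations = R * leaf variations."""
    R = [[Fraction(0)] * len(leaves) for _ in range(num_panels)]
    for col, leaf in enumerate(leaves):
        x = leaf
        R[x - 1][col] = Fraction(1)        # a leaf panel depends only on itself
        value = Fraction(1, 2)
        while x in parent:                 # climb to the root, halving the weight
            x = parent[x]
            R[x - 1][col] = value
            value /= 2
    return R


R = construct_r(5, leaves=[1, 2, 4], parent=PARENT)
```

The last row of `R` comes out as (1/4, 1/4, 1/2), matching $\Delta n_5 = \frac{1}{4}(\Delta n_1 + \Delta n_2) + \frac{1}{2}\Delta n_4$ in the text. Each column has one non-zero per tree level, which is why $R$ stays sparse.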
Figure 3.5: First and second order capacitance models and their comparisons with Monte Carlo
method for the bus $2 \times 2$ benchmark ($\sigma = 20\%$). (Plot title: parasitic capacitance PDF comparison with different models; horizontal axis: parasitic capacitance, −400 to 400; vertical axis: probability density, 0 to 0.014; curves: linear model, quadratic model, Monte Carlo.)
Figure 3.6: Second order parasitic capacitance modeling with different number of factors and
the comparison with Monte Carlo method for bus $2 \times 2$ benchmark. (Two plots over parasitic capacitance from −400 to 400: the second order CDF comparison, probability 0 to 1, annotated "< 3% error", and the second order PDF comparison, probability density 0 to 0.014; curves: 2 factors, 5 factors, 10 factors, Monte Carlo.)
Benchmark   Method        Time     Mean μ    Std. deviation σ   Skewness η
2 × 2 Bus   M.C.          1826     -78.56    106.01             1.868
            QuadMod       9.78     -81.43    103.64             1.927
            Speedup/Err   186.7×   3.7%      2.2%               3.2%
4 × 4 Bus   M.C.          4673     -194.89   85.62              -1.78
            QuadMod       16.88    -192.45   83.78              -1.72
            Speedup/Err   276.8×   1.3%      2.1%               3.4%
6 × 6 Bus   M.C.          8568     -195.49   89.34              -1.42
            QuadMod       69.56    -190.71   85.52              -1.37
            Speedup/Err   123.2×   2.4%      4.3%               3.5%
Table 3.1: Simulation runtime comparison for bus crossing benchmark. (1) Monte Carlo (M.C.);
(2) Quadratic Model (QuadMod).
Chapter 4
Fast Analytic Lithography Simulation – LITHSIM
In Chapter 3, we presented the statistical capacitance extraction algorithm STATCAP, which considers
geometry fluctuations induced by process variations. To study the real process variations introduced
in the lithography process, in this chapter we propose an analytic optical projection system
simulation algorithm that uses a simplified lithography system model to directly generate the
photomask image on the wafer.
Integrated circuits are made using optical lithography, a process similar to photographic
printing, in which the patterns that will become layers of an integrated circuit are exposed on
a semiconductor wafer, one layer at a time [50]. During the lithography process, the wafer is
first spin-coated with photoresist, which is a light-sensitive organic polymer. The photoresist
is exposed to ultraviolet light passing through the photomask, which is composed of opaque and
transparent regions. For a positive photoresist, exposed areas become soluble and non-exposed
areas remain hard. The soluble photoresist is chemically removed (development) from the wafer
surface, and the patterned photoresist serves as an etching mask for the silicon dioxide.
[Figure: three panels showing a substrate with SiO2 and photoresist layers. (1) Photoresist coating. (2) Exposure: ultraviolet light passes through a mask with opaque regions, leaving exposed and unexposed photoresist. (3) Development.]
Figure 4.1: General optical lithography process: (1) Photoresist coating (2) Exposure (3) De-
velopment.
Optical lithography is widely used for mass production of ultra large scale integrated (ULSI)
devices mainly due to its superiority in economic terms [36–42]. However, as the sub-wavelength
gap (Fig. 4.2) between the wavelength of light used in the lithography projection system and
the size of the features is growing, lithography has become the bottleneck controlling the device
scaling, circuit performance, and magnitude of integration for silicon semiconductors.
Furthermore, the sub-wavelength gap gives rise to optical distortions, which manifest themselves
in the form of unprinted patterns and distorted geometries. These so-called optical proximity
effects may cause significant performance degradation or, even worse, missing, incomplete, or
shorted structures that result in hard failures [32]. Therefore, efficient lithography
simulation algorithms have become indispensable in the investigation of optical proximity
effects and in the application of non-equipment-based resolution enhancement techniques (RET),
e.g., phase shifting masks (PSM) [43–46] and optical proximity correction (OPC) [47], to achieve
improved performance in the sub-wavelength realm [35].
Figure 4.2: Sub-wavelength gap between IC feature size and light wavelength. (Picture courtesy
of Numerical Technologies, Inc. and Synopsys, Inc.)
To perform the aerial image calculation, photomasks must be sampled before being represented
on the computer. However, one of the main challenges in using general purpose aerial
image simulators, such as SPLAT [48], in IC applications is the formidable size of the data
representing a typical mask pattern [49]. To illustrate this point, consider a moderately
sized IC that occupies 1 cm × 1 cm of silicon with a minimum feature size of 1 µm. A fairly
sparse sample spacing of 0.25 µm along each side immediately results in 1.6 × 10^9 points to
represent the image of the chip. Hence, it is extremely important to reduce both the execution
time and the memory consumption of lithography simulation in VLSI applications.
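The sample count quoted above can be checked with a few lines of arithmetic. The sketch below reproduces the 1 cm × 1 cm, 0.25 µm example; the storage estimate assumes one 8-byte floating point value per sample, which is an illustrative assumption and not a figure from the text.

```python
# Back-of-the-envelope sample count for a point-sampled mask image.
chip_side_um = 1.0e4        # 1 cm expressed in micrometres
sample_spacing_um = 0.25    # sample spacing used in the example

samples_per_side = round(chip_side_um / sample_spacing_um)
total_samples = samples_per_side ** 2

# Assumption for illustration only: one 8-byte float per sample.
bytes_per_sample = 8
total_gib = total_samples * bytes_per_sample / 2 ** 30

print(samples_per_side, total_samples, round(total_gib, 1))
```

Under that storage assumption, a single real-valued image of the chip already occupies several gibibytes before any transform is applied.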
Furthermore, point sampling of the mask in the spatial domain is mathematically equivalent to
multiplying the mask function by a lattice of Dirac delta impulses over a finite domain.
Multiplication by the impulse lattice in the spatial domain corresponds to convolving
the frequency spectrum of the mask function with the Fourier transformation of the impulse
lattice, which is itself an impulse lattice in the frequency domain. Therefore, the spectrum of the
sampled mask function consists of replicas of the spectrum of the photomask. Since the spectrum
of the mask function is not bandlimited, the replicas overlap: the high frequency components are
folded into the low frequency components and aliasing error occurs. Once aliasing has occurred,
the high frequency parts can no longer be distinguished from the low frequency parts.
Consequently, the final calculated image may be a poor
approximation of the real mask image [33].
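The folding of high frequencies onto low ones can be seen in a minimal one-dimensional sketch. The sampling rate and the two frequencies below are hypothetical values chosen for illustration: a cosine whose frequency exceeds the sampling rate by exactly one sampling interval produces the same samples as the low frequency cosine.

```python
import math

def sample_cosine(freq, sample_rate, n):
    """Return n samples of cos(2*pi*freq*t) taken at the given sample rate."""
    return [math.cos(2 * math.pi * freq * k / sample_rate) for k in range(n)]

sample_rate = 10.0                             # samples per unit length (hypothetical)
low = sample_cosine(1.0, sample_rate, 20)      # well below the Nyquist limit of 5
high = sample_cosine(11.0, sample_rate, 20)    # 11 = 1 + sample_rate, far above it

# Largest difference between the two sampled sequences.
max_diff = max(abs(a - b) for a, b in zip(low, high))
print(max_diff)
```

The difference is at the level of floating point rounding, so after sampling the two components cannot be told apart.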
In this chapter, we propose a fast mask image calculation algorithm, LithSim. By exploiting
the regular structure within general IC masks, LithSim provides a closed-form formula to
directly generate mask images. One of the main advantages of LithSim is that it does
not require sampling of the mask and can directly calculate the intensity at an arbitrary
point on the image plane, and hence it eliminates the aliasing error of the discretization process.
Furthermore, because the mask image is the summation of the images of many
simpler structures, the entire mask function can be represented in terms of those regular
structures instead of sampling points, so that the memory consumption is greatly reduced. A careful
analysis demonstrates that the complexity of LithSim is proportional to the total number of
calculation points, while the traditional discrete Fourier transformation approach is of complexity
O(n log n).
4.1 Preliminaries
Optical lithography comprises four basic elements: an illumination system, a reticle, an expo-
sure system, and a wafer coated with photoresist.
The illumination system, which consists of a light source and a condenser lens, plays an
important role in the lithography modeling process. The illumination system delivers light to
the mask with the specified intensity, uniformity, spectral characteristics, and spatial coherence.
Traditional optical lithography uses a circular light source to maintain directional uniformity, such
that the same features are replicated identically regardless of their orientation [51]. However,
a circular light source is partially coherent, which is the main obstacle to efficient lithography
simulation. To simplify the discussion, we first assume that the light source is a point source,
which leads to a simplified optical system model. After the simplified model is obtained, we
extend it to more general light sources and introduce the well-known Hopkins model.
4.1.1 Simplified Projection System Model
Using Köhler's method, the point light source is placed in the focal plane of the condenser,
and the rays therefore illuminate the mask as a parallel beam, as shown in Fig. 4.3.
[Figure: parallel illumination passes through the mask, the pupil of radius a (numerical aperture), and the projection lens onto the photoresist. The stages are labeled (1) Fourier transformation, (2) low pass filter, (3) inverse Fourier transformation.]
Figure 4.3: Generic exposure system in optical projection lithography.
Once the light passes through the mask, Fraunhofer diffraction effects come into play. Before
resolution enhancement techniques are applied, the mask can be described by a two-dimensional
mask function
\[ f(x, y) = \begin{cases} 1 & \text{in clear regions} \\ 0 & \text{in opaque regions} \end{cases} \tag{4.1} \]
After the mask diffracts the light, the energy transmitted through the photomask forms a distribution
at the entrance to the pupil plane, which can be described by the Fraunhofer diffraction integral in
the far field region. This integral is equivalent to the Fourier transformation of the mask function:
\[ F(f_x, f_y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\, e^{j(f_x x + f_y y)}\, dx\, dy \tag{4.2} \]
where
\[ f_x = \kappa x'/R, \qquad f_y = \kappa y'/R \tag{4.3} \]
are called the spatial angular frequencies of the diffraction pattern, κ = 2π/λ is the spatial
frequency of the illumination, and R is the distance between the mask and the surface of the pupil
plane.
From Eq. 4.3, low spatial frequency components, which lie closer to the center of the pupil, pass
through the pupil plane, while high frequency components, which fall beyond the periphery of the
pupil, are cut off. Therefore, the pupil acts as a low pass filter that truncates high frequency
components from the spectrum of the mask function. For a pupil with radius a, the pupil function in the
frequency domain can be described as [31]:
\[ P(f_x, f_y) = \begin{cases} 1 & \sqrt{f_x^2 + f_y^2} \le \kappa a / R = \kappa \times NA \\ 0 & \text{otherwise} \end{cases} \tag{4.4} \]
where NA is defined as the numerical aperture of the pupil.
After the light passes through the pupil, the objective or projection lens is required to collect
as much of the diffracted light as possible and focus it onto the resist layer on the wafer. Due to
the Fourier transforming property of the lens, the light field transmitted through the projection
lens can be represented as:
\[ \varepsilon(x, y) = \mathcal{F}^{-1}\{\mathcal{F}\{f(x, y)\}\, P(f_x, f_y)\} \tag{4.5} \]
Eq. 4.5 is the mathematical model commonly used to describe the projection exposure system.
The irradiance, which is the average energy per unit area per unit time, is then proportional to
the square of the amplitude of the light field in Eq. 4.5:
\[ I(x, y) = \|\varepsilon(x, y)\|^2 \tag{4.6} \]
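The three stages of the simplified model (Fourier transformation, low pass filtering by the pupil, inverse Fourier transformation) followed by squaring can be sketched in one dimension. This is a minimal illustration of the sampled, transform-based approach and not the LithSim algorithm; the naive O(n²) transform, the standard discrete sign convention, the 64-point grid and the cutoff values are all assumptions made for the example.

```python
import cmath

def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform (unnormalised).

    sign = -1 gives the forward transform, sign = +1 the inverse kernel.
    """
    n = len(values)
    return [sum(values[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]

def coherent_image_1d(mask, cutoff_bins):
    """Intensity |F^-1{ F{mask} * P }|^2 for an ideal low pass pupil P."""
    n = len(mask)
    spectrum = dft(mask, -1)
    # Bins k and n-k carry the same |frequency|, so both pass when within the cutoff.
    filtered = [s if min(k, n - k) <= cutoff_bins else 0.0
                for k, s in enumerate(spectrum)]
    field = [v / n for v in dft(filtered, +1)]
    return [abs(v) ** 2 for v in field]

# Hypothetical 1-D mask: a single clear slit of 16 samples on a 64-sample grid.
mask = [1.0 if 24 <= m < 40 else 0.0 for m in range(64)]
blurred = coherent_image_1d(mask, cutoff_bins=4)    # narrow pupil
exact = coherent_image_1d(mask, cutoff_bins=32)     # pupil passes every bin
```

With the pupil wide enough to pass every frequency bin, the computed intensity reproduces the mask; with the narrow pupil the slit edges are smoothed and ringing appears outside the slit.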
4.1.2 General Lithography System Model
Eq. 4.5 is the simplified model for an optical projection system, since we assume that the light
rays come from a single point light source and become parallel after passing through the
condenser. In this scenario, the illumination system is completely coherent. In reality, however, the
photomask is illuminated by light rays traveling in different directions, since the light source is
circular instead of a point, and hence the illumination system is partially coherent. Partially
coherent illumination improves the theoretical minimum resolvable feature but makes the projection
system model much more complicated.
In the previous subsection, the point source is assumed to be on the axis, and the corresponding
spectrum of the photomask is F(fx, fy) as shown in Eq. 4.2. For a general point light
source located off-axis, if we assume that the optical system is shift-invariant, the light
field will be a shifted version of the one described by Eq. 4.5:
\[ \varepsilon(x, y, \hat{f}_x, \hat{f}_y) = \mathcal{F}^{-1}\{F(f_x - \hat{f}_x, f_y - \hat{f}_y)\, P(f_x, f_y)\} \tag{4.7} \]
where $\hat{f}_x$ and $\hat{f}_y$ are determined by the location of the off-axis point light source.
Shifting the spectrum of the photomask is equivalent to shifting the pupil function in the
frequency domain, as shown in Fig. 4.4.
[Figure: two (fx, fy) frequency planes, one showing the photomask spectrum shifted by ($\hat{f}_x$, $\hat{f}_y$) and one showing the pupil function shifted instead.]
Figure 4.4: Shifting the photomask spectrum is equivalent to shifting the pupil function.
For a light source containing many off-axis point sources, the light fields generated by each
pair of point light sources, which produce waves traveling in different directions, interfere
with each other, and hence the Hopkins model is obtained:
\[ I(x, y) = \int\!\cdots\!\int J(f_x, f_y)\, P(f_x + f'_x, f_y + f'_y)\, P^*(f_x + f''_x, f_y + f''_y)\, F(f'_x, f'_y)\, F^*(f''_x, f''_y)\, e^{-i2\pi[(f'_x - f''_x)x + (f'_y - f''_y)y]}\, df_x\, df_y\, df'_x\, df'_y\, df''_x\, df''_y. \tag{4.8} \]
J(fx, fy) is the effective light source, which is the image of the illumination source on the pupil
plane in the absence of the photomask. Therefore, J(fx, fy) is basically the spectrum of the
light source. For a circular illumination light source, the effective source J(fx, fy) fills a
circle of radius σ in the pupil plane and can be represented as:
\[ J(f_x, f_y) = \begin{cases} \dfrac{1}{\pi\sigma^2} & \text{if } \sqrt{f_x^2 + f_y^2} \le \sigma \\ 0 & \text{otherwise} \end{cases} \tag{4.9} \]
where σ is referred to as the partial coherence factor.
As the Hopkins model in Eq. 4.8 shows, each pair of shifted photomask
spectra is weighted by a factor known as the transmission cross-coefficient (TCC):
\[ TCC(f'_x, f'_y, f''_x, f''_y) = \int\!\!\int J(f_x, f_y)\, P(f_x + f'_x, f_y + f'_y)\, P^*(f_x + f''_x, f_y + f''_y)\, df_x\, df_y \tag{4.10} \]
For a circular light source, J(fx, fy) is of circular shape as defined in Eq. 4.9. $P(f_x + f'_x, f_y + f'_y)$
and $P(f_x + f''_x, f_y + f''_y)$ are shifted pupil functions, which are also circles, centered at $(-f'_x, -f'_y)$
and $(-f''_x, -f''_y)$ respectively. The TCC is therefore determined by the area in which these three
circles overlap, as shown in Fig. 4.5. Based on the definition of TCC, the Hopkins model can then be rewritten as:
\[ I(x, y) = \int\!\!\int\!\!\int\!\!\int TCC(f'_x, f'_y, f''_x, f''_y)\, F(f'_x, f'_y)\, F^*(f''_x, f''_y)\, e^{-i2\pi[(f'_x - f''_x)x + (f'_y - f''_y)y]}\, df'_x\, df'_y\, df''_x\, df''_y \]
Therefore, the TCC couples the inverse Fourier transformations of two shifted photomask spectra
together and hence greatly increases the complexity of lithography simulation.
4.2 LithSim Algorithm
LithSim is based on the simplified model in Eq. 4.5, which assumes that the light rays illuminating
the photomask are parallel. The main computational advantages of LithSim are
realized by exploiting the structure inherent in IC mask patterns. Although features on
photomasks have a wide variety of shapes and dimensions, most of them can be approximated
by one of three types: lines, spaces, and contacts. As shown in Fig. 4.6, IC masks can be
decomposed into rectangular slits with different widths, heights, and locations.
[Figure: in the (fx, fy) plane, the effective light source circle and two shifted pupil function circles (I and II), centered at (−f′x, −f′y) and (−f″x, −f″y); the TCC is their common overlap region.]
Figure 4.5: Transmission cross-coefficient (TCC).
Mathematically, let f(x, y) be the mask function. It can be represented as a summation
of N much simpler slit functions, each corresponding to a simple two-dimensional
slit:
\[ f(x, y) = \sum_{i=1}^{N} f_i(x, y) \tag{4.11} \]
where
\[ f_i(x, y) = \begin{cases} 1 & x_i^0 \le x \le x_i^1 \text{ and } y_i^0 \le y \le y_i^1 \\ 0 & \text{otherwise} \end{cases} \tag{4.12} \]
Figure 4.6: Mask decomposition.
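Let p(x, y) be the inverse Fourier transformation of the pupil function P(fx, fy) in the spatial
domain. Substituting $f(x, y) = \sum_{i=1}^{N} f_i(x, y)$ into the image formulation equation, we
obtain:
\[ \varepsilon(x, y) = \mathcal{F}^{-1}\{\mathcal{F}\{f(x, y)\}\, P(f_x, f_y)\} = \sum_{i=1}^{N} \mathcal{F}^{-1}\{\mathcal{F}\{f_i(x, y)\}\, \mathcal{F}\{p(x, y)\}\} \tag{4.13} \]
By applying the convolution theorem, Eq. 4.13 can be further simplified to:
\[ \varepsilon(x, y) = \sum_{i=1}^{N} \mathcal{F}^{-1}\{\mathcal{F}\{f_i(x, y) \ast p(x, y)\}\} = \sum_{i=1}^{N} f_i(x, y) \ast p(x, y) \tag{4.14} \]
where $\ast$ denotes convolution.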
Therefore, the image function ε(x, y) of a mask f(x, y) composed of N slit functions
$f_i(x, y)$ is the algebraic summation of the N functions $\varepsilon_i(x, y)$, each of which is the
image formed by an individual slit $f_i(x, y)$:
\[ \varepsilon_i(x, y) = f_i(x, y) \ast p(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_i(u, v)\, p(x - u, y - v)\, du\, dv = \int_{x_i^0}^{x_i^1}\int_{y_i^0}^{y_i^1} p(x - u, y - v)\, dv\, du \tag{4.15} \]
Consequently, if we can efficiently compute the image of a single slit by using Eq. 4.15, the
entire image of an arbitrary mask can be obtained by superposition. Since
the shape of the pupil is much simpler than that of the mask, the convolution in Eq. 4.15 can be
evaluated explicitly.
4.2.1 Rectangular Pupil
First we consider a rectangular pupil P(fx, fy), which can be represented in the frequency
domain as:
\[ P(f_x, f_y) = \begin{cases} 1 & |f_x| \le K_x \text{ and } |f_y| \le K_y \\ 0 & \text{otherwise} \end{cases} \tag{4.16} \]
Its inverse Fourier transformation in the spatial domain is then:
\[ p(x, y) = \frac{1}{(2\pi)^2}\int_{-K_x}^{K_x}\int_{-K_y}^{K_y} e^{-j(f_x x + f_y y)}\, df_y\, df_x = \frac{1}{\pi^2} K_x K_y\, \mathrm{sinc}(K_x x)\, \mathrm{sinc}(K_y y) \tag{4.17} \]
where sinc(x) is the well-known sinc function defined as sinc(x) = sin(x)/x. The sinc function
is a sinusoid modulated by 1/x, and hence sinc(x) = 1 when x = 0 and
sinc(x) → 0 as x → ∞.
[Surface plot: "An inverse Fourier transformation p(x, y) of a rectangular pupil function P(fx, fy)". Axes: x and y from −20 to 20; vertical axis: p(x, y).]
Figure 4.7: Inverse Fourier transformation of a rectangular pupil.
By substituting Eq. 4.17 into Eq. 4.15 and using the slit function $f_i(x, y)$ in Eq. 4.12, we
get
\[ \varepsilon_i(x, y) = \frac{1}{\pi^2}\int_{K_x(x - x_i^1)}^{K_x(x - x_i^0)} \mathrm{sinc}(u)\, du \int_{K_y(y - y_i^1)}^{K_y(y - y_i^0)} \mathrm{sinc}(v)\, dv = \frac{1}{\pi^2}\left[\mathrm{Si}(K_x(x - x_i^0)) - \mathrm{Si}(K_x(x - x_i^1))\right]\left[\mathrm{Si}(K_y(y - y_i^0)) - \mathrm{Si}(K_y(y - y_i^1))\right] \tag{4.18} \]
Fig. 4.8 shows the image of a square slit calculated by using Eq. 4.18.
Si(x) in Eq. 4.18 is the sine integral function defined as
\[ \mathrm{Si}(x) = \int_0^x \mathrm{sinc}(u)\, du \]
[Surface plot: "Image function εi(x, y) for a 1 µm × 1 µm square". Axes: x and y from −20 to 20; vertical axis from −0.2 to 1.2.]
Figure 4.8: εi(x, y) of a 1 µm × 1 µm slit.
Therefore, efficient calculation of the image formed by a slit $f_i(x, y)$ and a rectangular pupil depends
on whether we can efficiently evaluate the sine integral. Fortunately, the sine integral has been
extensively studied in mathematics due to its great importance in Fourier analysis.
Three methods are generally used in the literature to calculate the sine integral function: (1)
Taylor expansion; (2) Chebyshev expansion; (3) spline curve fitting. Among these
methods, the first, i.e., Taylor expansion, is adopted in our scenario due to its simple
representation and sufficient accuracy. The sine integral can be expanded into an infinite Taylor
series as follows:
\[ \mathrm{Si}(x) = \int_0^x \mathrm{sinc}(u)\, du = \sum_{k=1}^{\infty} (-1)^{k-1} \frac{x^{2k-1}}{(2k - 1)(2k - 1)!} \tag{4.19} \]
By using Eq. 4.19, Eq. 4.18 becomes an analytic formula for calculating images in the case
that the pupil is rectangular.
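The series above and the product-of-differences form of the slit image translate directly into code. The following is a minimal sketch, assuming double precision and moderate arguments (the alternating series loses accuracy through cancellation when |x| is large); the function names, the tolerance, and the slit tuple layout (x0, x1, y0, y1) are illustrative choices, not part of the original implementation.

```python
import math

def sine_integral(x, tol=1e-15, max_terms=200):
    """Si(x) from its Taylor series; intended for moderate |x|."""
    term = x        # running value of (-1)^(k-1) * x^(2k-1) / (2k-1)!
    total = x       # k = 1 contribution: x / (1 * 1!)
    k = 1
    while k < max_terms:
        # Advance to the next odd power and factorial, flipping the sign.
        term *= -x * x / ((2 * k) * (2 * k + 1))
        k += 1
        contribution = term / (2 * k - 1)
        total += contribution
        if abs(contribution) < tol:
            break
    return total

def slit_field_rect(x, y, slit, kx, ky):
    """Field of one slit (x0, x1, y0, y1) under a rectangular pupil."""
    x0, x1, y0, y1 = slit
    sx = sine_integral(kx * (x - x0)) - sine_integral(kx * (x - x1))
    sy = sine_integral(ky * (y - y0)) - sine_integral(ky * (y - y1))
    return sx * sy / math.pi ** 2

# Hypothetical unit-square slit and cutoff frequencies.
unit_slit = (-0.5, 0.5, -0.5, 0.5)
centre_value = slit_field_rect(0.0, 0.0, unit_slit, kx=3.0, ky=3.0)
```

Because each call costs only four sine integral evaluations, the field at any image point can be evaluated independently of all others, without a sampling grid.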
[Plot: sine integral Si(X) for X from −30 to 30, vertical axis from −2 to 2, with the asymptotes π/2 and −π/2 marked.]
Figure 4.9: Waveform of sine integral function.
4.2.2 Circular Pupil
A circular pupil, which is used in most projection optical lithography systems, can be
represented in the frequency domain as in Eq. 4.4. First, we need to calculate its corresponding p(x, y)
in the spatial domain. The evident circular symmetry suggests the use of polar coordinates, and
so let
\[ f_x = k\cos\alpha, \quad f_y = k\sin\alpha, \quad x = r\cos\theta, \quad y = r\sin\theta \tag{4.20} \]
By switching to polar coordinates, we get
\[ p(r, \theta) = \mathcal{F}^{-1}\{P(k, \alpha)\} = \frac{1}{(2\pi)^2}\int_0^a\int_0^{2\pi} e^{-jkr\cos(\alpha - \theta)}\, k\, d\alpha\, dk \tag{4.21} \]
Inasmuch as P(k, α) is circularly symmetric, its inverse Fourier transform must be circularly
symmetric as well, which implies that p(r, θ) is independent of θ. So the integral can be
simplified by letting θ equal some constant value, which we choose to be zero, whereupon
\[ p(r) = \frac{1}{(2\pi)^2}\int_0^a k \int_0^{2\pi} e^{-jkr\cos\alpha}\, d\alpha\, dk \tag{4.22} \]
The quantity
\[ J_0(u) = \frac{1}{2\pi}\int_0^{2\pi} e^{ju\cos\alpha}\, d\alpha \tag{4.23} \]
which arises quite frequently in mathematical physics,
is known as the Bessel function (of the first kind) of order zero. More generally,
\[ J_m(u) = \frac{j^{-m}}{2\pi}\int_0^{2\pi} e^{j(m\alpha + u\cos\alpha)}\, d\alpha \tag{4.24} \]
represents the Bessel function of order m. Another general property of Bessel functions, referred
to as a recurrence relation, is
\[ \frac{d}{du}\left[u^m J_m(u)\right] = u^m J_{m-1}(u) \tag{4.25} \]
When m = 1, this clearly leads to
\[ \int_0^u w J_0(w)\, dw = u J_1(u) \tag{4.26} \]
By using the recurrence relation of the Bessel function, Eq. 4.22 can be simplified to
\[ p(r) = \frac{a^2}{2\pi}\, \frac{J_1(ra)}{ra} \tag{4.27} \]
Since $r = \sqrt{x^2 + y^2}$, the inverse Fourier transformation represented in rectangular
coordinates is
\[ p(x, y) = \frac{a^2}{2\pi}\, \frac{J_1(\sqrt{x^2 + y^2}\, a)}{\sqrt{x^2 + y^2}\, a} \tag{4.28} \]
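The kernel of the circular pupil can be evaluated numerically as a check. The sketch below uses the standard power series of J1, whose k-th term is (−1)^k (x/2)^(2k+1) / (k! (k+1)!), and treats the point r = 0 through the limit J1(u)/u → 1/2; the function names and tolerances are illustrative assumptions, and the series is only suitable for moderate arguments.

```python
import math

def bessel_j1(x, tol=1e-16, max_terms=200):
    """J1(x) from its power series; intended for moderate |x|."""
    half = x / 2.0
    term = half          # k = 0 term: (x/2) / (0! * 1!)
    total = term
    for k in range(1, max_terms):
        # Ratio of consecutive terms is -(x/2)^2 / (k * (k + 1)).
        term *= -half * half / (k * (k + 1))
        total += term
        if abs(term) < tol:
            break
    return total

def circular_pupil_kernel(x, y, a):
    """p(x, y) = a^2 / (2*pi) * J1(r*a) / (r*a), with r = sqrt(x^2 + y^2)."""
    u = a * math.hypot(x, y)
    if u < 1e-8:
        return a * a / (4.0 * math.pi)    # limit of J1(u)/u as u -> 0 is 1/2
    return a * a / (2.0 * math.pi) * bessel_j1(u) / u

# Hypothetical pupil radius; the kernel depends only on the distance from the origin.
peak = circular_pupil_kernel(0.0, 0.0, a=2.0)
off_axis = circular_pupil_kernel(0.3, 0.4, a=2.0)
```

The kernel attains its maximum at the origin and depends only on r, which reflects the circular symmetry argued for above.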
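By using Eq. 4.28 and substituting the Taylor expansion of the Bessel function
\[ J_1(x) = \sum_{k=0}^{\infty} (-1)^k \frac{(x/2)^{2k+1}}{k!\,(k + 1)!} \tag{4.29} \]
into Eq. 4.15, we get:
\[ \varepsilon_i(x, y) = \frac{a^2}{4\pi}\sum_{k=0}^{\infty} \frac{(-1)^k (a/2)^{2k}}{k!\,(k + 1)!} \sum_{m=0}^{k} \binom{k}{m} \frac{\left. u^{2k-2m+1}\right|_{x - x_i^1}^{x - x_i^0}\; \left. v^{2m+1}\right|_{y - y_i^1}^{y - y_i^0}}{(2k - 2m + 1)(2m + 1)} \tag{4.30} \]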
4.2.3 LithSim Simulation Flow
The main advantage of LithSim is that it provides a closed-form formula to calculate the intensity
at an arbitrary point on the image plane. We thus avoid sampling the mask and the pupil,
and hence eliminate the aliasing error introduced by the discretization.
For each slit, we can adopt a windowing method to greatly reduce the calculation cost. Fig.
4.8 shows that the irradiance becomes very small as the calculation point moves far away from
the slit. Therefore, we only need to calculate the regions surrounding that slit (Fig. 4.10).
If the total number of calculation points on the image plane is N, then the complexity of
LithSim is O(cN), where c is a constant depending on the window size we use.
Furthermore, the sine integral and the Bessel integral can be tabulated to avoid repeated calculation.
In summary, the LithSim simulation flow is shown in Fig. 4.11.
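The windowing idea can be sketched as follows: each slit contributes only to the image points inside its own window, and the contributions are accumulated before squaring. This is an illustrative sketch only. The `slit_field` callable, the `margin` parameter, and the toy kernel used in the demonstration are hypothetical stand-ins; in practice the per-slit field would come from the closed-form expressions of Sections 4.2.1 and 4.2.2.

```python
def windowed_intensity(grid_x, grid_y, slits, slit_field, margin):
    """Accumulate per-slit fields, evaluating each slit only inside its window."""
    field = [[0.0 for _ in grid_x] for _ in grid_y]
    evaluations = 0
    for slit in slits:
        x0, x1, y0, y1 = slit
        # Window of slit i: the slit rectangle grown by `margin` on every side.
        cols = [c for c, x in enumerate(grid_x) if x0 - margin <= x <= x1 + margin]
        rows = [r for r, y in enumerate(grid_y) if y0 - margin <= y <= y1 + margin]
        for r in rows:
            for c in cols:
                field[r][c] += slit_field(grid_x[c], grid_y[r], slit)
                evaluations += 1
    intensity = [[abs(v) ** 2 for v in row] for row in field]
    return intensity, evaluations

def toy_slit_field(x, y, slit):
    """Hypothetical stand-in kernel: 1 inside the slit, 0 outside."""
    x0, x1, y0, y1 = slit
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0

grid = [0.5 * i for i in range(41)]                       # 0 to 20 in steps of 0.5
slits = [(2.0, 4.0, 2.0, 18.0), (10.0, 12.0, 2.0, 18.0)]  # two hypothetical slits
intensity, evaluations = windowed_intensity(grid, grid, slits, toy_slit_field, margin=2.0)
```

The number of kernel evaluations grows with the window area per slit rather than with the product of slit count and grid size, which is the source of the constant c in the O(cN) estimate.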
Figure 4.10: Windowing method to reduce computational cost.
4.3 Experimental Results
LithSim is implemented in C++ and Matlab. All experiments are executed on a
1.4 GHz Pentium(R) 4 machine with 1 GB RAM. We also implement the discrete FFT
(DFFT) and discrete convolution (DCONV) methods in Matlab and compare the three algorithms
against continuous convolution (CCONV), which serves as the reference.
First, the irradiance matrix of a simple mask containing three parallel slits is calculated by
using the above four methods. The width and the height of the three slits are 3 µm and 17 µm
respectively. The edge-to-edge spacing between slits is 5 µm. The cut-off angular spatial frequency of
the pupil is set to 0.86 cycles per µm. Compared to the continuous convolution, for this small
test case, the discretization based methods show considerable error: the discrete FFT shows above
10% error and the discrete convolution exhibits about 8% error. From Fig. 4.12, we can see that the
intensities generated by DFFT and DCONV exhibit excessively high peaks, which are related to the
high frequency components mixed into the low frequency parts in the sampling process. On the
contrary, LithSim avoids the discretization and hence naturally eliminates the aliasing error. For
this small test case, LithSim shows less than 1% error, as shown in Fig. 4.13. From Fig. 4.14,
we can see that the image calculated by LithSim is almost indistinguishable from the one
generated using continuous convolution.
[Flowchart: mask pattern specification (GDSII, CIF) → mask pattern decomposition → slit list → generate the computation window of slit i from the width and height of slit i → for each point (x, y) in window i, calculate εi(x, y) using a look-up table (LUT) of the sine integral and Bessel integral → "Finish all slits?" (N: continue with the next slit; Y: end).]
Figure 4.11: LithSim Optical Lithography Simulation Flow.
[Four surface plots of irradiance over a 30 × 30 region, vertical axis from 0 to 1.5. Panels: Discrete Fourier Transformation, Discrete Convolution, LITHSIM, Continuous Convolution.]
Figure 4.12: Irradiance calculated by using discrete Fourier transformation, discrete convolu-
tion, LithSim, and continuous convolution.
[Three surface plots of error over a 30 × 30 region. Panels: Discrete Fourier Transformation (vertical axis 0 to 0.2), Discrete Convolution (0 to 0.12), LITHSIM (0 to 8 × 10^−3).]
Figure 4.13: Errors in irradiance matrices calculated by using discrete Fourier transformation,
discrete convolution and LithSim compared to continuous convolution.
[Four contour plots over a 25 × 25 region. Panels: Discrete Fourier Transformation and Discrete Convolution (color scale 0.2 to 1.2), LITHSIM and Continuous Convolution (color scale 0.1 to 1.1).]
Figure 4.14: Images (contours) calculated by using discrete Fourier transformation, discrete
convolution, LithSim, and continuous convolution.
111 × 111 Points, 42 Slits
Algorithm              LITHSIM   DFFT     DCONV
Execution Time (s)     1.359     7.31     6.92
Percentage Error (%)   0.63      8.74     6.57

759 × 759 Points, 75 Slits
Algorithm              LITHSIM   DFFT     DCONV
Execution Time (s)     1.671     187.6    137.5
Percentage Error (%)   0.86      9.26     8.47

4351 × 4351 Points, 167 Slits
Algorithm              LITHSIM   DFFT     DCONV
Execution Time (s)     3.546     > 1h     > 1h
Percentage Error (%)   0.82      9.67     8.78

10239 × 10239 Points, 393 Slits
Algorithm              LITHSIM   DFFT     DCONV
Execution Time (s)     28.25     > 1h     > 1h
Percentage Error (%)   0.74      8.92     8.35
Table 4.1: Extraction time and error comparison.
Chapter 5
Efficient Inductive Effect Extraction with
Lossy Substrate – EPEEC
5.1 Introduction
The industry trend of integrating higher levels of circuit functionality on one chip and the wide-
spread growth of wireless communication have triggered the proliferation of mixed analog-
digital systems. However, the development of efficient interconnect models for such a system
is made more difficult because of the lossy nature of the silicon substrate. In particular, the
creation of substrate eddy currents can lead to considerable interconnect inductive and ohmic
losses. As the behavior of on-chip interconnects becomes a dominant factor in overall circuit
performance at high frequencies, an interconnect system analysis without considering the lossy
substrate effects will result in an over-designed network and seriously waste chip resources [52].
With the increasing clock frequency and integration density, intentional and unintentional
inductive effects gradually rise in VLSI design. Inductance computation is a difficult task since
inductance depends on the current return path, which is unknown prior to the extraction and
simulation of a circuit model [53–55].
Fortunately, the PEEC method has been widely adopted to deal with this issue [56]. However,
since PEEC assumes that each conductor segment has a current return path at infinity,
inductive couplings exist among all conductor segments, so extremely dense partial
inductance matrices are usually generated. For this reason, the reluctance-based method [57, 58]
was proposed by Hao Ji et al. to alleviate this problem. Since reluctance has a higher
degree of locality, similar to capacitance, only a small number of neighbors need to be considered.
Consequently, the reluctance matrix for circuit simulation is very sparse compared to the partial
inductance matrix.
Moreover, the traditional PEEC approach does not take substrate effects into consideration,
and hence cannot capture inductive and ohmic losses due to the formation of eddy currents
in the conductive substrate. Although several previous works have attempted to resolve
this issue by constructing three-dimensional linear substrate models, such as [59–65], most
of these approaches are based on the numerical finite difference method. With rising
clock frequencies and reduced substrate resistivity, a large volume of silicon bulk needs to be
spatially discretized into very small cells to capture the substrate effects accurately. Therefore, the
resulting equivalent circuit models are prohibitively large, since inductive couplings
then exist among all conductor segments and substrate cells.
In this chapter, we propose EPEEC, an accurate, compact, and efficient interconnect model-
ing methodology to extend the PEEC model to consider multi-layer substrates based on complex
image theory [66], which has recently been used in RFIC regime to consider microstrips and
spiral conductors over a single layer substrate [67–69]. To deal with multi-layer substrates, we
present the detailed methodology to derive the effective complex distance (ECD) between phys-
ical conductors and their corresponding complex images by preserving the first moment of the
analytic vector potential formulation. The EPEEC model is obtained by modifying PEEC with
mutual inductances between physical and image conductors separated by the effective complex
distance. Since EPEEC reflects the substrate effects in resistance and inductance values di-
rectly based on the configuration of substrate instead of applying discretization, it leads to very
compact models for interconnects.
For modeling even larger scale interconnect systems, EPEEC is enhanced to extract re-
luctance instead of inductance by applying an extended window-based reluctance extraction
algorithm. Furthermore, we propose a reluctance realization algorithm by directly converting
reluctances to circuit elements compatible with general circuit simulators, such as SPICE.
After validating the EPEEC model by comparison with the rigorous full-wave simulator
Sonnet®, we use EPEEC to comprehensively study the impacts of frequency and substrate
configuration, such as thickness and conductivity, on interconnect models.
The remainder of this chapter begins by describing the application of complex image theory to
on-chip interconnects above a lossy multi-layer substrate, then presents the EPEEC model based
on the derived effective complex distance, and concludes with a summary of our work.
5.2 Electro-magnetic Formulation of Substrate Eddy Cur-
rent and Complex Image Theory
In this section, we explain the generation and the nature of eddy currents in a multi-layer sub-
strate. The effective complex distance can be obtained by preserving the first moment of the
analytic vector potential formulation. Then we discuss the application of complex image theory
to on-chip interconnects above a lossy multi-layer substrate.
5.2.1 Generation of Substrate Eddy Currents
Eddy currents in the substrate are caused by time-varying magnetic fields. If a time-varying
magnetic flux density $\mathbf{B}_f$ is induced by currents in interconnects, an electric field E is produced
in the substrate as
\[ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}_f}{\partial t}. \tag{5.1} \]
The electric field E can be expressed in terms of the vector magnetic potential A and the
scalar potential φ by
\[ \mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t} - \nabla\phi. \tag{5.2} \]
This electric field E in turn establishes currents flowing according to Ohm's law
\[ \mathbf{J} = \sigma \mathbf{E}. \tag{5.3} \]
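Substituting Eq. 5.2 into Eq. 5.3 leads to
\[ \mathbf{J} = -\sigma\left(\frac{\partial \mathbf{A}}{\partial t} + \nabla\phi\right), \tag{5.4} \]
where the first term corresponds to the magnetically induced currents and the second term
to the electrically induced currents.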
These induced currents produce another magnetic field according to Ampere's law
\[ \nabla \times \mathbf{B} = \mu\left(\mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}\right). \tag{5.5} \]
By using Eq. 5.3 and applying the constitutive equation D = εE, the time-harmonic form of
Ampere's law can be expressed as ∇ × B = µ(σE + jωεE). Since σ ≫ ωε at the current frequencies
of interest (< 20 GHz), the second term, which represents the displacement currents, is at
least three orders of magnitude smaller than the first term and can be safely ignored. Therefore,
Ampere's law in Eq. 5.5 can be simplified to
\[ \nabla \times \mathbf{B} = \mu \mathbf{J}. \tag{5.6} \]
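The relative size of the conduction and displacement terms is the ratio σ/(ωε), which can be evaluated for any substrate of interest. The sketch below is only illustrative: the conductivity (10^4 S/m, representative of heavily doped silicon), the relative permittivity (11.9), and the 1 GHz evaluation frequency are assumed values that do not come from the text, and the ratio scales linearly with conductivity and inversely with frequency.

```python
import math

EPS0 = 8.854e-12    # vacuum permittivity in F/m

def conduction_to_displacement_ratio(sigma, eps_r, freq_hz):
    """Return sigma / (omega * eps), the ratio of the two terms in Ampere's law."""
    omega = 2.0 * math.pi * freq_hz
    return sigma / (omega * eps_r * EPS0)

# Assumed example values (not from the text): sigma = 1e4 S/m, eps_r = 11.9, 1 GHz.
ratio = conduction_to_displacement_ratio(sigma=1.0e4, eps_r=11.9, freq_hz=1.0e9)
orders_of_magnitude = math.log10(ratio)
print(ratio, orders_of_magnitude)
```

Evaluating the same function over the substrate conductivities and frequencies of a given process indicates where the magneto-quasi-static simplification is appropriate.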
Since the magnetic flux density B is solenoidal, we have ∇ · B = 0. Taking the curl of Eq. 5.6,
substituting Eq. 5.4, and applying the vector identities ∇ × (∇ × F) = ∇(∇ · F) − ∇²F and ∇ × ∇φ = 0,
we obtain
\[ \nabla^2 \mathbf{B} - \mu\sigma\frac{\partial \mathbf{B}}{\partial t} = 0. \tag{5.7} \]
Eq. 5.7 is referred to as the diffusion equation in terms of the magnetic flux density B.
From Eq. 5.7, one can see that although the current arising from the electrical potential
φ in Eq. 5.4 could be as large as the current arising from the magnetic vector potential A,
its contribution to the magnetic flux density can be ignored by noticing that ∇ × ∇φ = 0.
Furthermore, since the magnetic flux density B determines the magnetic flux Φ, and hence
directly affects the line parameter L = Φ/I, we do not need to consider the current arising from
the electrical potential φ [70, 71], and in this scenario, Eq. 5.4 can be approximated as
\[ \mathbf{J} = -\sigma\frac{\partial \mathbf{A}}{\partial t}. \tag{5.8} \]
Substituting B = ∇ × A into Eq. 5.6 and adopting the Coulomb gauge ∇ · A = 0 leads to
\[ \nabla^2 \mathbf{A} = -\mu \mathbf{J}. \tag{5.9} \]
By using Eq. 5.8 and Eq. 5.9, we get
\[ \nabla^2 \mathbf{A} - \mu\sigma\frac{\partial \mathbf{A}}{\partial t} = 0. \tag{5.10} \]
Eq. 5.10 is the diffusion equation of the vector potential in a medium subject to a time-varying
magnetic field.
5.2.2 Analytic Vector Potential within A Multilayer Substrate
Outside the diffusion/active areas and contact areas, the substrate can be treated as consisting
of uniformly doped semiconductor layers of varying doping densities [61].
Assume that a long current filament is located at distance h above a multilayer substrate. The
current density within the filament is denoted by $J_f$. The substrate consists of n layers. Layer
k in the substrate has thickness $t_k$, conductivity $\sigma_k$, and permeability $\mu_k$, and is assumed infinite in
the transverse direction. The regions above and below the substrate are free space. The configuration is
shown in Fig. 5.1.
For frequencies up to a few gigahertz, we can make the magneto-quasi-static assumption. Under
this assumption, induced eddy currents within the substrate are parallel to the filament.
[Figure: a current filament $J_f$ at height h above the substrate surface (y = 0), with axes x, y, z; the substrate is drawn as Layer 1, Layer 2, ..., Layer n, each labeled with its thickness $t_1$, $t_2$, ..., $t_n$ and material parameters.]
Figure 5.1: A current filament parallel to a multilayer substrate which contains different layers
of different thickness, conductivity, and permeability.
For a z-direction filament current, only the z-component of $\mathbf{A}$ is nonzero, so that the problem
becomes two dimensional. By using Eqs. 5.9 and 5.10, we can obtain the magnetic vector potential
diffusion equations in the different regions
$$\begin{aligned}
\nabla^2 A_0(x, y) &= -\mu_0\,\delta(0, y - h)\,J_f && \text{above the substrate},\\
\nabla^2 A_k(x, y) &= j\omega\mu_k\sigma_k A_k(x, y) && \text{within the substrate},\\
\nabla^2 A_{n+1}(x, y) &= 0 && \text{below the substrate},
\end{aligned} \qquad (5.11)$$
where $k = 1, \cdots, n$ and $A_k$ denotes the vector potential within substrate layer $k$.
Applying the method of separation of variables and noticing the symmetry of the configuration
with respect to the $y$ axis [70,72], it can be shown that the general solution of Eqs. 5.11 is given
by
$$A_k(x, y) = \int_0^\infty \left[ M_k(\tau) e^{\gamma_k y} + N_k(\tau) e^{-\gamma_k y} \right] \cos(\tau x)\, d\tau, \qquad (5.12)$$
where
$$\gamma_k = (\tau^2 + \zeta_k^2)^{1/2}, \qquad \zeta_k = \sqrt{j\omega\mu_k\sigma_k}. \qquad (5.13)$$
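As a short check, which is not part of the original derivation, each term of the integrand in Eq. 5.12 can be substituted into the in-substrate equation of Eqs. 5.11:

```latex
\nabla^2\!\left[e^{\pm\gamma_k y}\cos(\tau x)\right]
  = (\gamma_k^2 - \tau^2)\,e^{\pm\gamma_k y}\cos(\tau x)
  = \zeta_k^2\,e^{\pm\gamma_k y}\cos(\tau x)
  = j\omega\mu_k\sigma_k\,e^{\pm\gamma_k y}\cos(\tau x).
```

This is why $\gamma_k$ and $\zeta_k$ are defined as in Eq. 5.13. In the free-space regions $\sigma = 0$, so $\gamma = \tau$ there.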
To solve for the vector potentials in the whole problem space, there are $2(n+2)$ unknown $M_k$'s and
$N_k$'s in Eq. 5.12. In order to obtain those coefficients, we need to apply boundary conditions at
the different medium interfaces. Since the normal component of the flux density and the tangential
component of the field intensity are continuous, we obtain for the boundary between
substrate layers $k$ and $k+1$
$$B_{k,y} = B_{k+1,y}, \qquad \frac{1}{\mu_k} B_{k,x} = \frac{1}{\mu_{k+1}} B_{k+1,x}. \qquad (5.14)$$
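Since $\mathbf{B} = \nabla \times \mathbf{A}$ and only the z-component of $\mathbf{A}$ is nonzero, by using Eq. 5.12, the $x$ and
$y$ components of the magnetic flux density will be
$$\begin{aligned}
B_{k,x} &= \int_0^\infty \left[ M_k e^{\gamma_k y} - N_k e^{-\gamma_k y} \right] \gamma_k \cos(\tau x)\, d\tau,\\
B_{k,y} &= \int_0^\infty \left[ M_k e^{\gamma_k y} + N_k e^{-\gamma_k y} \right] \tau \sin(\tau x)\, d\tau.
\end{aligned} \qquad (5.15)$$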
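By employing the boundary conditions in Eqs. 5.14, the coefficients of different substrate
layers can be shown to have the following relationship [73]
$$\begin{bmatrix} M_{k+1} \\ N_{k+1} \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
(1 + \lambda_k) e^{-\alpha_k} & (1 - \lambda_k) e^{-\beta_k} \\
(1 - \lambda_k) e^{+\beta_k} & (1 + \lambda_k) e^{+\alpha_k}
\end{bmatrix}
\begin{bmatrix} M_k \\ N_k \end{bmatrix},$$
where
$$\lambda_k = \frac{\mu_{k+1}}{\mu_k} \cdot \frac{\gamma_k}{\gamma_{k+1}}, \qquad
\alpha_k = (\gamma_{k+1} - \gamma_k) \cdot y_k, \qquad
\beta_k = (\gamma_{k+1} + \gamma_k) \cdot y_k, \qquad (5.16)$$
and $y_k = \sum_{i=1}^{k} t_i$ are the $y$ coordinates of the different interfaces.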
Furthermore, by matching the magnetic flux generated by a current filament in free space,
the coefficient $M_0$ can be obtained as
$$M_0(\tau) = \frac{\mu_0 I}{2\pi} \cdot \frac{e^{-h\tau}}{\tau}. \qquad (5.17)$$
Also, noting that there is normally a ground plane underneath the substrate and that the field
must vanish for $y \to -\infty$, we get
$$N_{n+1} = 0. \qquad (5.18)$$
So we have $n + 1$ interfaces and hence $2(n + 1)$ boundary conditions, which uniquely determine the
remaining $2(n + 1)$ unknown coefficients in Eq. 5.12 by using Eq. 5.16.
Since our purpose is to study the substrate effects on interconnects, we are interested in the
vector potential in the region above the substrate ($k = 0$). The solution of the vector potential in
this region can be shown to have the following general form
$$A_0 = \frac{\mu_0 I}{2\pi} \int_0^\infty \left[ \frac{e^{-\tau |y - h|}}{\tau} - \Gamma(\tau) \frac{e^{-\tau (y + h)}}{\tau} \right] \cos(\tau x)\, d\tau. \qquad (5.19)$$
$\Gamma(\tau)$ is known after the $M_k$'s and $N_k$'s are obtained using the above method.
5.2.3 Complex Image Theory and Its Application
It is observed that the integral in the analytic solution of $A_0$ in Eq. 5.19 has two terms. The first
term can be attributed to the current $J_f$ flowing within the filament. The second term can be
attributed to the induced substrate eddy currents [66]. So the vector potential can be written as
$$A_0(x, y) = A_0^f - A_0^e, \qquad (5.20)$$
where
$$A_0^f = \frac{\mu_0 I}{2\pi} \int_0^\infty \frac{e^{-\tau |y - h|}}{\tau} \cos(\tau x)\, d\tau, \qquad (5.21)$$
$$A_0^e = \frac{\mu_0 I}{2\pi} \int_0^\infty \Gamma(\tau) \frac{e^{-\tau (y + h)}}{\tau} \cos(\tau x)\, d\tau
= \frac{\mu_0 I}{2\pi} \int_0^\infty \Gamma(\tau) e^{\tau d}\, \frac{e^{-\tau (y + h + d)}}{\tau} \cos(\tau x)\, d\tau. \qquad (5.22)$$
The similarity between these two terms suggests that eddy currents induced in the substrate
may be treated as an image filament current flowing at $y = -(h + d)$ in the opposite direction.
This approximation holds when the coefficient $\Gamma(\tau) e^{\tau d}$ is approximated by the constant one. The
Taylor expansion of $\Gamma(\tau) e^{\tau d}$ at $\tau = 0$ is given by
$$\Gamma(\tau) e^{\tau d} = \Gamma(0) + [\Gamma'(0) + \Gamma(0) d]\,\tau + O(\tau^2). \qquad (5.23)$$
Furthermore, by using symbolic mathematics tools, such as Mathcad®, to solve for $\Gamma(\tau)$, one
can easily verify that
$$\Gamma(0) = 1. \qquad (5.24)$$
By preserving the first moment in Eq. 5.23, $\Gamma(\tau) e^{\tau d}$ can be approximated by the constant one
when
$$d = -\Gamma'(0). \qquad (5.25)$$
Therefore the multilayer substrate can now be substituted by a single image filament below
its corresponding physical filament at distance $d + 2h$, which is called the effective complex
distance (ECD). It is easy to show that the ECD is uniquely determined by the substrate process
parameters and the extraction frequency. One can use Mathcad® to solve for the ECD when the substrate
includes many layers.
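As a rough numerical illustration of Eqs. 5.16, 5.18, and 5.25, the Python sketch below chains the interface matrices of Eq. 5.16 from the region above the substrate to the region below it, reads $\Gamma(\tau)$ from the condition $N_{n+1} = 0$, and estimates $d = -\Gamma'(0)$ with a one-sided finite difference. It is not the thesis's implementation, which uses a symbolic tool. The function names `gamma_reflection` and `complex_image_offset`, the step `eps`, the interface coordinates $y_0 = 0$ and $y_k = -(t_1 + \cdots + t_k)$ (substrate placed below $y = 0$), and free space below the stack are all assumptions of this sketch.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (H/m)


def gamma_reflection(tau, omega, thickness, sigma, mu_r=None):
    """Gamma(tau) of Eq. 5.19 for substrate layers listed from top to bottom.

    Assumptions of this sketch: free space above and below the stack, and
    interfaces at y_0 = 0, y_k = -(t_1 + ... + t_k).
    """
    n = len(thickness)
    mu_r = [1.0] * n if mu_r is None else list(mu_r)
    mu = [MU0] + [MU0 * m for m in mu_r] + [MU0]      # regions 0 .. n+1
    sig = [0.0] + list(sigma) + [0.0]
    # gamma_k = sqrt(tau^2 + j*omega*mu_k*sigma_k), Eq. 5.13
    gam = [np.sqrt(tau**2 + 1j * omega * m * s) for m, s in zip(mu, sig)]
    y = np.concatenate(([0.0], -np.cumsum(thickness)))  # interface positions
    P = np.eye(2, dtype=complex)
    for k in range(n + 1):
        lam = (mu[k + 1] / mu[k]) * (gam[k] / gam[k + 1])
        a = (gam[k + 1] - gam[k]) * y[k]
        b = (gam[k + 1] + gam[k]) * y[k]
        # Interface matrix of Eq. 5.16: (M_k, N_k) -> (M_{k+1}, N_{k+1})
        T = 0.5 * np.array([[(1 + lam) * np.exp(-a), (1 - lam) * np.exp(-b)],
                            [(1 - lam) * np.exp(b), (1 + lam) * np.exp(a)]])
        P = T @ P
    # N_{n+1} = 0 (Eq. 5.18) gives N_0 / M_0 = -P21 / P22, and Gamma = -N_0 / M_0
    return P[1, 0] / P[1, 1]


def complex_image_offset(omega, thickness, sigma, mu_r=None, eps=1.0):
    """Estimate d = -Gamma'(0) (Eq. 5.25), using Gamma(0) = 1 (Eq. 5.24).

    eps (1/m) should be small compared with |zeta_k| of the layers.
    """
    return -(gamma_reflection(eps, omega, thickness, sigma, mu_r) - 1.0) / eps
```

For a single layer that is many skin depths thick, the returned value should approach $(1 - j)\delta$, the classical complex image depth for a conducting half-space, which gives a quick sanity check of the chain.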
5.3 Eddy-Current-Aware PEEC model: EPEEC
We have shown that the effect of a lossy multilayer substrate can be approximated by image
conductors, given currents in those conductors are evenly distributed. However, due to skin and
proximity effects at high frequencies, conductor segments have to be discretized into filaments
so as to account for the non-uniform current distribution [74] as shown in Fig. 5.2.
Figure 5.2: Eddy-current-aware PEEC model. Each conductor is further discretized to consider
the uneven distribution of currents. (Physical conductors and image conductors are separated by $d + 2h$.)

In order to calculate the total inductance for a particular filament, it is necessary to combine
its physical and image filaments together [75]. After applying complex image theory, the
effective complex inductance (ECI) between filaments $i$ and $j$ is given by
$$\hat{L}_{ij} = L_{ij} - L_{ij'}. \qquad (5.26)$$
$L_{ij}$ is the inductance between the physical filaments $i$ and $j$ and can be calculated by existing
closed-form static inductance formulas, such as Hoer's formula [76] and Grover's formula [77].
$L_{ij'}$ is the inductance between the physical filament $i$ and the image filament $j'$.
Since the calculation of $L_{ij'}$ depends on the ECD, $L_{ij'}$ depends on frequency and
substrate parameters. Hoer's formula can be accurately extended to calculate inductances of
rectangular filaments separated by complex distances.
Notice that although applying complex image theory doubles the computational complexity,
it does not increase the model size, since $L_{ij'}$ is only used to modify the value of $L_{ij}$
to account for the lossy substrate.
The filament impedance matrix $\hat{Z}(\omega)$¹ at frequency $\omega/2\pi$ can be expressed as follows
$$\hat{Z}(\omega) = \hat{R}_{DC} + j\omega\hat{L}. \qquad (5.27)$$
$\hat{L}$ is the filament inductance matrix containing the $\hat{L}_{ij}$'s given by Eq. 5.26. $\hat{R}_{DC}$ is a diagonal
matrix of the DC resistances of the physical filaments.
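To make Eqs. 5.26 and 5.27 concrete for the simplest case $i = j$, the Python sketch below subtracts the coupling between a wire and its own image from the wire's free-space self inductance. It does not use Hoer's formula. As stand-ins it assumes a widely used closed-form approximation for the self inductance of a rectangular bar and the standard mutual-inductance formula for two equal parallel filaments, evaluated at a complex distance. The function names, the treatment of the image as a single filament, and the numeric value of `d` in the usage note are illustrative assumptions.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability (H/m)


def self_inductance_bar(length, width, thickness):
    """Approximate partial self inductance (H) of a straight rectangular bar."""
    u = (width + thickness) / length
    return MU0 * length / (2 * np.pi) * (np.log(2.0 / u) + 0.5 + 0.2235 * u)


def mutual_inductance_parallel(length, distance):
    """Mutual partial inductance (H) of two equal, aligned parallel filaments.

    `distance` may be complex; the principal branches of log and sqrt are
    used, which is an assumption of this sketch.
    """
    r = length / np.asarray(distance, dtype=complex)
    return MU0 * length / (2 * np.pi) * (
        np.log(r + np.sqrt(1 + r**2)) - np.sqrt(1 + 1 / r**2) + 1 / r)


def effective_self_inductance(length, width, thickness, h, d):
    """Eq. 5.26 with i = j: physical term minus coupling to the image at d + 2h."""
    return (self_inductance_bar(length, width, thickness)
            - mutual_inductance_parallel(length, d + 2 * h))


def filament_impedance(r_dc, l_eff, omega):
    """Eq. 5.27 for one filament: Z = R_DC + j*omega*L."""
    return r_dc + 1j * omega * l_eff
```

For a 90 µm long, 0.6 µm wide, 1.2 µm thick line, `self_inductance_bar(90e-6, 0.6e-6, 1.2e-6)` evaluates to roughly 92 pH, in line with the free-space value quoted in Section 5.4.2. With a hypothetical offset such as `d = (1 - 1j) * 10e-6` and `h = 10e-6`, the effective inductance has a smaller real part and a negative imaginary part, so `filament_impedance` returns a resistance above the DC value.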
5.3.1 EPEEC Interconnect Modeling Algorithm
For a complicated interconnect system, the number of passive elements will be huge if induc-
tance extraction is applied. Moreover, the discretization of conductors further increases the
model size. We will show that complex image theory can be easily combined with reluctance
extraction to generate compact interconnect models.
Most existing reluctance extraction tools are based on window selection algorithms [78,79].
Here we propose an extended window selection algorithm to handle both physical conductors
and their images.
We illustrate the algorithm in Table 5.1 by a simple example shown in Fig. 5.3. If the current
aggressor is conductor 1, its neighboring conductors include 3, 4, and 5. Therefore, the image
conductors 1′, 3′, 4′, and 5′ are also included in the current neighboring group.
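The selection rule of Table 5.1 can be sketched in a few lines of Python. The thesis does not specify the underlying "general window algorithm", so this sketch assumes a simple distance window around the aggressor. The function name `extended_window`, the `centers` dictionary, and the `radius` parameter are hypothetical.

```python
from math import hypot


def extended_window(centers, aggressor, radius):
    """Return the neighboring group of `aggressor`: physical conductors, then images.

    centers : dict mapping conductor name -> (x, y) center coordinates
    radius  : window size (assumed here to be a plain Euclidean distance)
    """
    ax, ay = centers[aggressor]
    # Step a: select physical neighbors inside the window
    neighbors = [name for name, (x, y) in centers.items()
                 if name != aggressor and hypot(x - ax, y - ay) <= radius]
    physical = [aggressor] + neighbors
    # Step b: every selected physical conductor brings its image along
    images = [f"{name}'" for name in physical]
    return physical + images
```

With made-up coordinates in which conductors 3, 4, and 5 lie inside the window of conductor 1 and conductor 2 lies outside, the function returns 1, 3, 4, 5 followed by 1′, 3′, 4′, 5′, matching the example above.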
By using the extended window algorithm, we limit EPEEC to consider couplings within
neighboring conductor groups instead of the whole conductor system, and hence the computational
complexity is significantly reduced.

BEGIN
For each conductor in the interconnect system
  a. Apply a general window algorithm to select its neighboring
     physical conductors;
  b. Once a physical conductor is selected as a neighboring
     conductor, its corresponding image is also selected.
END
Table 5.1: Extended Window Selection Algorithm.

¹ A little hat is used to distinguish the symbols for filaments from those for conductor segments.
Figure 5.3: Extended window selection algorithm to simultaneously consider physical and image
conductors. (Physical conductors 1 to 5 and image conductors 1′ to 5′, separated by $d + 2h$.)

For the neighboring group of conductor $i$, assume it contains $n$ segments and the $k$th conductor
is discretized into $p_k$ filaments; then the total number of filaments within the neighboring
conductor group will be $n_f = \sum_{k=1}^{n} p_k$. Let $Z_f^i(\omega) \in \mathbb{C}^{n_f \times n_f}$ denote the filament impedance
matrix of this neighboring group with the consideration of substrate effects by using Eq. 5.27,
then
$$Z_f^i(\omega) \cdot I_f^i = V_f^i, \qquad (5.28)$$
where $I_f^i, V_f^i \in \mathbb{C}^{n_f}$ are the filament terminal current and voltage vectors, respectively.
Physically, a bundle of filaments within the same conductor segment can be treated as parallel
branches. Merging parallel elements can be facilitated by using admittance instead of
impedance. To directly calculate the admittance of each conductor segment, assume the current
aggressor is conductor $i$; we simultaneously set the voltages along all its $p_i$ filaments to one and
the others in $V_f^i$ to zero. The physical meaning of the current distribution $I_f^i$ obtained by solving
Eq. 5.28 is as follows: the sum of all the filament currents within the aggressor is the aggressor
admittance, while the sum of the currents within one victim is the coupling admittance between
the aggressor and that victim.
Those obtained admittance values are composed of two parts
$$y_{ij} = g_{ij} + j x_{ij}, \qquad (5.29)$$
where $g_{ij}$ is the conductance and $x_{ij}$ is the susceptance. Obviously, if we model each conductor
segment as serially connected resistance and reluctance, the equivalent resistance $r_{ij}$ and
reluctance $k_{ij}$ can be synthesized as
$$r_{ij} = \frac{g_{ij}}{g_{ij}^2 + x_{ij}^2}, \qquad k_{ij} = \frac{(g_{ij}^2 + x_{ij}^2)}{\omega x_{ij}}. \qquad (5.30)$$
The detailed EPEEC interconnect modeling algorithm is summarized in Table 5.2.
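The voltage-excitation step described above can be sketched as follows. The names `conductor_admittances` and `series_r_and_k`, and the `owner` array that maps each filament to its conductor, are hypothetical. For the synthesis step the sketch does not transcribe Eq. 5.30. It inverts the admittance, takes the series inductance as $\mathrm{Im}(1/y)/\omega$, and takes the reluctance as its reciprocal. That convention is an assumption, so its sign and scaling should be checked against Eq. 5.30 before use.

```python
import numpy as np


def conductor_admittances(Z, owner, aggressor):
    """Solve Eq. 5.28 with 1 V on the aggressor's filaments and 0 V elsewhere.

    Z        : (nf, nf) complex filament impedance matrix of the neighboring group
    owner    : length-nf sequence, owner[f] = index of the conductor holding filament f
    Returns one admittance per conductor: self term for the aggressor,
    coupling terms for the victims.
    """
    owner = np.asarray(owner)
    V = (owner == aggressor).astype(complex)
    I = np.linalg.solve(Z, V)
    n_cond = int(owner.max()) + 1
    # Filaments of one conductor are parallel branches, so their currents add
    return np.array([I[owner == c].sum() for c in range(n_cond)])


def series_r_and_k(y, omega):
    """Series resistance and reluctance equivalent to a nonzero admittance y.

    Assumed convention: 1/y = r + j*omega*l and k = 1/l.
    """
    z = 1.0 / y
    return z.real, omega / z.imag
```

As a small check of the idea, two identical uncoupled filaments with impedance $R + j\omega L$ in one conductor give a conductor-level resistance of $R/2$ and a reluctance of $2/L$, as expected for parallel branches.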
5.3.2 SPICE Compatible Reluctance Realization
After constructing the resistance matrix $R$ and the reluctance matrix $K$ using the algorithm
in Table 5.2, circuit simulation is required to analyze those models. Unfortunately, traditional
circuit analysis tools cannot handle reluctance directly. Although [58] and [79] incorporate
the capability to simulate reluctance, significant modifications to traditional analysis tools are
inevitable. In this subsection, we present a reluctance realization algorithm that directly converts
reluctance to its mathematically and electrically equivalent circuit model, which contains only
self inductances and voltage-controlled voltage sources (VCVS) [80].
For a general circuit containing reluctances, the branch equation of self and mutual reluctances
is given by
$$I_i = \sum_{j=1}^{n} K_{ij} V_j = K_{ii} V_i + \sum_{j=1, j \neq i}^{n} K_{ij} V_j, \qquad (5.31)$$
where $K_{ii}$ is the self reluctance and $K_{ij}$ is the mutual reluctance between $K_{ii}$ and $K_{jj}$. By
rearranging the terms in Eq. 5.31, it can be written as:
$$V_i = \frac{1}{K_{ii}} I_i - \sum_{j=1, j \neq i}^{n} \frac{K_{ij}}{K_{ii}} V_j. \qquad (5.32)$$

Figure 5.4: SPICE compatible model for reluctance. The original reluctance element is substituted
by serial self inductance and VCVSs. (The coupled reluctances $K_{ii}$, $K_{jj}$, $K_{ij}$ become inductances
$1/K_{ii}$ and $1/K_{jj}$ in series with VCVSs of gain magnitude $K_{ij}/K_{ii}$ and $K_{ij}/K_{jj}$, controlled by $V_j$ and $V_i$.)
If we take $1/K_{ii}$ as a self inductance, the original voltage drop across the self reluctance $K_{ii}$
can be viewed as the combination of the voltage drops across that inductance and some VCVSs.
These serial VCVSs are controlled by the voltages on the other self reluctances that are originally
coupled with the reluctance $K_{ii}$. The gains of those VCVSs are $-K_{ij}/K_{ii}$, as in Eq. 5.32.
Therefore, Eq. 5.32 can be used to construct the SPICE compatible model for reluctances
shown in Fig. 5.4. The detailed reluctance realization algorithm is presented in Table 5.3. It
can be either built into an extraction tool or implemented as post-extraction software.
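A post-extraction script along these lines might look like the following Python sketch, which emits standard SPICE `E` (VCVS) and `L` (inductor) element lines for each reluctance branch. The function name `realize_reluctance`, the element and intermediate node naming scheme, and the choice to skip zero mutual terms are assumptions of this sketch, not part of the thesis.

```python
def realize_reluctance(K, branches):
    """Return SPICE lines replacing each reluctance branch by VCVSs plus an inductor.

    K        : square matrix (nested lists or array) of self and mutual reluctances
    branches : list of (node_plus, node_minus) names, one pair per row of K
    """
    lines = []
    for i, (a, b) in enumerate(branches):
        node, q = a, 0
        for j, (cj_plus, cj_minus) in enumerate(branches):
            if j == i or K[i][j] == 0:
                continue
            nxt = f"{a}_k{i}_{q + 1}"          # hypothetical intermediate node name
            gain = -K[i][j] / K[i][i]          # gain of the series VCVS (Eq. 5.32)
            # E<name> n+ n- nc+ nc- gain : controlled by the voltage across branch j
            lines.append(f"E{i}_{j} {node} {nxt} {cj_plus} {cj_minus} {gain:g}")
            node, q = nxt, q + 1
        # The remaining series element is the self inductance 1/K_ii
        lines.append(f"L{i} {node} {b} {1.0 / K[i][i]:g}")
    return lines
```

For two coupled branches `("a", "b")` and `("c", "d")`, each branch becomes one VCVS in series with one inductor, so the netlist contains only elements that a standard SPICE engine accepts.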
5.4 Experimental Results
Extensive experimental results are reported to show the efficiency and accuracy of our new
interconnect modeling approach EPEEC. All tests are run on a Pentium IV 1.4 GHz machine
with 768 MB memory.
5.4.1 EPEEC Model Validation
To validate the new modeling approach and to illustrate the accuracy, we first compare inductance
and resistance values computed by complex image theory using Eq. 5.26 with FastHenry [74]
and a more rigorous full-wave electromagnetic analysis tool, Sonnet®.
The experimental objects are two parallel conductor segments in a power/ground (P/G) network
in metal layer 6. They are made of copper with conductivity $5.8 \times 10^7$ S/m. Both of them
are 90 µm long, 1.2 µm thick, and 26 µm wide. They are separated by 60 µm. The substrate is
composed of two layers. The upper layer has conductivity $\sigma_1 = 100$ S/m while the lower layer
has conductivity $\sigma_2 = 10000$ S/m. The thickness of the upper layer is 20 µm and that of the lower
layer is 100 µm. The top area of the substrate is 1 cm × 1 cm. The distance between the substrate surface
and the bottom of these conductors is 5.481 µm. Underneath the substrate, there is a ground
plane. The detailed test configuration is shown in Fig. 5.5.
The self inductance of one wire is calculated and compared. Up to 20 GHz, EPEEC gives
inductance values that are very close to full-wave simulation results (within 1.5% error) and
shows over 100× speedup compared to FastHenry and Sonnet®.
5.4.2 Substrate Effects
As shown in Fig. 5.6, interconnect models are essentially frequency dependent. Besides frequency,
many other factors may also affect inductance and resistance values, such as conductor-substrate
distance, substrate conductivity, and substrate thickness. The next set of experiments
investigates how those factors can impact parasitic values.

Figure 5.5: Test configuration: two parallel copper interconnects above a two-layer substrate
(Length unit: µm).
In order to minimize the skin and proximity effects, we select two thin signal lines in metal
layer 6. Both of them are 90 µm long, 1.2 µm thick, and 0.6 µm wide. They are separated by 1.2 µm.
The substrate has the same configuration as in the above test. Without considering the
substrate, i.e. in free space (PEEC), the self inductance and resistance are 91.95 pH and 2.16 Ω,
respectively.
First, we discuss the substrate effect at different frequencies and with different conductor-substrate
distances. Figs. 5.7 and 5.8 show that the substrate effect becomes more evident at
higher frequencies and when conductors are closer to the substrate. The increased inductive
and ohmic substrate losses appear as smaller inductance and larger resistance
values. At 10 GHz and with the conductor and substrate separated by 10 µm, the inductance value
becomes 85.14 pH, which shows an 8% deviation from the value calculated in free space.
Figure 5.6: Self inductance comparison by using three different extraction tools: FastHenry,
Sonnet®, and EPEEC. (Axes: self inductance in pH versus frequency in GHz.)
Second, we show the impact of the substrate conductivities of different layers. For a multilayer
substrate in a real design, the upper layer is usually less conductive in order to facilitate the
functionality of the on-chip analog circuitry. Low conductivity prevents the generation of large
eddy currents in the upper layer. However, since a low-conductivity medium has a large skin depth,
the electromagnetic field can easily penetrate through the upper layer to reach the lower layers, and
hence the lower layers may have more significant effects on interconnect values.
Figure 5.7: Self inductance decreases as frequency increases and conductor-substrate distance
decreases. (Axes: self inductance in pH versus distance in µm and frequency in GHz.)

We set the conductor-substrate distance to 5.48 µm at 10 GHz. To fairly compare the two layers,
they are both set to 50 µm thick. From Fig. 5.9, one can see that the upper layer has a larger
impact than the lower layer when the two layers have the same conductivity. However,
if the upper layer conductivity is small, the lower layer effect also cannot be ignored. When
$\sigma_1 = 200$ S/m, the upper layer has a skin depth of 355.88 µm, which is much larger than its thickness.
In this scenario, if $\sigma_2 = 1000$ S/m, the inductance value will be 91.45 pH, while changing $\sigma_2$
to 10000 S/m gives the inductance value 87.05 pH, which is 5.1% different from the previous
value.
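The skin depth quoted above can be reproduced with the standard expression $\delta = 1/\sqrt{\pi f \mu \sigma}$. The short Python sketch below assumes a non-magnetic layer ($\mu = \mu_0$) and the 10 GHz extraction frequency stated in the text. The function name `skin_depth` is chosen only for illustration.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability (H/m)


def skin_depth(freq_hz, sigma, mu_r=1.0):
    """Skin depth in meters: delta = 1 / sqrt(pi * f * mu * sigma)."""
    return 1.0 / math.sqrt(math.pi * freq_hz * MU0 * mu_r * sigma)
```

Calling `skin_depth(10e9, 200.0)` gives about 355.88 µm, roughly seven times the 50 µm layer thickness used in this experiment, which is consistent with the field reaching the lower layer.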
Figure 5.8: Resistance increases as frequency increases and conductor-substrate distance
decreases. (Axes: resistance in Ω versus distance in µm and frequency in GHz.)

Therefore, although the upper layer normally has low conductivity, it determines to what
extent the lower layers affect interconnects. In the case that the upper layer thickness is smaller
than its skin depth, one cannot simply discard the effects of the lower layers. To verify this, we
set the upper layer and the lower layer conductivities to 100 S/m and 10000 S/m, respectively, and
then gradually increase the upper layer thickness to observe the effect on the line parameters.
Figure 5.9: With the same conductivity, the upper layer substrate will have larger effect than the
lower layer. However, the lower layer cannot be ignored when the thickness of the upper layer
is less than its skin depth. (Axes: self inductance in pH versus upper and lower layer conductivity in S/m.)

From Fig. 5.10, one can see that at a specific frequency, when the upper layer thickness
grows beyond its skin depth, increasing its thickness has no further effect on interconnects.
In this experiment, since the upper layer has low conductivity, when the interaction between
interconnects and the lower substrate layer is blocked by a thick upper layer, the overall substrate
effect becomes insignificant.
5.4.3 Inductance vs. Reluctance
The next set of experiments is run to show the computational complexity and model size of
EPEEC compared to PEEC. The test conductor system includes 604 conductor segments,
which are in a P/G network located within metal layers 7 and 6. The substrate configuration is
the same as in the previous tests.
Figure 5.10: Self inductance saturates when the thickness of the upper layer grows over its skin
depth. (Axes: self inductance in pH versus frequency in GHz and upper layer thickness in µm.)
As shown in Table 5.4, PEEC and EPEEC-L² have identical model sizes (92,639 passive elements),
while the extraction time of EPEEC-L is about 2.4 times longer (91.547 s versus 38.162 s) since we
need to calculate inductances for both physical and image conductors in Eq. 5.26. However, the
model size and extraction time of EPEEC-R are greatly reduced: 2,794 passive elements (about
33 times fewer) and 4.094 s (about 9 times faster than PEEC). For larger interconnect systems,
the improvement will be even more significant.
Since the substrate affects the values of passive elements in the EPEEC model, it impacts the
transient responses, which are critical for timing and signal integrity analysis. To compare
responses at different frequencies by using the PEEC, EPEEC-L, and EPEEC-R models, we randomly
select one node in the above P/G network. Since the PEEC model does not consider the
substrate, it depends only on the geometries of conductors and is frequency independent. However,
at high frequencies, ignoring the substrate leads to significant errors in the transient response.
As shown in Fig. 5.11, at 20 GHz, the waveforms of PEEC and EPEEC-L exhibit about 10%
difference, which is intolerable for the present interconnect modeling accuracy requirement.
In contrast, the reluctance model EPEEC-R has a much smaller model size while
maintaining less than 1.5% error compared to EPEEC-L.

² EPEEC-L means we apply complex image theory while extracting inductance. EPEEC-R is obtained by
extracting reluctance using the algorithm given in Table 5.2.

Figure 5.11: Waveforms of transient responses by using different interconnect models: PEEC,
EPEEC-L, and EPEEC-R. (Axes: voltage in V versus time in ns.)
INPUT:  An interconnect system including $n$ conductor segments;
        Extraction frequency $f$;
        Substrate parameters $\mu_k$ and $\sigma_k$.
OUTPUT: Resistance matrix $R$; Reluctance matrix $K$.
BEGIN
I.  Discretize all conductor segments according to their geometries
    and the extraction frequency $f$.
II. For each conductor $i$ in the interconnect system, do the following:
    a. Search its neighboring conductors $\Upsilon_i$ by adopting the extended
       window algorithm;
    b. Calculate the filament impedance matrix $Z_f^i$ with the
       consideration of multilayer substrate effects by using Eq. 5.27;
    c. Set entries in the voltage vector $V_f^i$ corresponding to filaments
       in conductor $i$ to one while others to zero;
    d. Obtain the filament current distribution $I_f^i$ by solving Eq. 5.28;
    e. The self admittance of conductor $i$ equals the sum of filament
       currents within conductor $i$; the summation of filament currents
       in conductor $j$ is the coupling between conductors $i$ and $j$;
    f. Synthesize admittance into serial resistance and reluctance by
       applying Eq. 5.30;
    g. Stamp those values into the parasitic matrices $R$ and $K$, respectively.
END
Table 5.2: EPEEC Interconnect Modeling Algorithm.
BEGIN
For each reluctance $K_{ii}$ between nodes $n_i$ and $n_j$ in a given circuit
  a. $q = 0$;
  b. Let $n_i^q = n_i$;
  c. For each self-reluctance $K_{jj}$ that has mutual reluctance $K_{ij}$
     with self-reluctance $K_{ii}$:
       Connect one VCVS controlled by $V_j$ with gain $-K_{ij}/K_{ii}$
       between $n_i^q$ and $n_i^{q+1}$;
       $q = q + 1$.
  d. Connect inductance $1/K_{ii}$ between $n_i^q$ and $n_j$.
END
Table 5.3: Reluctance Realization Algorithm.
Model      Extraction Time   Number of Passive Elements
PEEC       38.162 s          92,639
EPEEC-L    91.547 s          92,639
EPEEC-R    4.094 s           2,794
Table 5.4: Extraction Time and Model Size Comparison.
Chapter 6
Conclusion
Moore’s law has described the growth of the semiconductor industry for more than 35
years. The aggressive scaling of integrated circuits relies on the integration of the inter-layer
dielectric and metal layers. At the same time, the industry trend of integrating higher levels
of circuit functionality on one chip, together with process-induced variations that directly impact the
geometry of on-chip interconnects, has made the structure and hence the modeling of VLSI
interconnect more and more complicated.
This thesis presents some progress in the area of interconnect parasitic extraction and inter-
connect process variation modeling. First, this thesis presents ICCAP, a fast 3-D capacitance
extraction algorithm. ICCAP proposes a novel technique for sparsifying and reordering the
potential coefficient matrix. The sparse transformation is performed by simply switching the
basis from leaf panels to a new set of panels, so that cost-efficient preconditioners can be easily
constructed, which greatly speeds up iterative matrix solvers.
To take the process variation into consideration, this thesis presents a fast mask image simu-
lation algorithm LithSim to model the interconnect geometry variation introduced in the optical
lithography process. Then an efficient methodology StatCap for generating explicit statistical
representations of parasitic capacitances is proposed. StatCap applies principal factor analysis
to reduce the number of random variables while preserving the dominant global/local factors
that induce the conductor surface fluctuation due to process variations. The obtained quadratic
form can not only be used to directly generate parasitic capacitance probability distribution to
locate design corners, but it is also fully compatible with statistical model order reduction and
statistical timing analysis tools.
Finally, to model the inductive effects, we propose a new frequency-dependent interconnect
model, EPEEC, which considers the lossy substrate by using complex image theory. EPEEC
is reluctance-based and is obtained by combining complex image theory with an extended
window-based reluctance extraction algorithm. Simulation results show that EPEEC stays
within 1.5% of full-wave simulation while yielding a significantly smaller model size.
We hope that by transferring those proposed algorithms into the realm of production, these
building blocks serve the goal of design for manufacturability in the state-of-the-art VLSI cir-
cuits and can improve the fabrication yield and circuit efficiency in the long term.
Bibliography
[1] S. Balakrishnan, J. H. Park, H. Kim, Y.-M. Lee, and C. C.-P. Chen. Linear time hierarchical
capacitance extraction without multipole expansion. International Conference on Computer Design,
pages 98–103, Sept 2001.
[2] M. W. Beattie and L. T. Pileggi. Error bounds for capacitance extraction via window techniques.
IEEE Trans. CAD, 18:311–321, March 1999.
[3] X. Cai, K. Nabors, and J. White. Efficient galerkin techniques for multipole-accelerated capacitance
extraction of 3-d structures with multiple dielectrics. Advanced Research in VLSI, pages 200–211,
March 1995.
[4] W. Hong, W.-K. Sun, Z.-H. Zhu, H. Ji, B. Song, and W.-M. Dai. A novel dimension-reduction
technique for the capacitance extraction of 3-d vlsi interconnects. IEEE Transactions on Microwave
Theory and Techniques, 46:1037–1044, Aug 1998.
[5] T. Lu, Z. Wang, and W. Yu. Hierarchical block boundary-element method (hbbem): a fast field
solver for 3-d capacitance extraction. IEEE Transactions on Microwave Theory and Techniques,
52:10–19, Jan 2004.
[6] Y. Yanhong and P. Banerjee. A parallel implementation of a fast multipole based 3-d capacitance
extraction program on distributed memory multicomputers. Proceedings. 14th International Parallel
and Distributed Processing Symposium, pages 323–330, May 2000.
[7] W. Yu and Z. Wang. Enhanced qmm-bem solver for three-dimensional multiple-dielectric capacitance
extraction within the finite domain. IEEE Transactions on Microwave Theory and Techniques,
52:560–566, Feb 2004.
[8] Z. Zhu and W. Hong. A generalized algorithm for the capacitance extraction of 3d vlsi interconnects.
IEEE Transactions on Microwave Theory and Techniques, 47:2027–2030, Oct 1999.
[9] B. Krauter, Xia Yu, A. Dengi, and L.T. Pileggi. A sparse image method for bem capacitance
extraction. Proc. DAC, pages 357–362, June 1996.
[10] T. Sometani, “Image method for a dielectric plate and a point charge,” IOP, 2000.
[11] E. Weber, Electromagnetic Fields. John Wiley & Sons, 1950.
[12] A. Balanis, Advanced Engineering Electromagnetics. John Wiley & Sons, 1989.
[13] M. Beattie and L. Pileggi. Electromagnetic parasitic extraction via a multipole method with
hierarchical refinement. Proc. ICCAD, pages 437–444, 1999.
[14] K. Nabors and J. White. Fastcap: a multipole accelerated 3-d capacitance extraction program. IEEE
Trans. on CAD, pages 1447–1459, 1991.
[15] J. Tausch and J. White. A multiscale method for fast capacitance extraction. Proc. DAC, pages
537–542, 1999.
[16] W. Shi, J. Liu, N. Kakani, and T. Yu. A fast hierarchical algorithm for 3-d capacitance extraction.
IEEE Trans. on CAD, pages 330–336, 2002.
[17] S. Yan, V. Sarin, and W. Shi. Sparse transformations and preconditioners for hierarchical 3-d
capacitance extraction with multiple dielectrics. Proc. DAC, pages 788–793, 2004.
[18] J. R. Phillips and J. White. A precorrected fft method for capacitance extraction of complicated 3-d
structures. IEEE Trans. CAD, pages 1059–1072, 1997.
[19] S. Kapur and D. E. Long. Ies3: A fast integral equation solver for efficient 3-dimensional extraction.
Proc. ICCAD, pages 448–455, 1997.
[20] P. Wrschka, J. Hernandez, G. S. Oehrlein, and J. King. Chemical mechanical planarization of
copper damascene structures. Journal of The Electrochemical Society, pages 706–712, 2000.
[21] Peng Li, F. Liu, Xin Li, L. T. Pileggi, and S. R. Nassif. Modeling interconnect variability using
efficient parametric model order reduction. Design Automation and Test in Europe, pages 958–963,
2005.
[22] E. Chiprout and T. Nguyen. Survey of model reduction techniques for analysis of package and
interconnect models of high-speed designs. IEEE 6th Topical Meeting on Electrical Performance
of Electronic Packaging, pages 251–254, 1997.
[23] B. N. Sheehan. Branch merge reduction of rlcm networks. Proc. ICCAD, pages 658–664, 2003.
[24] Hongliang Chang and S. S. Sapatnekar. Statistical timing analysis considering spatial correlations
using a single pert-like traversal. Proc. ICCAD, pages 621–625, 2003.
[25] L. Daniel, C. S. Ong, S. C. Low, K. H. Lee, and J. White. A multiparameter moment matching
model reduction approach for generating geometrically parameterized interconnect performance
models. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 23(5):678–693,
May 2004.
[26] X. Li, J. Le, P. Gopalakrishnan, and L. T. Pileggi. Asymptotic probability extraction for non-normal
distributions of circuit. pages 2–9, 2004.
[27] Zhenhai Zhu, Alper Demir, and Jacob White. A stochastic integral equation method for modeling
the rough surface effect on interconnect capacitance. Proc. ICCAD, pages 887–891, 2004.
[28] R. L. Gorsuch. Factor Analysis. Hillsdale, NJ, 1974.
[29] B. V. Gnedenko. Theory of Probability. Gordon and Breach Science Publishers, 1997.
[30] A.M. Mathai and Serge B. Provost. Quadratic Forms in Random Variables: Theory and Applications.
New York Marcel Dekker, 1992.
[31] M. Born and E. Wolf. Principles of Optics. New York: Pergamon, 1980.
[32] P. Ghosh, C. shin Kang, M. Sanie, and D. Pinto. New dfm approach abstracts altpsm lithography
requirements for sub-100nm ic design domains. Proceedings. Fourth International Symposium on
Quality Electronic Design, pages 131–137, March 2003.
[33] J. Gomes and L. Velho. Image processing for computer graphics. Springer, 1996.
[34] E. Hecht. Optics. Addison-Wesley, 1998.
[35] A. B. Kahng and Y. C. Pati. Subwavelength lithography and its potential impact on design and
eda. Design Automation Conference, pages 799–804, June 1999.
[36] L. R. Harriott, “Limits of lithography,” Proceedings of the IEEE, vol. 89, pp. 366–374, March 2001.
[37] G. Pugh, J. Canning, and B. Roman, “Impact of high resolution lithography on ic mask design,”
Custom Integrated Circuits Conference, pp. 149–153, May 1998.
[38] T. Brunner, “Pushing the limits of lithography for ic production,” Electron Devices Meeting, pp.
9–13, Dec. 1997.
[39] M. Sasago, “Lithography solutions for sub-0.1 m generations,” VLSI Technology, pp. 6–9, June
1998.
[40] L. V. den Hove, A. M. Goethals, K. Ronse, M. V. Bavel, and G. Vandenberghe, “Lithography for
sub-90nm applications,” Electron Devices Meeting, pp. 3–8, Dec. 2002.
[41] W. W. Flack and G. E. Flores, “Lithographic manufacturing techniques for wafer scale integration,”
Wafer Scale Integration, pp. 4–13, Jan. 1992.
[42] D. R. Huston and W. Sauter, “Mask stretching for next generation lithography masks,” IEEE
Transactions on Semiconductor Manufacturing, vol. 14, pp. 214–217, Aug. 2001.
[43] M. D. Levenson, N. S. Viswanathan, and R. A. Simpson, “Improving resolution in photolithography
with a phase-shifting mask,” IEEE Transactions on Electron Devices, pp. 1828–1836, Dec. 1982.
[44] Y. Liu and A. Zakhor, “Binary and phase shifting mask design for optical lithography,” IEEE
Transactions on Semiconductor Manufacturing, vol. 5, pp. 138–152, May 1992.
[45] B. J. Lin, “Phase-shifting masks gain an edge,” Circuits and Devices Magazine, vol. 9, pp. 28–35,
March 1993.
[46] Y. Liu, A. Zakhor, and M. A. Zuniga, “Computer-aided phase shift mask design with reduced
complexity,” IEEE Transactions on Semiconductor Manufacturing, vol. 9, pp. 170–181, May 1996.
[47] Z. Li and H. Nakagawa, “Performance-driven opc for mask cost reduction,” Proceedings of the 41st
SICE Annual Conference, vol. 2, pp. 917–920, 2002.
[48] D. Lee and A. R. Neureuther. SPLAT v5.0 User’s Guide. University California Press, Mar 1995.
[49] Y. C. Pati, A. A. Ghazanfarian, and R. F. Pease. Exploiting structure in fast aerial image
computation for integrated circuit patterns. IEEE Transactions on Semiconductor Manufacturing, 10:62–74,
Feb 1997.
[50] F. Schellenberg. A little light magic. IEEE Spectrum, 40:34–39, Sep 2003.
[51] A. K.-K. Wong. Resolution Enhancement Techniques in Optical Lithography. Spie Press, 2001.
[52] R. Panda, S. Sundareswaran, and D. Blaauw, “On the interaction of power distribution network with
substrate,” International Symposium on Low Power Electronics and Design, pp. 388–393, August
2001.
[53] Z. He, M. Celik, and L. T. Pileggi, “SPIE: Sparse partial inductance extraction,” Proceedings of
Design Automation Conference, pp. 137–140, June 1997.
[54] M. W. Beattie and L. T. Pileggi, “Inductance 101: Modeling and extraction,” Proceedings of Design
Automation Conference, pp. 323–328, June 2001.
[55] K. Gala, D. Blaauw, J. Wang, V. Zolotov, and M. Zhao, “Inductance 101: Analysis and design
issues,” Proceedings of Design Automation Conference, pp. 329–334, June 2001.
[56] A. E. Ruehli, “Inductance calculatioin in a complex integrated circuit environment,”IBM Journal
of Research and Development, September 1972.
[57] A. Devgan, H. Ji, and W. Dai, “How to efficiently capture on-chip inductance effects: Introducing
a new circuit element K,” IEEE/ACM International Conference on Computer Aided Design, pp.
150–155, November 2000.
[58] H. Ji, A. Devgan, and W. Dai, “KSIM: A stable and efficient RKC simulator for capturing on-chip
inductance effect,” Proceedings of Asia and South Pacific Design Automation Conference,
pp. 379–384, January 2001.
[59] L. M. Silveira and N. Vargas, “Characterizing substrate coupling in deep-submicron designs,” IEEE
Design and Test of Computers, vol. 19, pp. 4–15, March 2002.
[60] R. Gharpurey and R. G. Meyer, “Modeling and analysis of substrate coupling in integrated circuits,”
IEEE Journal of Solid-State Circuits, vol. 31, pp. 344–353, March 1996.
[61] B. R. Stanisic, N. K. Verghese, R. A. Rutenbar, L. R. Carley, and D. J. Allstot, “Addressing substrate
coupling in mixed-mode IC’s: Simulation and power distribution synthesis,” IEEE Journal of
Solid-State Circuits, vol. 29, pp. 226–238, March 1994.
[62] Y. Massoud and J. White, “Simulation and modeling of the effect of substrate conductivity on
coupling inductance,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
pp. 286–291, June 2002.
[63] M. Liu, T. Yu, and W.-M. Dai, “Fast 3-D inductance extraction in lossy multi-layer substrate,” IEEE/ACM
International Conference on Computer Aided Design, pp. 424–429, November 2001.
[64] H. Ymeri, B. Nauwelaers, K. Maex, S. Vandenberghe, and D. D. Roest, “New analytic expressions
for mutual inductance and resistance of coupled interconnects on lossy silicon substrate,” Digest of
Silicon Monolithic Integrated Circuits in RF Systems, pp. 192–200, September 2001.
[65] T.-H. Chen, C. Luk, H. Kim, and C. C.-P. Chen, “SuPREME: Substrate and power-delivery
reluctance-enhanced macromodel evaluation,” International Conference on Computer Aided
Design, pp. 786–792, November 2003.
[66] P. R. Bannister, “Applications of complex image theory,” Radio Science, vol. 21, no. 4, pp. 605–616,
August 1986.
[67] R. Jiang and C. C.-P. Chen, “ESPRIT: A compact reluctance based interconnect model considering
lossy substrate eddy current,” IEEE MTT-S International Microwave Symposium Digest, vol. 3, pp.
1385–1388, June 2004.
[68] A. Weisshaar, H. Lan, and A. Luoh, “Accurate closed-form expressions for the
frequency-dependent line parameters of coupled on-chip interconnects on lossy silicon substrate,” IEEE
Transactions on Advanced Packaging, vol. 25, pp. 288–296, May 2002.
[69] D. Melendy and A. Weisshaar, “A new scalable model for spiral inductors on lossy silicon
substrate,” IEEE MTT-S International Microwave Symposium Digest, vol. 2, pp. 1007–1010, June
2003.
[70] J. A. Tegopoulos and E. E. Kriezis, Eddy Currents in Linear Conducting Media. Elsevier
Publications, 1985.
[71] R. L. Stoll, The Analysis of Eddy Currents. Oxford, U.K.: Clarendon.
[72] M. N. O. Sadiku, Numerical Techniques in Electromagnetics. CRC Publications, 2001.
[73] A. M. Niknejad and R. G. Meyer, “Analysis of eddy-current losses over conductive substrates with
applications to monolithic inductors and transformers,” IEEE Transactions on Microwave Theory
and Techniques, vol. 49, pp. 166–176, January 2001.
[74] M. Kamon, M. J. Tsuk, and J. K. White, “FastHenry: A multipole-accelerated 3-D inductance
extraction program,” IEEE Transactions on Microwave Theory and Techniques, vol. 42,
pp. 1750–1758, September 1994.
[75] A. Weisshaar and H. Lan, “Accurate closed-form expressions for the frequency-dependent line
parameters of on-chip interconnects on lossy silicon substrate,” IEEE MTT-S International
Microwave Symposium Digest, vol. 3, pp. 1753–1756, May 2001.
[76] C. Hoer and C. Love, “Exact inductance equations for rectangular conductors with applications to
more complicated geometries,” J. Res. Nat. Bureau of Standards, April 1965.
[77] F. W. Grover, Inductance Calculations: Working Formulas and Tables. Dover Publications, 1946.
[78] G. Zhong, C.-K. Koh, V. Balakrishnan, and K. Roy, “An adaptive window-based susceptance
extraction and its efficient implementation,” Proceedings of Design Automation Conference,
pp. 728–731, June 2003.
[79] T.-H. Chen, C. Luk, H. Kim, and C. C.-P. Chen, “INDUCTWISE: Inductance-wise interconnect
simulator and extractor,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 22, pp. 884–894, July 2003.