Overview of numerical problems and implementation for circuit simulation (…ece570/session10_2.pdf)


Session 10 – Part 2
ECE 570 - Computer Aided Engineering for Integrated Circuits - IC 752-E

Solution Techniques for Sparse Linear Systems

Overview of numerical problems and implementation for circuit simulation:

1. Definition of sparse system
2. Problem of fill-ins
3. Sparsity and its preservation
4. Sparse matrix techniques literature
5. Discussion of numerical errors
6. Factorization and pivoting in sparse systems
7. Pivoting for sparsity and accuracy
8. Iterative methods
9. Implementation issues – storing sparse systems

1. Definition of Sparse System

Observation: in typical circuit equations generated using the MNA there are ~3 nonzero elements per row.

Example 1: assume n = 10^3 (small circuit). Solving Ax = b using traditional (dense) methods requires:

1° Operation count: ~O(n^3) = 10^9 flops. On a 10 Mflop computer this takes ~10^2 seconds ≈ 1.7 min per iteration.

2° Memory: n^2 entries = 10^6 locations.

Exploiting the sparsity one can achieve:

1° Operation count: ~O(n^(1.2–1.5)). Improvement over the dense matrix, as a ratio of operation counts:

(10^3)^3 / (10^3)^(1.2–1.5) = 10^(4.5) – 10^(5.4)

2° Memory: ~O(n) ≡ 10^3 locations. Improvement over the dense matrix, as a ratio of storage:

(10^3)^2 / 10^3 = 10^3

Example 2: assume n = 10^4 (LSI). Requirements to solve Ax = b:

1° Operation count: ~O(n^3) = 10^12 flops.

2° Memory: ~n^2 locations = 10^8 locations.

Exploiting the sparsity:

1° Operation count: ~O(n^(1.2–1.5)). Improvement over the dense matrix:

(10^4)^3 / (10^4)^(1.2–1.5) = 10^6 – 10^(7.2)

2° Memory: ~O(n) = 10^4 locations. Improvement over the dense matrix:

(10^4)^2 / 10^4 = 10^4

SPARSE MATRIX

There is no precise definition. Definitions encountered in the literature:

1° Limiting concept: a matrix of order n is sparse if the number of nonzero elements is proportional to n for n sufficiently large. Theoretical; useful in developing mathematical theory.

2° Special-structure matrices (circuits): a matrix is sparse if the number of nonzero entries per row is fixed (independent of n); typically 2–10 per row.

3° Alternative definition: a matrix is sparse if the number of nonzero entries is n^(1+γ), γ < 1; typically γ = 0.2–0.5.

4° Practical definition: A is sparse if

a) A is large, and
b) most of its entries are zero (at least 90%),

and it then pays to exploit the sparsity. The exploitation is not cheap! In other words: a matrix is sparse when it is worthwhile to take advantage of the existence of (many) zero entries.

Leading concepts:

A) Store only the nonzeros.

B) Operate only on the nonzeros.

C) Preserve the sparsity during the computation (decomposition).

Note: C) is a crucial requirement (perhaps the most crucial) in the elimination process PAQ = LU: we want the factors L and U to be sparse as well.


2. Problem of fill-ins

The process of LU factorization causes so-called fill-ins (generation of new nonzero entries):

a_ij^(k+1) = a_ij^(k) − a_ik^(k) (a_kj^(k) / a_kk^(k))

where a_kk^(k) is the PIVOT.

Example: an "arrow" matrix with a full first row and first column. Choosing the corner element as the pivot at stage k fills in the entire active matrix at stage k+1 (⊗ marks a fill-in):

Stage k:                 Stage k+1:
[ X X X X X ]            [ X X X X X ]
[ X X . . . ]            [ . X ⊗ ⊗ ⊗ ]
[ X . X . . ]     →      [ . ⊗ X ⊗ ⊗ ]
[ X . . X . ]            [ . ⊗ ⊗ X ⊗ ]
[ X . . . X ]            [ . ⊗ ⊗ ⊗ X ]
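As a sketch of this mechanism (Python/NumPy is an assumption here; the slides are language-neutral), the update formula can be instrumented to count how many zeros turn nonzero during elimination. The matrix values below are illustrative, chosen only to keep every pivot nonzero:

```python
import numpy as np

# Count fill-ins produced by eliminating down the diagonal with
# a_ij^(k+1) = a_ij^(k) - a_ik^(k) * (a_kj^(k) / a_kk^(k)).
def count_fill_ins(A):
    A = A.astype(float).copy()
    fills = 0
    n = A.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            if A[i, k] == 0:
                continue
            for j in range(k + 1, n):
                if A[k, j] == 0:
                    continue
                if A[i, j] == 0:
                    fills += 1            # a zero becomes nonzero: a fill-in
                A[i, j] -= A[i, k] * (A[k, j] / A[k, k])
    return fills

# Arrow matrix: full first row and column, plus the diagonal.
A = np.array([[3, 1, 1, 1, 1],
              [1, 3, 0, 0, 0],
              [1, 0, 3, 0, 0],
              [1, 0, 0, 3, 0],
              [1, 0, 0, 0, 3]])
# Same pattern reordered so the dense row/column come last.
B = np.array([[3, 0, 0, 0, 1],
              [0, 3, 0, 0, 1],
              [0, 0, 3, 0, 1],
              [0, 0, 0, 3, 1],
              [1, 1, 1, 1, 3]])
print(count_fill_ins(A), count_fill_ins(B))   # first ordering fills, second does not
```

With the corner pivot the whole 4x4 active block fills in; with the dense row and column moved to the end no fill-in occurs at all.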

Possible better solution:

1) Swap the first and last columns, and
2) swap the first and last rows,

so that the dense row and column move to the last positions:

[ X . . . X ]
[ . X . . X ]
[ . . X . X ]
[ . . . X X ]
[ X X X X X ]

Decomposition now proceeds without fill-ins: at each stage k the elimination only updates entries of the last row and column that are already nonzero.

3. Preserving the sparsity

The above example illustrates that one can control the number of fill-ins, which shows that it is possible to preserve the sparsity.

Sad fact: it has been shown that finding a permutation of the matrix A which minimizes the number of fill-ins is NP-complete, i.e. the worst-case complexity of an algorithm is 2^q, where q is the number of nonzero elements of A.

Example: for q = 40, 2^q > 10^12!

Conclusion: no efficient general algorithms to solve this problem are known. There are heuristic algorithms used to reduce the number of fill-ins. The most commonly used, and quite successful, is the MARKOWITZ algorithm.

To introduce the algorithm it is necessary to define a quantity called the Markowitz measure of fill-ins in one stage of the elimination process.

The MARKOWITZ measure f(a_ij^(k)) of fill-in at stage k for the element a_ij^(k) (which is a candidate for a pivot) is defined as follows:

f(a_ij^(k)) ≝ (r_i − 1)(c_j − 1);   a_ij^(k) ≠ 0

r_i – number of nonzero elements in row i
c_j – number of nonzero elements in column j

f(a_ij^(k)) = maximum possible number of fill-ins created by choosing a_ij^(k) as a pivot.
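The definition translates directly into a few lines (a Python/NumPy sketch, not part of the slides), operating on the boolean nonzero pattern of the active matrix:

```python
import numpy as np

# Markowitz measures f(a_ij) = (r_i - 1)(c_j - 1) on the nonzero pattern.
def markowitz_measures(pattern):
    """pattern: 2-D boolean array, True where a_ij != 0."""
    r = pattern.sum(axis=1)              # nonzeros per row, r_i
    c = pattern.sum(axis=0)              # nonzeros per column, c_j
    f = np.outer(r - 1, c - 1)           # (r_i - 1)(c_j - 1) for every (i, j)
    return np.where(pattern, f, 0)       # measure defined only on nonzeros

# Nonzero pattern of the 5x5 example matrix A^(k) from the next slide:
P = np.array([[1, 1, 1, 1, 1],
              [1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 1, 0, 1, 1]], dtype=bool)
print(markowitz_measures(P))
```

The minimum measure over the nonzeros of this pattern is 1, attained at the (2,1) position, i.e. at a21.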

Example:

A^(k) =
[ a11 a12 a13 a14 a15 ]    r_i − 1:  4
[ a21 a22  0   0   0  ]              1
[  0  a32 a33  0   0  ]              1
[  0  a42 a43 a44  0  ]              2
[  0  a52  0  a54 a55 ]              2

c_j − 1:  ( 1  4  2  2  1 )

MARKOWITZ measures:

f =
[ 4 16  8  8  4 ]
[ 1  4  0  0  0 ]
[ 0  4  2  0  0 ]
[ 0  8  4  4  0 ]
[ 0  8  0  4  2 ]

Using the measures we see that a21 (with f = 1) is clearly indicated as the element for pivoting.

Thus, since we want to minimize the fill-ins, we select a21 as the pivot and obtain (active matrix shown, columns 2–5):

A^(k+1):
[ a12 a13 a14 a15 ]    r_i − 1:  3
[ a32 a33  0   0  ]              1
[ a42 a43 a44  0  ]              2
[ a52  0  a54 a55 ]              2

c_j − 1:  ( 3  2  2  1 )

Minimum measure: f(a33) = 1·2 = 2 and f(a55) = 2·1 = 2 – a TIE (OCCURS OFTEN).

BREAKING THE TIES

"SPICE" breaks ties by selecting the element with minimum column count (a55 in the example). If ties still occur, the choice is arbitrary (any element in the set remaining after such selection is acceptable).

Example (continued): selecting a55 and swapping (rows 3 & 5, columns 2 & 5) yields A^(3), whose active part contains the rows of a33, a44 and the remaining entries of row 1 and column 2.

Active matrix in A^(3):

[ a33  0   a32 ]    r_i − 1:  1
[ a43 a44  a42 ]              2
[ a13 a14  a12 ]              2

c_j − 1:  ( 2  1  2 )

Minimum measure f = 2 again occurs with ties.

NOTE: the column count selects a44 and a14 (both in the column with minimum count); between them the selection is arbitrary.

MARKOWITZ ALGORITHM – SUMMARY

1° At each stage k of the decomposition select the set d of elements a_ps^(k) with minimum Markowitz measure:

f(a_ps^(k)) = min f(a_ij^(k));   k ≤ i ≤ n, k ≤ j ≤ n

2° If d contains one element, then a_ps^(k) → pivot.

Otherwise, select the subset e of elements with minimum column count c_s. If e contains only one element a_{pl,sl}^(k), then

a_{pl,sl}^(k) → pivot

Otherwise any element in the subset e can be a pivot. Logical choice: select the element with the largest magnitude.
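The summarized rule (minimum measure, then minimum column count, then largest magnitude) can be sketched as follows; Python/NumPy is an assumption here, and the matrix values are illustrative:

```python
import numpy as np

# Markowitz pivot selection with the tie-breaking rules summarized above.
def select_pivot(A):
    """Return (i, j) of the chosen pivot in the active matrix A."""
    nz = A != 0
    r, c = nz.sum(axis=1), nz.sum(axis=0)
    f = np.outer(r - 1, c - 1)                       # Markowitz measures
    cand = list(zip(*np.nonzero(nz)))
    fmin = min(f[i, j] for i, j in cand)
    d = [(i, j) for i, j in cand if f[i, j] == fmin]  # 1) set d: min measure
    cmin = min(c[j] for _, j in d)
    e = [(i, j) for i, j in d if c[j] == cmin]        # 2) subset e: min col count
    return max(e, key=lambda ij: abs(A[ij]))          # 3) largest magnitude

# Values placed on the 5x5 pattern of the example; a21 = 9 has measure 1.
A = np.array([[1.0, 1, 1, 1, 1],
              [9.0, 1, 0, 0, 0],
              [0.0, 1, 1, 0, 0],
              [0.0, 1, 1, 1, 0],
              [0.0, 1, 0, 1, 1]])
print(select_pivot(A))   # position of a21 (0-based indices)
```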

Comments Concerning The Markowitz Algorithm

1. Simple and easy to implement – advantage.

2. Local – it minimizes fill-ins in one stage only. Minimization over several stages might produce fewer fill-ins! Markowitz minimization is not global. The local character of the algorithm is a disadvantage.

3. Stability is not considered – the minimization is performed without consideration of errors – disadvantage. Stability here is related to accuracy and round-off error propagation, which are not controlled.

4. Modifications of the algorithm are many. Versions especially suitable for circuit analysis will be discussed in conjunction with the forthcoming simplified analysis of round-off errors.

5. The Markowitz algorithm preserves symmetry, which means that the same ordering is obtained for either A or A^T – advantage (symmetry is affected by the breaking of ties and by threshold pivoting – discussed later).

4. SPARSE MATRIX TECHNIQUES LITERATURE

Many publications are available. A sample of references in book form:

1. D. J. Evans (Ed.), Sparsity and Its Applications, Cambridge Univ. Press, London, 1985.

2. I. S. Duff (Ed.), Sparse Matrices and Their Uses, Academic Press, New York, 1981.

3. S. Pissanetzky, Sparse Matrix Technology, Academic Press, New York, 1984.

Also see the chapter by K. Kundert on sparse matrix techniques in the series "Advances in CAD for VLSI," North-Holland, Vol. 3: A. Ruehli (Ed.), Circuit Analysis, Simulation and Design. Part 1: General Aspects of Circuit Analysis and Design; Part 2: VLSI Circuit Analysis and Simulation.

5. Discussion of numerical errors – round-off errors (elementary)

Definitions:

∘ – stands for a basic operation: +, −, ×, ÷
a ∘ b – exact arithmetic, no round-off error
fl(a ∘ b) – machine arithmetic, result with error

Simple model of the error in a floating-point operation:

fl(a ∘ b) ≝ (a ∘ b)(1 + ε)

where ε is the round-off error.

NOTE: ε is bounded by the machine epsilon ε_M, such that |ε| ≤ ε_M. For a CPU with a t-bit word length and rounding after each flop we have

ε_M = 2^(−t)

Single-precision arithmetic is assumed in this discussion.

Typical operation in the decomposition process:

a_ij^(k+1) = a_ij^(k) − a_ik^(k) (a_kj^(k) / a_kk^(k))

We shall use the simplified notation

a = b − (c/p) d    (p stands for the pivot)

NOTE: in view of the introduced notation, the above is exact (theoretical). The numerical result ā is

ā = fl{ b − fl[ fl(c/p) · d ] }

Applying the error model (sequentially):

1st flop:  ā = fl[ b − fl( (c/p)(1 + ε1) d ) ]
2nd flop:     = fl[ b − (c/p) d (1 + ε1)(1 + ε2) ]
3rd flop:     = [ b − (c/p) d (1 + ε1)(1 + ε2) ](1 + ε3)

thus:

ā = [ b − (c/p) d (1 + ε1)(1 + ε2) ](1 + ε3);   NOTE: |εi| ≤ ε_M (single-precision arithmetic).

1st-order (linear) analysis – dropping terms of second order in ε:

ā = [ b − (c/p) d (1 + ε1 + ε2 + ε1ε2) ](1 + ε3)
  ≅ [ b − (c/p) d (1 + ε1 + ε2) ](1 + ε3)
  = b(1 + ε3) − (c/p) d (1 + ε1 + ε2 + ε3 + ε1ε3 + ε2ε3)
  ≅ b(1 + ε3) − (c/p) d (1 + ε1 + ε2 + ε3)

Global error of one operation:

e ≝ ā − a

Using the definition of the exact result (a = b − (c/p) d):

e = b(1 + ε3) − (c/p) d (1 + ε1 + ε2 + ε3) − b + (c/p) d
  = b ε3 − (c/p) d (ε1 + ε2 + ε3)

Error bound (|εi| ≤ ε_M):

|e| ≤ |b| ε_M + 3 |c d / p| ε_M

DISCUSSION OF THE ERROR BOUND

|e| ≤ |b| ε_M + 3 |c d / p| ε_M;   ε_M = 2^(−t)

In the above, p is the element that can be controlled by the pivoting strategy. To minimize the error (prevent error growth) one should choose the largest available pivot elements. This usually conflicts with sparsity!
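The growth of the bound as the pivot shrinks can be demonstrated numerically (a sketch; NumPy's float32 stands in for the single precision of the analysis, with float64 as the "exact" reference):

```python
import numpy as np

# Error of a = b - (c/p)*d evaluated flop-by-flop in single precision,
# compared against a double-precision reference.
def roundoff_error(b, c, p, d):
    exact = b - (c / p) * d                                   # float64 reference
    b32, c32, p32, d32 = (np.float32(v) for v in (b, c, p, d))
    computed = np.float32(b32 - np.float32(np.float32(c32 / p32) * d32))
    return abs(float(computed) - exact)

eps_M = float(np.finfo(np.float32).eps)    # 2**(-23) for IEEE single precision
err_large_pivot = roundoff_error(1.0, 1.0, 3.0, 7.0)      # p = 3
err_small_pivot = roundoff_error(1.0, 1.0, 3.0e-6, 7.0)   # p = 3e-6
print(eps_M, err_large_pivot, err_small_pivot)
```

With the tiny pivot the term 3|cd/p|ε_M dominates, and the observed error is many orders of magnitude larger.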

6. Pivoting and factorization techniques

6.1. Partial pivoting

Partial pivoting consists of finding a_ik^(k) such that

|a_ik^(k)| = max |a_jk^(k)|;   j = k, k+1, …, n

and swapping suitable rows to bring the element a_ik^(k) into the pivot position.
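A minimal sketch of LU with partial pivoting (Python/NumPy assumed; the example matrix is illustrative):

```python
import numpy as np

# LU factorization with partial pivoting: at stage k, search column k
# for the entry of maximum magnitude and swap that row into the pivot row.
def lu_partial_pivoting(A):
    A = A.astype(float).copy()
    n = A.shape[0]
    perm = np.arange(n)                       # records the permutation P
    for k in range(n - 1):
        i = k + np.argmax(np.abs(A[k:, k]))   # |a_ik| = max_j |a_jk|, j >= k
        if i != k:
            A[[k, i]] = A[[i, k]]             # swap rows i and k
            perm[[k, i]] = perm[[i, k]]
        A[k+1:, k] /= A[k, k]                 # multipliers m_ik, |m_ik| <= 1
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, perm                            # L (unit diagonal) and U packed in A

A = np.array([[1e-4, 1.0],
              [1.0,  1.0]])                   # tiny corner entry forces a swap
LU, perm = lu_partial_pivoting(A)
```

Splitting the packed result back into L and U and forming P from `perm` recovers PA = LU, with all multipliers bounded by 1 in magnitude.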

Matrix schematics (active matrix is shown):

[ a_kk^(k)     a_k,k+1^(k)  …  a_kn^(k) ]
[ a_k+1,k^(k)      .               .    ]
[     …                                 ]
[ a_ik^(k)         …           a_in^(k) ]   ← swap rows i and k
[     …                                 ]
[ a_nk^(k)         …           a_nn^(k) ]

The search runs down column k (rows k, k+1, …, n); rows i and k are then swapped.

Another illustration: in the active matrix A^(k), column k (rows k, …, n) is searched for the element of maximum magnitude.

The effect of the swapping is that the multipliers satisfy

|m_ik| ≤ 1;   i = k+1, …, n

so the factor L will be well conditioned; its condition number is close to unity. Condition number of a matrix A:

K = ||A|| · ||A^(−1)||

If ||A|| = ||A||_2 = sqrt( max eigen(A^T A) ), then K = λ_max / λ_min.

6.2. Complete pivoting

Complete pivoting consists of finding the element a_ij^(k) such that

|a_ij^(k)| = max |a_lp^(k)|;   k ≤ l ≤ n, k ≤ p ≤ n

and swapping suitable rows and columns to bring the element a_ij^(k) into the pivot position.

Matrix illustration (active matrix is shown): the whole active submatrix (rows k, …, n; columns k, …, n) is searched for the element a_ij^(k) of maximum magnitude; columns j and k are swapped, and rows i and k are swapped, to bring a_ij^(k) into the pivot position.

Another illustration: the element of maximum magnitude is searched for over the whole active matrix A^(k).

Pivoting changes the original matrix A into PAQ, where P and Q are permutation matrices. Thus we have to consider the resulting modification of the original system.

Original system: Ax = b   (*)

With pivoting we do not get A = LU, but PAQ = LU. Therefore we transform (*) into PAx = Pb. Then, since QQ^T = I, we write

PAQ Q^T x = Pb,  i.e.  LU Q^T x = Pb  (with LU = PAQ)

Solution procedure (set z = Q^T x, and let y = Uz):

1*  Ly = Pb  →  y
2*  Uz = y   →  z
3*  Q^T x = z, or, since QQ^T = I, QQ^T x = Qz
4*  x = Qz   →  x
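The four steps can be sketched as follows (Python/NumPy assumed; for simplicity PAQ is factored here by plain Gaussian elimination without further pivoting, and the 3×3 system is a hypothetical example):

```python
import numpy as np

# Solve Ax = b given permutations P, Q via PAQ = LU:
# 1* Ly = Pb, 2* Uz = y, 3*/4* x = Qz.
def solve_permuted(A, b, P, Q):
    B = P @ A @ Q                         # PAQ, assumed factorizable as-is
    n = len(b)
    L, U = np.eye(n), B.astype(float).copy()
    for k in range(n - 1):                # plain LU of PAQ
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:] -= np.outer(L[k+1:, k], U[k])
    y = np.linalg.solve(L, P @ b)         # 1*  Ly = Pb  -> y
    z = np.linalg.solve(U, y)             # 2*  Uz = y   -> z
    return Q @ z                          # 3*, 4*  Q^T x = z  =>  x = Qz

A = np.array([[4.0, 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([1.0, 2, 3])
P = np.eye(3)[[1, 0, 2]]                  # swap rows 1 and 2
Q = np.eye(3)                             # no column permutation
x = solve_permuted(A, b, P, Q)
```

The recovered x satisfies the original system Ax = b, independent of the permutations used during factorization.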

Scaling and Equilibration of Matrices

Scaling a matrix. Motivation:

Range of voltages: 10^(−3) → 10 [V]
Range of currents: 10^(−15) → 10^(−3) [A]

Multiplication of column j by α_j is equivalent to replacing x_j by

x̂_j = x_j / α_j

Multiplication of row i by β_i is equivalent to scaling the right-side entry b_i by the factor β_i, i.e. replacing b_i by b̂_i = β_i b_i.

Compact form of scaling. Define:

D1 = diag(α1, α2, …, αm)
D2 = diag(β1, β2, …, βm)

Then the scaling of x and b can be written as x = D1 x̂ and b̂ = D2 b, which turns Ax = b into

A D1 x̂ = b              (column scaling)
D2 A D1 x̂ = D2 b = b̂    (row scaling)

or the new (scaled) system: Â x̂ = b̂, where Â = D2 A D1.

Scaling may be used to equilibrate a matrix: we want the entries of the matrix to be of the same order (same size).

Matrix equilibration

A matrix is row equilibrated if max |a_ij| ≈ 1 over 1 ≤ j ≤ m, for every i.

A matrix is column equilibrated if max |a_ij| ≈ 1 over 1 ≤ i ≤ m, for every j.

A matrix is equilibrated if it is both row and column equilibrated.

Sad observation: there is no unique way to equilibrate a matrix! Example:

A = [ 1    1   2·10^9 ]
    [ 2   −1     10^9 ]
    [ 1    2       0  ]

a) Performing column equilibration as the first operation, we get:

A = [ 0.5   0.5   1   ]
    [ 1    −0.5   0.5 ]
    [ 0.5   1     0   ]

Note that this A is also row equilibrated.

b) Performing row equilibration as the first operation yields:

A = [ 5·10^(−10)   5·10^(−10)   1 ]
    [ 2·10^(−9)     −10^(−9)    1 ]
    [ 0.5            1          0 ]

Then A is also column equilibrated. But the two matrices are very different.
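Both orders of equilibration can be reproduced in a few lines (a Python/NumPy sketch; each step simply divides a row or column by its largest magnitude):

```python
import numpy as np

def col_equilibrate(A):
    return A / np.abs(A).max(axis=0)                   # each column / its max

def row_equilibrate(A):
    return A / np.abs(A).max(axis=1, keepdims=True)    # each row / its max

A = np.array([[1.0,  1.0, 2e9],
              [2.0, -1.0, 1e9],
              [1.0,  2.0, 0.0]])
Ac = col_equilibrate(A)    # case a): columns first
Ar = row_equilibrate(A)    # case b): rows first
print(Ac)
print(Ar)
```

Running it reproduces the two matrices of the example and makes the non-uniqueness obvious: both are legitimately equilibrated, yet their entries differ by many orders of magnitude.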

Suggested exercises:

1. Write the scaling matrices D1, D2 for the matrices above (trivial) and perform the multiplication/equilibration.

2. Investigate the effect of scaling on the accuracy (global round-off error) in the k-th stage of factorization:

a_ij^(k+1) = a_ij^(k) − (a_ik^(k)/a_kk^(k)) a_kj^(k)

Assume no round-off error in the scaling itself.

Exploiting the sparsity of the right-hand side

Use Crout's decomposition (or modified G-E) to get u_ii = 1. This minimizes the number of operations in the back substitution

Ux = y

(no divisions by diagonal entries). The elements l_ii ≠ 1 in general, but with sparsity in b this is advantageous. Note that it is beneficial to have the nonzero entries of the vector b clustered at the bottom (explain why?).

7. Pivoting for sparsity and accuracy

Modification of the Markowitz algorithm:

1° Find the element with maximum magnitude in the active matrix:

|a_mx^(k)| = max |a_lp^(k)|;   k ≤ l ≤ n, k ≤ p ≤ n

2° Select the set D of elements with minimum Markowitz measure, eliminating from candidacy the elements for which

|a_ij^(k)| < u · |a_mx^(k)|

Here u is a threshold parameter that needs to be chosen; often u = 0.1.

The elimination of small elements creates the subset D̄. If D̄ is empty, a warning message is issued. Otherwise proceed with acceptance of the pivot (if D̄ contains one element only) or perform further selection as in the classical (sparsity-oriented) Markowitz algorithm. The approach described above is also known as threshold pivoting (or relative threshold pivoting).

Circuit simulators based on the MNA restrict, whenever possible (if the elements are not too small – this involves some kind of threshold consideration), the pivot choice to the main diagonal.

Note:

1° The MNA usually yields matrices which are diagonally dominant, so the strategy is suitable.

2° This strategy preserves symmetry if the threshold is not invoked.
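A sketch of the threshold variant (Python/NumPy assumed; the matrix is illustrative). Without the threshold, the 10^(−9) entry below would win on Markowitz measure alone and become a disastrous pivot:

```python
import numpy as np

# Markowitz selection with a relative threshold u: candidates smaller than
# u * |a_mx| are removed before the measure is minimized.
def threshold_markowitz_pivot(A, u=0.1):
    a_mx = np.abs(A).max()                           # 1) largest magnitude
    nz = A != 0
    r, c = nz.sum(axis=1), nz.sum(axis=0)
    f = np.outer(r - 1, c - 1)                       # Markowitz measures
    cand = [(i, j) for i, j in zip(*np.nonzero(nz))
            if abs(A[i, j]) >= u * a_mx]             # 2) threshold test
    if not cand:
        raise RuntimeError("warning: no acceptable pivot")   # D-bar empty
    fmin = min(f[i, j] for i, j in cand)
    return min((i, j) for i, j in cand if f[i, j] == fmin)

A = np.array([[2.0,  3.0, 4.0],
              [5.0,  6.0, 0.0],
              [1e-9, 0.0, 0.0]])   # tiny entry has measure 0 but is rejected
print(threshold_markowitz_pivot(A))
```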

Pivoting in SPICE 2g.6 (and newer versions):

1° Choose the element a_ii^(k) on the main diagonal with minimum Markowitz measure.

2° Check it against the largest element a_mx in the same column, and if

|a_ii^(k)| < ε_A + ε_R |a_mx|

reject it; otherwise a_ii^(k) → pivot. Here

ε_A = pivtol = 10^(−13)   (absolute tolerance)
ε_R = pivrel = 10^(−3)    (relative tolerance)

3° If rejected, select the next element on the diagonal (go to 1° and repeat the operations, excluding a_ii^(k)).

4° If all elements on the diagonal fail the threshold test, perform the selection over all elements of the active matrix using the Markowitz algorithm with threshold (2°).

Many other methods attempting to improve the solution of sparse systems are available. There is a class of methods aiming at minimizing the profile of the matrix A.

Definitions:

Bandwidth: A has bandwidth m if a_ij = 0 when |i − j| > m.
Particular cases: m = 0 diagonal; m = 1 tridiagonal.

Profile: let m_i be the bandwidth of the i-th row, i = 1, 2, …, n. The matrix profile P is

P ≝ Σ_{i=1..n} m_i

The methods aim at developing ordering schemes that minimize the profile. Most popular: Cuthill-McKee (C-M) and Reverse C-M.
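The two definitions translate directly into code (a Python/NumPy sketch, with a tridiagonal matrix as the check case):

```python
import numpy as np

# Row bandwidths m_i = max |i - j| over the nonzeros of row i,
# and the profile P = sum_i m_i.
def row_bandwidths(A):
    m = []
    for i, row in enumerate(A):
        js = np.nonzero(row)[0]
        m.append(int(np.abs(js - i).max()) if js.size else 0)
    return m

def profile(A):
    return sum(row_bandwidths(A))

# Tridiagonal test matrix: bandwidth 1 in every row, profile = n.
T = np.diag([2.0] * 4) + np.diag([1.0] * 3, 1) + np.diag([1.0] * 3, -1)
print(row_bandwidths(T), profile(T))
```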

8. Iterative methods

Basic definitions:

Original system: Ax = b
True (theoretical) solution: x*
Starting vector: x^0
Sequence of approximate (numerical) solutions: x^1, x^2, …, x^k

Iteration error: e_k = x^k − x*
Convergence requirement: e_k → 0 as k → ∞.

Constructing an iterative scheme: write A = B + (A − B) and plug into the original system, which yields

Bx + (A − B)x = b

General scheme:

B x^(k+1) = −(A − B) x^(k) + b

B is an arbitrarily selected matrix; the sophistication comes in the selection process.

One requirement: B must be easy to invert. If B is inverted we can write

x^(k+1) = (I − B^(−1) A) x^(k) + B^(−1) b

This formula defines the iteration matrix Q = I − B^(−1) A = B^(−1)(B − A). The sufficient convergence condition is

ρ(Q) = ρ(I − B^(−1) A) < 1

where ρ(Q) is the spectral radius of the matrix Q. If the eigenvalues λ1, λ2, …, λn are known, then

ρ(Q) = max |λi|;   1 ≤ i ≤ n

Examples of particular choices of the matrix B

Gauss-Jacobi scheme. A simple decomposition of A produces

A = L + D + U

where D is the main diagonal and L, U are the lower and upper triangular parts (with zero main-diagonal elements).

1. Choosing B = D yields the Gauss-Jacobi scheme: A − B = L + U, and the iteration matrix is −B^(−1)(A − B) = −D^(−1)(L + U). If A is diagonally dominant, i.e.

|a_ii| > Σ_{j=1..n, j≠i} |a_ij|

then the spectral radius of the iteration matrix satisfies the convergence condition ρ( D^(−1)(L + U) ) < 1.
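The B = D choice gives a one-line update per iteration (a Python/NumPy sketch; the diagonally dominant test matrix is illustrative):

```python
import numpy as np

# Gauss-Jacobi: D x^(k+1) = b - (L + U) x^(k), i.e. x^(k+1) = (b - R x) / d.
def jacobi(A, b, iters=50):
    d = np.diag(A)
    R = A - np.diag(d)                 # L + U: the off-diagonal part
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / d            # divide by the diagonal D
    return x

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])        # diagonally dominant -> convergent
b = np.array([6.0, 8.0, 4.0])
x = jacobi(A, b)
```

Because A is diagonally dominant, the iterates converge to the true solution; after 50 sweeps the residual is negligible.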

Other methods:

2. B = L + D – Gauss-Seidel. The Gauss-Jacobi (1) and Gauss-Seidel (2) schemes have been around for a long time.

3. B_w = (1/w) D + L – Successive Over-Relaxation (SOR), with relaxation parameter w, 0 < w < 2. The relaxation parameter w should be selected to minimize

ρ( I − B_w^(−1) A )

The above iteration matrices are usually unsymmetric.

Symmetric overrelaxation (SSOR)

B = (1/w) D + L   in one step
B = (1/w) D + U   in the next step

Note: w must be the same in two consecutive steps. It was shown that the iteration matrix taken over two steps is symmetric if A is symmetric.

Alternating direction implicit (ADI)

The matrix A of the original system is split as A = H + V. This produces (H + V)x = b, or Hx + Vx = b; from here a scheme is constructed.

a) Hx = −Vx + b. Add αIx to both sides (α is a parameter):

(H + αI) x = (αI − V) x + b

1st iterative scheme:

(H + αI) x^(k+1/2) = (αI − V) x^(k) + b

b) Vx = −Hx + b:

(V + αI) x = (αI − H) x + b

2nd iterative scheme:

(V + αI) x^(k+1) = (αI − H) x^(k+1/2) + b

The splitting of A is a problem. In some cases of unsymmetric matrices one uses

H = (A + A^T)/2;   V = (A − A^T)/2
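One full ADI sweep with this symmetric/antisymmetric splitting can be sketched as follows (Python/NumPy assumed; the 2×2 unsymmetric matrix and α = 1 are illustrative choices):

```python
import numpy as np

# One ADI sweep: half-step with (H + aI), full step with (V + aI),
# using the splitting H = (A + A^T)/2, V = (A - A^T)/2.
def adi_sweep(A, b, x, alpha=1.0):
    n = A.shape[0]
    H = (A + A.T) / 2                    # symmetric part
    V = (A - A.T) / 2                    # antisymmetric part
    I = np.eye(n)
    x_half = np.linalg.solve(H + alpha * I, (alpha * I - V) @ x + b)
    return np.linalg.solve(V + alpha * I, (alpha * I - H) @ x_half + b)

A = np.array([[3.0, 1.0],
              [-1.0, 2.0]])              # unsymmetric example
b = np.array([1.0, 1.0])
x = np.zeros(2)
for _ in range(60):
    x = adi_sweep(A, b, x)
```

The fixed point of the two half-steps satisfies Hx + Vx = b, i.e. the original system, so the converged x solves Ax = b.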

Conjugate gradient methods (many variations are available)

Originally developed for symmetric matrices, but they have been extended to general cases. Basic scheme:

x ← 0
for k = 1, 2, …
    r ← b − Ax
    if r = 0 then EXIT
    else
        if k = 1 then
            p ← r
        else
            β ← −(p^T A r)/(p^T A p);  p ← r + βp
        α ← (r^T r)/(p^T A p)
        x ← x + αp

General properties of the conjugate gradient method:

1° One multiplication by A per iteration.

2° The true solution is reached in m iterations, where m is the number of distinct eigenvalues of A.

3° Convergence is faster for matrices whose eigenvalues are clustered closely (well-conditioned matrices).

Advanced monograph: G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 1983.
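The basic scheme above can be transcribed directly (a Python/NumPy sketch for a symmetric positive definite A; the 2×2 system is illustrative, and by property 2° it converges in at most two iterations):

```python
import numpy as np

# Basic conjugate gradient scheme, following the pseudocode above.
def conjugate_gradient(A, b, tol=1e-12):
    x = np.zeros_like(b, dtype=float)
    p = None
    for k in range(len(b)):                          # at most n iterations
        r = b - A @ x
        if np.linalg.norm(r) <= tol:
            break                                    # r = 0 -> EXIT
        if p is None:
            p = r                                    # k = 1: p <- r
        else:
            beta = -(p @ (A @ r)) / (p @ (A @ p))    # beta = -p^T A r / p^T A p
            p = r + beta * p                         # new A-conjugate direction
        alpha = (r @ r) / (p @ (A @ p))              # alpha = r^T r / p^T A p
        x = x + alpha * p
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                           # symmetric positive definite
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

Note the single product with A per direction update (property 1°); an optimized implementation would also reuse A·p between the β and α computations.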

9. Implementation issues – storing a sparse matrix (data structures)

Many techniques (data structures) are available; a trade-off between storage and speed of calculation is involved. SPICE2f (and newer versions) employs a bi-directional threaded list.

Each entry is represented by:
– value
– row index
– column index
– pointer to the next nonzero entry in the same column (zero if there is no next element)
– pointer to the next nonzero entry in the same row (zero if there is no next element)

The pointers indicate the position of the next entry in the storage system.

Graphical Illustration of the Information Related to an Entry

VALUE | ROW | COL. | PTR1 | PTR2

PTR1 – position of the next entry in the same row (zero if none)
PTR2 – position of the next entry in the same column (zero if none)

Example:

      j = 1  2  3  4
A = [ 9  3  7  2 ]    i = 1
    [ 6  5  0  0 ]        2
    [ 0  0  1  0 ]        3
    [ 0  8  0  7 ]        4

Bidirectional Threaded List – concept

(The slide shows the linked nodes for the example matrix: a starting pointer for each of columns 1–4 and each of rows 1–4, with every stored entry carrying its value plus links to the next entry in its row and in its column.)

Bidirectional threaded list – implementation (FORTRAN oriented). For the example matrix the parallel arrays are:

       VAL  IROW  JCOL  IPT  JPT
 (1)    9    1     1     2    5
 (2)    3    1     2     3    6
 (3)    7    1     3     4    7
 (4)    2    1     4     0    9
 (5)    6    2     1     6    0
 (6)    5    2     2     0    8
 (7)    1    3     3     0    0
 (8)    8    4     2     9    0
 (9)    7    4     4     0    0

IPT – pointer to the next entry in the same row; JPT – pointer to the next entry in the same column.

Starting row pointers:    ISPT = (1, 5, 7, 8)
Starting column pointers: JSPT = (1, 2, 3, 4)

Examples of Using the Structure

1. Scanning a column (assume column 2):

I.   a) Read the starting pointer: JSPT(2) = 2
     b) Read the first entry: VAL[JSPT(2)] = VAL(2) = 3
II.  c) Read the pointer to the next entry in the column: JPT(2) = 6
     d) Read the entry: VAL[JPT(2)] = VAL(6) = 5
III. e) Read the pointer: JPT(6) = 8
     f) Read the entry: VAL[JPT(6)] = VAL(8) = 8
IV.  g) Read the pointer to the next entry: JPT(8) = 0 – last element, done!

2. Scanning a row (assume row 4):

a) Read the starting pointer: ISPT(4) = 8
b) Read the entry: VAL[ISPT(4)] = VAL(8) = 8
c) Read the pointer to the next entry: IPT(8) = 9
d) Read the entry: VAL[IPT(8)] = VAL(9) = 7
e) Read the pointer to the next entry: IPT(9) = 0 – last entry, done!

3. Adding an entry (fill-in). Assume a fill-in at position (2, 3) of the matrix, with value 14. The fill-in will be stored at position 10.

A. Thus VAL(10) = 14; indexes: IROW(10) = 2, JCOL(10) = 3.

B. Changing the pointers:

a) Scan row 2 to the entry in column 2 and change IPT(6) = 0 to IPT(6) = 10.
b) Scan column 3 to the entry in row 1 and change JPT(3) = 7 to JPT(3) = 10.

C. Add the pointers for the new entry:

IPT(10) = 0
JPT(10) = 7
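The whole structure, the scans, and the fill-in insertion can be sketched in Python (an assumption; the slides are FORTRAN-oriented, so index 0 of each list is padded with None to keep 1-based addressing):

```python
# Threaded-list arrays for the 4x4 example matrix.
VAL  = [None, 9, 3, 7, 2, 6, 5, 1, 8, 7]
IROW = [None, 1, 1, 1, 1, 2, 2, 3, 4, 4]
JCOL = [None, 1, 2, 3, 4, 1, 2, 3, 2, 4]
IPT  = [None, 2, 3, 4, 0, 6, 0, 0, 9, 0]   # next entry in the same row
JPT  = [None, 5, 6, 7, 9, 0, 8, 0, 0, 0]   # next entry in the same column
ISPT = [None, 1, 5, 7, 8]                  # starting row pointers
JSPT = [None, 1, 2, 3, 4]                  # starting column pointers

def scan_column(j):
    vals, k = [], JSPT[j]
    while k:                               # follow JPT until it hits 0
        vals.append(VAL[k])
        k = JPT[k]
    return vals

def scan_row(i):
    vals, k = [], ISPT[i]
    while k:                               # follow IPT until it hits 0
        vals.append(VAL[k])
        k = IPT[k]
    return vals

# Fill-in at (2, 3), value 14, stored at position 10, as in the example:
VAL.append(14); IROW.append(2); JCOL.append(3)
IPT.append(0)          # IPT(10) = 0: the fill-in is last in row 2
JPT.append(JPT[3])     # JPT(10) = 7: old successor of entry (3) in column 3
JPT[3] = 10            # entry a13 now points at the fill-in
IPT[6] = 10            # entry a22 now points at the fill-in

print(scan_column(3), scan_row(2))   # both scans now pass through the fill-in
```

Only four pointer updates were needed; no stored data moved, which is exactly why this structure suits factorization with fill-ins.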