Systolic Array Architecture

April 11, 2023


Definition: Systolic

Definition 1. sys·to·le (sĭs′tə-lē) noun: The rhythmic contraction of the heart, especially of the ventricles, by which blood is driven through the aorta and pulmonary artery after each dilation or diastole.

Definition 2. Data flows from memory in a rhythmic fashion, passing through many processing elements before it returns to memory.


What Is a Systolic Array?

Imagine n simple processors arranged in a row or an array and connected in such a manner that each processor may exchange information only with its neighbours to the right and left. The processors at either end of the row are used for input and output. Such a machine constitutes the simplest example of a systolic array.


Basic principle of systolic architecture

• A systolic system consists of a set of interconnected cells, each capable of performing some simple operation.

• The systolic approach reduces the time complexity of a computation without complicating the system.

• In a systolic array in particular, we achieve higher computation throughput without increasing memory bandwidth.

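Basic principle of systolic architecture (figure)

Instead of a single PE between memory read and memory write (100 ns memory cycle, 5 million operations per second at most), the systolic array places a chain of six PEs between memory read and memory write (same 100 ns memory cycle, 30 MOPS possible).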
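The arithmetic behind these two figures can be sketched as follows. This is a minimal sketch, not taken from the slide: it assumes that every operand makes a round trip through memory costing two 100 ns accesses (one read, one write), and that each PE in the chain performs one operation on the operand as it passes through. The function name `peak_mops` and its parameters are illustrative.

```python
def peak_mops(memory_cycle_ns: float, num_pes: int, accesses_per_op: int = 2) -> float:
    """Peak millions of operations per second when memory bandwidth is the limit.

    Assumption (not stated on the slide): each operand costs `accesses_per_op`
    memory cycles per round trip, and every one of the `num_pes` cells performs
    one operation on it before it returns to memory.
    """
    accesses_per_second = 1e9 / memory_cycle_ns
    operands_per_second = accesses_per_second / accesses_per_op
    return operands_per_second * num_pes / 1e6


print(peak_mops(100, 1))  # single PE attached to memory: 5.0
print(peak_mops(100, 6))  # chain of six PEs: 30.0
```

The memory cycle is identical in both cases; only the number of operations performed per memory round trip changes.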


Typical structures

• 1D linear array

• 1D linear array with 2D I/O

• Bi-directional two-dimensional network

• Hexagonal network


Systolic Computing

The systolic approach utilizes both pipelining and parallelism.

By pipelining, processing may proceed concurrently with input and output, and consequently overall execution time is minimized. Pipelining plus multiprocessing at each stage of a pipeline should lead to the best-possible performance.

Matrix Multiplication

| a11 a12 a13 |   | b11 b12 b13 |   | c11 c12 c13 |
| a21 a22 a23 | * | b21 b22 b23 | = | c21 c22 c23 |
| a31 a32 a33 |   | b31 b32 b33 |   | c31 c32 c33 |

Conventional Method: N^3 multiply-add steps

For I = 1 to N
  For J = 1 to N
    For K = 1 to N
      C[I,J] = C[I,J] + A[I,K] * B[K,J];
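The conventional triple loop can be written as runnable code. The sketch below uses Python lists of lists with 0-based indices. The function name `matmul_conventional` and the 3 x 3 sample matrix are illustrative choices.

```python
def matmul_conventional(a, b):
    """Multiply two N x N matrices with the classic three nested loops (N^3 steps)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):          # row of the result
        for j in range(n):      # column of the result
            for k in range(n):  # position along row i of A and column j of B
                c[i][j] += a[i][k] * b[k][j]
    return c


A = [[3, 4, 2],
     [2, 5, 3],
     [3, 2, 5]]
print(matmul_conventional(A, A))  # [[23, 36, 28], [25, 39, 34], [28, 32, 37]]
```

Each of the N^2 result entries needs N multiply-add steps, all executed one after another on a single processor.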


Systolic Method

This will run in O(N) time!

To run in O(N) time we need N x N processing units; in this case (N = 3) we need 9.

P1 P2 P3
P4 P5 P6
P7 P8 P9

We need to modify the input data, like so:

A with columns 1 & 3 flipped:

a13 a12 a11
a23 a22 a21
a33 a32 a31

B with rows 1 & 3 flipped:

b31 b32 b33
b21 b22 b23
b11 b12 b13

Finally, stagger the data sets for input.
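The staggering step can be sketched in code as input streams indexed by clock tick. The sketch assumes that each successive row of A (and each successive column of B) is delayed by one extra tick. The helper names `stagger_rows` and `stagger_cols` and the use of `None` for an idle tick are illustrative. The flip in the diagrams is only a drawing convention, because the element drawn nearest the array enters first. In time order each stream therefore reads a11, a12, a13 (or b11, b21, b31), so the code needs no explicit reversal.

```python
def stagger_rows(a):
    """One input stream per row of A; stream i is row i delayed by i ticks.

    Element t of a stream is the value entering the array edge at clock
    tick t + 1; None marks a tick on which nothing enters.
    """
    n = len(a)
    return [[None] * i + list(row) + [None] * (n - 1 - i)
            for i, row in enumerate(a)]


def stagger_cols(b):
    """One input stream per column of B; stream j is column j delayed by j ticks."""
    n = len(b)
    columns = [[b[k][j] for k in range(n)] for j in range(n)]
    return stagger_rows(columns)


A = [[3, 4, 2],
     [2, 5, 3],
     [3, 2, 5]]
for stream in stagger_rows(A):
    print(stream)
```

For a 3 x 3 matrix each stream is 2N - 1 = 5 ticks long: the last row starts two ticks late and then needs three more ticks to enter.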

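Staggered input to the array (A enters from the left, B from the top; the element nearest the array enters first):

                               b33
                          b32  b23
                     b31  b22  b13
                     b21  b12
                     b11
        a13 a12 a11  P1   P2   P3
    a23 a22 a21      P4   P5   P6
a33 a32 a31          P7   P8   P9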

At every tick of the global system clock, data is passed to each processor from two different directions; the processor multiplies the two values and adds the product to the running sum saved in its register.
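The clocked behaviour described above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not a hardware description. Values of A are assumed to move one processor to the right per tick and values of B one processor down. Row i of A and column j of B enter i and j ticks late (0-based). Each processor keeps its running sum in place. The function name `systolic_matmul` and the `history` list of register snapshots are illustrative.

```python
def systolic_matmul(a, b):
    """Simulate an N x N systolic array tick by tick.

    Returns (c, history): the product matrix and one snapshot of all
    accumulator registers (row-major, P1..P(N*N)) per clock tick.
    """
    n = len(a)
    acc = [[0] * n for _ in range(n)]       # running sum held by each processor
    a_reg = [[None] * n for _ in range(n)]  # A value currently at each processor
    b_reg = [[None] * n for _ in range(n)]  # B value currently at each processor
    history = []
    for t in range(3 * n - 2):
        # Shift: A values move one processor right, B values one processor down.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed the edges with staggered input (row i / column j lag by i / j ticks).
        for i in range(n):
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < n else None
        for j in range(n):
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < n else None
        # Compute: every processor holding two operands multiplies and accumulates.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
        history.append([value for row in acc for value in row])
    return acc, history


A = [[3, 4, 2],
     [2, 5, 3],
     [3, 2, 5]]
c, history = systolic_matmul(A, A)
for tick, registers in enumerate(history, start=1):
    print(tick, registers)
```

With the 3 x 3 sample matrix the loop runs for 7 ticks, and the snapshot after tick 3 is `[23, 32, 6, 16, 8, 0, 9, 0, 0]`.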


| 3 4 2 |   | 3 4 2 |   | 23 36 28 |
| 2 5 3 | * | 2 5 3 | = | 25 39 34 |
| 3 2 5 |   | 3 2 5 |   | 28 32 37 |

Let's try this using a systolic array.

Staggered input for this example:

                     5
                2    3
           3    5    2
           2    4
           3
    2 4 3  P1   P2   P3
  3 5 2    P4   P5   P6
5 2 3      P7   P8   P9

Clock tick: 1

Products: P1: 3*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
 9  0  0  0  0  0  0  0  0

Clock tick: 2

Products: P1: 4*2, P2: 3*4, P4: 2*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
17 12  0  6  0  0  0  0  0

Clock tick: 3

Products: P1: 2*3, P2: 4*5, P3: 3*2, P4: 5*2, P5: 2*4, P7: 3*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 32  6 16  8  0  9  0  0

Clock tick: 4

Products: P2: 2*2, P3: 4*3, P4: 3*3, P5: 5*5, P6: 2*2, P7: 2*2, P8: 3*4

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 18 25 33  4 13 12  0

Clock tick: 5

Products: P3: 2*5, P5: 3*2, P6: 5*3, P7: 5*3, P8: 2*5, P9: 3*2

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 19 28 22  6

Clock tick: 6

Products: P6: 3*5, P8: 5*2, P9: 2*3

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 34 28 32 12

Clock tick: 7

Products: P9: 5*5

P1 P2 P3 P4 P5 P6 P7 P8 P9
23 36 28 25 39 34 28 32 37

Final register contents:

P1 P2 P3     23 36 28
P4 P5 P6  =  25 39 34
P7 P8 P9     28 32 37

Same answer! In 7 clock ticks (3N - 2 for N = 3), a count that grows linearly with N.
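A short derivation of the tick count, assuming (as in the trace) that row i of A and column j of B are each delayed by one tick per row or column, with 1-based indices i, j, k. The product a_ik * b_kj is formed in the processor at row i, column j at tick t(i, j, k), and the last product is the one with i = j = k = N:

```latex
t(i, j, k) = (i - 1) + (j - 1) + k = i + j + k - 2,
\qquad
t_{\max} = t(N, N, N) = 3N - 2 \quad (= 7 \text{ for } N = 3)
```

For example, a_22 * b_22 = 5*5 is formed in P5 at tick 2 + 2 + 2 - 2 = 4, which matches the trace.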


Extension to other applications

The concepts used in matrix-vector multiplication can be easily extended to compute more complex functions.

Some of these functions include the multiplication of multiple matrices and n-dimensional applications.

One example is systolic lattice filters, used for speech and seismic signal processing.


Reconfigurable systolic array

An array of systolic elements that can be configured at the lowest level.

Relatively new field-programmable gate array (FPGA) technology permits a reconfigurable architecture, as opposed to a reprogrammable architecture.


Pipelining Vs. Systolic Array

Unlike a simple pipeline, in a systolic array:

• Input data is not consumed.

• Input data streams can flow in different directions.

• Modules may be organized in a two-dimensional (or higher) configuration.

• The array is configurable: different array configurations are available for different processing purposes.


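Why Systolic?

• Extremely fast.

• Easily scalable architecture.

• Can sustain throughput that single-processor machines cannot attain.

• Turns some polynomial-time problems into linear time, e.g. N x N matrix multiplication goes from N^3 sequential steps to O(N) clock ticks on N x N processors.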


Why Not Systolic?

• Expensive.

• Not needed in most applications; they are a highly specialized processor type.

• Difficult to implement and build.

• No generalized structure, hence algorithm-specific.


Summary

• Systolic arrays offer a substantial reduction in the time complexity of a computation.

• They are expensive and sometimes complex, but yield enormous throughput.

• Reconfigurability of systolic arrays can be achieved using FPGA technology.


References

• K. T. Johnson, A. R. Hurson, Behrooz Shirazi, General-Purpose Systolic Arrays, IEEE 1993, pp. 20-31

• www.cs.ucf.edu/courses/cot4810/fall04/.../Systolic_Arrays.ppt

• www.ee.pdx.edu/~mperkows/temp/May22/jhanduber2.pdf


Thank You! Any Questions?
