Systolic Array Architecture
Uploaded by abhijeetchandratre
April 11, 2023
Definition: Systolic
Definition 1. sys·to·le (sĭs′tə-lē), noun
• The rhythmic contraction of the heart, especially of the ventricles, by which blood is driven through the aorta and pulmonary artery after each dilation or diastole.
Definition 2. Data flows from memory in a rhythmic fashion, passing through many processing elements before it returns to memory.
What Is a Systolic Array?
Imagine n simple processors arranged in a row or an array and connected in such a manner that each processor may exchange information with only its neighbours to the right and left. The processors at either end of the row are used for input and output. Such a machine constitutes the simplest example of a systolic array.
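The slides do not give an algorithm for this linear array, so as an illustration of our own choosing, here is a sketch of a classic one-dimensional systolic sorter. Each PE talks only to its left and right neighbours, keeps the smaller of its stored value and the incoming one, and passes the larger one to the right. Values enter at the left end, one per clock tick; once the last value has passed through, PE i holds the i-th smallest value. The function name `systolic_sort` is hypothetical.

```python
import math

def systolic_sort(values):
    """Simulate a linear systolic array of len(values) PEs that sorts by
    compare-and-pass: each PE keeps the smaller value and forwards the larger."""
    n = len(values)
    stored = [math.inf] * n   # value held inside each PE (inf = empty)
    incoming = [None] * n     # value arriving at each PE this tick (None = no data)
    for tick in range(2 * n - 1):
        # The left-end PE reads the next input from memory, one value per tick.
        incoming[0] = values[tick] if tick < n else None
        outgoing = [None] * n
        for i in range(n):
            v = incoming[i]
            if v is not None:
                # Keep the smaller value, pass the larger one to the right neighbour.
                stored[i], outgoing[i] = min(stored[i], v), max(stored[i], v)
        # Synchronous shift: what PE i emits this tick reaches PE i + 1 next tick;
        # whatever the right-end PE emits leaves the array and is discarded here.
        incoming = [None] + outgoing[:-1]
    return stored

print(systolic_sort([5, 1, 4, 2, 3]))  # [1, 2, 3, 4, 5]
```

The last value enters at tick n - 1 and needs n - 1 more ticks to reach the right end, so the whole sort takes 2n - 1 ticks, while no PE ever communicates beyond its immediate neighbours.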
Basic principle of systolic architecture
• A systolic system consists of a set of interconnected cells, each capable of performing some simple operation.
• The systolic approach reduces the time complexity of a computation without complicating the system.
• In a systolic array in particular, we achieve higher computation throughput without increasing memory bandwidth.
Figure: Instead of a single PE between memory and memory (memory cycle 100 ns; at most 5 million operations per second), we have the systolic array, a chain of six PEs between memory and memory (memory cycle 100 ns; 30 MOPS possible).
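The figure's numbers follow from a bandwidth argument. Here is a minimal sketch, assuming (our reading, consistent with the 5 MOPS figure) that each operation costs two 100 ns memory accesses, one to read an operand and one to write a result, and that in the systolic array every datum fetched from memory is used once by each PE before it returns to memory. The function name `peak_mops` is hypothetical.

```python
def peak_mops(memory_cycle_ns, accesses_per_op, num_pes):
    """Peak throughput in millions of operations per second (MOPS) when memory
    bandwidth is the bottleneck and each fetched datum is used once per PE."""
    ns_per_datum = memory_cycle_ns * accesses_per_op  # memory time per datum (read + write)
    data_per_second = 1e9 / ns_per_datum              # data moved through memory per second
    return data_per_second * num_pes / 1e6            # every PE performs one operation per datum

print(peak_mops(100, 2, 1))  # single PE between memory and memory: 5.0
print(peak_mops(100, 2, 6))  # six-PE systolic array: 30.0
```

Memory traffic is identical in both cases. Only the number of operations performed per datum changes.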
Typical structures
• 1D linear array
• 1D linear array with 2D I/O
• Bi-directional two-dimensional network
• Hexagonal network
Systolic Computing
The systolic approach utilizes both pipelining and parallelism.
By pipelining, processing may proceed concurrently with input and output, and consequently overall execution time is minimized. Pipelining plus multiprocessing at each stage of a pipeline should lead to the best possible performance.
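To make the pipelining claim concrete, here is a small timing model. It assumes (our simplification) that every stage takes exactly one clock tick and that a new item can enter the pipeline on every tick. The function names are illustrative.

```python
def sequential_ticks(num_items, num_stages):
    # No overlap: each item finishes all stages before the next one starts.
    return num_items * num_stages

def pipelined_ticks(num_items, num_stages):
    # The first item needs num_stages ticks to fill the pipeline;
    # after that, one item completes on every tick.
    return num_stages + num_items - 1

print(sequential_ticks(100, 5), pipelined_ticks(100, 5))  # 500 104
```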
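Matrix Multiplication
$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\times
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}
$$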
Conventional Method: N^3 multiply-add steps
For I = 1 to N
    For J = 1 to N
        For K = 1 to N
            C[I,J] = C[I,J] + A[I,K] * B[K,J];
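The pseudocode above translates directly into a runnable Python sketch (0-based indices, function name ours). It also counts the multiply-add steps to confirm the N^3 cost. The test matrix is the 3 x 3 example worked through later in these slides.

```python
def conventional_matmul(A, B):
    """Sequential triple loop: returns C = A * B and the number of multiply-adds."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    multiply_adds = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]  # same update as the pseudocode, 0-based
                multiply_adds += 1
    return C, multiply_adds

A = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
print(conventional_matmul(A, A))  # ([[23, 36, 28], [25, 39, 34], [28, 32, 37]], 27)
```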
Systolic Method
This will run in O(n) time!
To run in O(n) time we need n x n processing units; in this case we need 9.
P1 P2 P3
P4 P5 P6
P7 P8 P9
We need to modify the input data, like so:
a13 a12 a11        b31 b32 b33
a23 a22 a21        b21 b22 b23
a33 a32 a31        b11 b12 b13
Flip columns 1 & 3 of A, flip rows 1 & 3 of B,
and finally stagger the data sets for input.
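The staggering step can be sketched as an input schedule: for each clock tick, which value enters each grid row from the left and each grid column from the top. This is a minimal sketch assuming the layout on this slide (row i of A delayed by i ticks, column j of B delayed by j ticks, a_i1 and b_1j first). Indices are 0-based, and the function name `staggered_inputs` is hypothetical.

```python
def staggered_inputs(A, B):
    """For each clock tick t (0-based), return the values entering the left edge
    of every grid row and the top edge of every grid column; None is an empty slot."""
    n = len(A)
    schedule = []
    for t in range(2 * n - 1):
        # Row i of A is delayed by i ticks and feeds a_i1, a_i2, ... in that order.
        left = [A[i][t - i] if 0 <= t - i < n else None for i in range(n)]
        # Column j of B is delayed by j ticks and feeds b_1j, b_2j, ... in that order.
        top = [B[t - j][j] if 0 <= t - j < n else None for j in range(n)]
        schedule.append((left, top))
    return schedule

# Symbolic names make the stagger visible.
names_A = [[f"a{i}{j}" for j in range(1, 4)] for i in range(1, 4)]
names_B = [[f"b{i}{j}" for j in range(1, 4)] for i in range(1, 4)]
for t, (left, top) in enumerate(staggered_inputs(names_A, names_B), start=1):
    print(t, left, top)
```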
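Figure: the 3 x 3 grid of PEs with its staggered inputs. The rows of A (a13 a12 a11, a23 a22 a21, a33 a32 a31) enter the grid rows from the left, each row one tick behind the one above it; the columns of B (b31 b21 b11, b32 b22 b12, b33 b23 b13) enter the grid columns from the top, each column one tick behind the one to its left. a11 and b11 reach P1 first.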
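At every tick of the global system clock, each processor receives one value from its left neighbour (an element of A) and one from above (an element of B), multiplies them, and adds the product to the running sum held in its register. It then passes the A value to the right and the B value down.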
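The scheme can be simulated in a few lines. This is a minimal Python sketch assuming the output-stationary design the slides describe: each PE keeps its own c_ij, A values move right and B values move down one PE per tick, and inputs are staggered as above. The name `systolic_matmul` is our own, indices are 0-based, and the accumulator snapshots list P1 to P9 row by row.

```python
def systolic_matmul(A, B):
    """Simulate an n x n output-stationary systolic array computing C = A * B.
    Returns C and a snapshot of all PE accumulators (row by row) after each tick."""
    n = len(A)
    a_reg = [[None] * n for _ in range(n)]  # A value currently held by PE (i, j)
    b_reg = [[None] * n for _ in range(n)]  # B value currently held by PE (i, j)
    C = [[0] * n for _ in range(n)]         # accumulator register in each PE
    history = []
    for t in range(3 * n - 2):
        new_a = [[None] * n for _ in range(n)]
        new_b = [[None] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # A values move one PE right per tick; column 0 reads the staggered row input.
                if j == 0:
                    new_a[i][j] = A[i][t - i] if 0 <= t - i < n else None
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # B values move one PE down per tick; row 0 reads the staggered column input.
                if i == 0:
                    new_b[i][j] = B[t - j][j] if 0 <= t - j < n else None
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE holding both operands multiplies them and accumulates the product.
        for i in range(n):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    C[i][j] += a_reg[i][j] * b_reg[i][j]
        history.append([C[i][j] for i in range(n) for j in range(n)])
    return C, history

A = [[3, 4, 2], [2, 5, 3], [3, 2, 5]]
C, history = systolic_matmul(A, A)
for tick, regs in enumerate(history, start=1):
    print("Clock tick:", tick, regs)
```

Because the A value and the B value that meet in PE (i, j) always share the same index k, a PE never multiplies mismatched operands. The stagger guarantees this.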
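$$
\begin{bmatrix} 3 & 4 & 2 \\ 2 & 5 & 3 \\ 3 & 2 & 5 \end{bmatrix}
\times
\begin{bmatrix} 3 & 4 & 2 \\ 2 & 5 & 3 \\ 3 & 2 & 5 \end{bmatrix}
=
\begin{bmatrix} 23 & 36 & 28 \\ 25 & 39 & 34 \\ 28 & 32 & 37 \end{bmatrix}
$$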
Let's try this using a systolic array.
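Figure: the same grid with the example data staggered for input. Rows of A enter from the left as 2 4 3, 3 5 2 and 5 2 3 (so 3, 2 and 3 enter first); columns of B enter from the top, column 1 as 3, 2, 3, column 2 as 2, 5, 4 and column 3 as 5, 3, 2, listed top to bottom (so 3, 4 and 2 enter first).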
Clock tick: 1
Products: P1 3*3
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 9 0 0 0 0 0 0 0 0

Clock tick: 2
Products: P1 4*2, P2 3*4, P4 2*3
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 17 12 0 6 0 0 0 0 0

Clock tick: 3
Products: P1 2*3, P2 4*5, P3 3*2, P4 5*2, P5 2*4, P7 3*3
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 23 32 6 16 8 0 9 0 0

Clock tick: 4
Products: P2 2*2, P3 4*3, P4 3*3, P5 5*5, P6 2*2, P7 2*2, P8 3*4
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 23 36 18 25 33 4 13 12 0

Clock tick: 5
Products: P3 2*5, P5 3*2, P6 5*3, P7 5*3, P8 2*5, P9 3*2
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 23 36 28 25 39 19 28 22 6

Clock tick: 6
Products: P6 3*5, P8 5*2, P9 2*3
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 23 36 28 25 39 34 28 32 12

Clock tick: 7
Products: P9 5*5
P1 P2 P3 P4 P5 P6 P7 P8 P9 = 23 36 28 25 39 34 28 32 37

Reading P1 to P9 row by row:
23 36 28
25 39 34
28 32 37
Same answer! In 3n - 2 = 7 clock ticks!
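The tick count generalizes. Under the staggered feed, a_ik and b_kj meet in PE (i, j) on tick i + j + k (0-based), so the last product lands on the (3n - 2)-th tick. For n = 3 this equals 2n + 1, but the two formulas diverge for larger n. A small sketch comparing this with the n^3 steps of the conventional loop (function names are ours):

```python
def systolic_ticks(n):
    # a_ik * b_kj meets in PE (i, j) on tick i + j + k (0-based indices), so the
    # last product (i = j = k = n - 1) lands on tick 3n - 3, the (3n - 2)-th tick.
    return 3 * n - 2

def sequential_steps(n):
    # The conventional triple loop performs one multiply-add per (i, j, k) triple.
    return n ** 3

for n in (3, 10, 100):
    print(n, sequential_steps(n), systolic_ticks(n))
```

The systolic version still performs n^3 multiply-adds in total. It finishes in O(n) ticks because n x n PEs work in parallel.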
Extension to other applications
The concepts used in matrix multiplication can easily be extended to compute more complex functions. Some of these functions include the multiplication of multiple matrices and n-dimensional applications.
Systolic lattice filters are used for speech and seismic signal processing.
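The same ideas carry over to matrix-vector multiplication on a one-dimensional array. This sketch reflects our own design choice, not a design taken from the slides: PE i keeps the running sum y_i, the vector x streams through the array left to right one PE per tick, and A[i][j] is assumed to be fed into PE i from memory exactly when x_j arrives. The function name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(A, x):
    """Simulate a linear array of n PEs computing y = A x.
    PE i keeps y_i; x streams left to right, one PE per tick."""
    n = len(A)
    x_reg = [None] * n          # x value currently held by each PE
    y = [0] * n                 # accumulator in each PE
    ticks = max(2 * n - 1, 0)   # the last x value reaches the last PE on this tick
    for t in range(ticks):
        # Shift x one PE to the right; the left end reads the next element of x.
        x_reg = [x[t] if t < n else None] + x_reg[:-1]
        for i in range(n):
            if x_reg[i] is not None:
                j = t - i  # x_reg[i] holds x[j]
                y[i] += A[i][j] * x_reg[i]  # A[i][j] is fed to PE i from memory now
    return y, ticks

print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))  # ([17, 39], 3)
```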
Reconfigurable systolic array
An array of systolic elements that can be configured at the lowest level.
Field-programmable gate array (FPGA) technology permits a reconfigurable architecture, as opposed to a merely reprogrammable one.
Pipelining Vs. Systolic Array
Compared with a conventional pipeline, in a systolic array:
• Input data is not consumed by a single stage; each item is reused by many PEs as it flows through the array.
• Input data streams can flow in different directions.
• Modules may be organized in a two-dimensional (or higher) configuration.
• Configurable: different array configurations are available for different processing purposes.
Why Systolic?
• Extremely fast.
• Easily scalable architecture.
• Can reach throughput that single-processor machines cannot attain.
• Turns some polynomial-time problems into linear-time ones: n x n matrix multiplication needs n^3 sequential steps but only O(n) clock ticks on an n x n array.
Why Not Systolic?
• Expensive.
• Not needed for most applications; they are a highly specialized processor type.
• Difficult to implement and build.
• No generalized structure, hence algorithm specific.
Summary
• Systolic arrays offer a substantial reduction in the time complexity of a computation.
• They are expensive and sometimes complex but yield enormous throughput.
• Reconfigurability of systolic arrays can be achieved using FPGA technology.
Thank You! Any Questions?