Anshul Kumar, CSE IITD CS718 : Data Parallel Processors 27th April, 2006.

Data Parallel Architectures
• SIMD Processors
  – Multiple processing elements driven by a single instruction stream
• Associative Processors
  – SIMD-like processors with associative memory
• Vector Processors
  – Uni-processors with vector instructions
• Systolic Arrays
  – Application-specific VLSI structures

SIMD
[Figure: block diagram with a control unit (C), processors (P) and memory (M), linked by one instruction stream (IS) and multiple data streams (DS)]
One of the earliest models of parallel computers.

ILLIAC IV SIMD Model
[Figure: a control unit (CU) drives processing elements PE1, PE2, ..., PEn, each a processor (P) with its own memory (M); the PEs are linked by an interconnection network, and I/O is attached through a bus]
Planned for 64 x 4 PEs, built only 64

Burroughs Scientific Processor (BSP) Model
[Figure: a control unit (CU) drives processors P1, P2, ..., Pn, which reach memory modules M1, M2, ..., Mk through an interconnection network; I/O is attached through a bus]

SIMD algorithms: sum of vector elements
Data flow for 8 elements:
a0  a1  a2  a3  a4  a5  a6  a7
step 1: a0+a1, a2+a3, a4+a5, a6+a7
step 2: a0+a1+a2+a3, a4+a5+a6+a7
step 3: a0+a1+a2+a3+a4+a5+a6+a7
Schedule:
step 1: S_i = a_i + a_{i+1}, i = 0, 2, 4, 6
step 2: S_i = S_i + S_{i+2}, i = 0, 4
step 3: S_i = S_i + S_{i+4}, i = 0
OR
step 1: S_i = a_i + a_{i+4}, i = 0, 1, 2, 3
step 2: S_i = S_i + S_{i+2}, i = 0, 1
step 3: S_i = S_i + S_{i+1}, i = 0
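
The two schedules for summing a vector can be sketched in Python as follows. This is only an illustrative sketch: the function names are made up here, the vector length is assumed to be a power of two, and the inner `for` loop stands in for additions that a SIMD machine would perform simultaneously in one step.

```python
def tree_sum_adjacent(a):
    """First schedule: the stride doubles each step (1, 2, 4, ...)."""
    s = list(a)
    n = len(s)              # assumed to be a power of two
    stride = 1
    steps = 0
    while stride < n:
        # The additions below touch disjoint elements, so together
        # they model a single SIMD step.
        for i in range(0, n - stride, 2 * stride):
            s[i] = s[i] + s[i + stride]
        stride *= 2
        steps += 1
    return s[0], steps


def tree_sum_halving(a):
    """Second schedule: the stride halves each step (n/2, n/4, ..., 1)."""
    s = list(a)
    stride = len(s) // 2    # length assumed to be a power of two
    steps = 0
    while stride >= 1:
        # Elements 0..stride-1 each absorb the element `stride` away.
        for i in range(stride):
            s[i] = s[i] + s[i + stride]
        stride //= 2
        steps += 1
    return s[0], steps
```

Both variants take log2(n) steps; they differ only in which processing elements must exchange data at each step, which matters for the interconnection network.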

No. of processors vs time
Adding vector elements:
– n processors – log n steps
– n/log n processors – log n steps
Matrix multiplication:
– n processors – n^2 steps
– n^2 processors – n steps
– n^3 processors – log n steps
– n^3/log n processors – log n steps
Important factors: data distribution, network
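
The slide does not say how n/log n processors still reach about log n steps for vector addition. One common schedule, assumed here, lets each processor first add roughly log n elements serially and then combines the partial sums in a tree. The function name and the step counting are illustrative only, and "log n steps" is read as "a constant multiple of log n".

```python
import math


def sum_with_fewer_processors(a):
    """Sum `a` using about n/log2(n) processors; return (sum, processors, steps)."""
    a = list(a)
    n = len(a)
    chunk = max(1, int(math.log2(n)))   # elements handled by each processor
    p = math.ceil(n / chunk)            # number of processors used

    # Phase 1: every processor adds its own chunk serially (chunk - 1 steps).
    partial = [sum(a[k * chunk:(k + 1) * chunk]) for k in range(p)]
    steps = chunk - 1

    # Phase 2: tree reduction over the p partial sums (about log2(p) steps).
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)           # pad so elements pair up
        partial = [partial[i] + partial[i + 1]
                   for i in range(0, len(partial), 2)]
        steps += 1
    return partial[0], p, steps
```

For 16 elements this uses 4 processors and 5 steps, compared with 16 processors and 4 steps for the full tree, so the step count stays within a small constant factor of log n.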

Rise and fall of SIMDs
• Introduced in the 1960s (e.g. ILLIAC IV, BSP)
• Problems:
  – not cost effective
  – serial fraction and Amdahl's law
  – I/O bottleneck
• Overshadowed by vector processors
• Resurrected in the 1980s (MPP from Goodyear, Connection Machine from Thinking Machines Corp., MP-1 from MasPar)
• Did not survive because of high cost
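
The "serial fraction and Amdahl's law" problem listed above can be made concrete with a short calculation. The sketch below uses the standard form of Amdahl's law, speedup = 1 / (s + (1 - s)/p); the function name and the example numbers are illustrative and do not come from the slides.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup on `processors` PEs when `serial_fraction` of the work cannot be parallelized."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / processors)


# Example: with 10% serial work, 64 PEs give well under a 10x speedup.
speedup_64 = amdahl_speedup(0.10, 64)
```

No matter how many processing elements are added, the speedup stays below 1/s, which is why even a modest serial fraction hurts a machine with many simple PEs.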

Related ideas
• Coarse-grain SIMD with off-the-shelf processors (synchronized MIMD), e.g. the CM-5 from Thinking Machines
• This gave rise to SPMD (single program, multiple data)
• MMX and SIMD instructions in the Pentium

Vector Processors
[Figure: vector processor block diagram with I-cache, D-cache, memory control, I-unit and control, vector registers (V-reg), GPRs, address unit, functional units (VFU, VFU, FU), buses and memory]

Four Generations of CRAY systems (vector processors)

| System | CPUs | Clock (MHz) | Flops/clock/CPU | Words moved/clk/CPU | Mflops | Gates/chip |
|--------|------|-------------|-----------------|---------------------|--------|------------|
| CRAY-1 | 1    | 80          | 2               | 1                   | 80     | 2          |
| X-MP   | 4    | 105         | 2               | 3                   | 840    | 16         |
| Y-MP   | 8    | 166         | 2               | 3                   | 2667   | 2500       |
| C90    | 16   | 240         | 4               | 6                   | 15360  | 10000      |
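
The Mflops column appears to be a peak figure that can be derived from the other columns as CPUs x clock (MHz) x flops per clock per CPU. That reading is an assumption, not something the slide states; the short check below applies it to the four rows using the values as they are listed.

```python
# (CPUs, clock in MHz, flops per clock per CPU), copied from the table.
systems = {
    "CRAY-1": (1, 80, 2),
    "X-MP": (4, 105, 2),
    "Y-MP": (8, 166, 2),
    "C90": (16, 240, 4),
}


def peak_mflops(cpus, clock_mhz, flops_per_clock):
    """Peak Mflops under the assumed product formula."""
    return cpus * clock_mhz * flops_per_clock


peaks = {name: peak_mflops(*row) for name, row in systems.items()}
```

Under this assumption the product reproduces the listed values for the X-MP (840) and C90 (15360), comes close for the Y-MP (2656 against 2667, consistent with a rounded clock figure), and gives 160 for the CRAY-1 where the table lists 80, so that entry may follow a different counting convention.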

Cray History
• http://www.cray.com/company/history.html

CRAY C90
• 8 GB central memory shared by 16 CPUs
• 128 CPU-memory paths
• word = 64 bits + 16 ECC
• Dual vector pipes
• 128-element segments
Memory:
– 8 sections
– 8x8 subsections
– 8x8x2 bank groups
– 8x8x2x8 banks

Convex C4/XA system
• CPU: 7.5 ns clock, 1620 MFLOPS
• Mem: 32 MB x 32 banks, 64-bit word, 50 ns access time
• 3 FP pipes, 2 results each
• Vector regs - FPU crossbar
• 1.1 GB/s per I/O port
[Figure: 5 x 5 crossbar connecting CPUs, memories and I/O utilities]

Other examples
NEC SX-X
• 4 CPUs
• 4 x 2 pipes each
Fujitsu VP5000
• 7 - 222 CPUs
• 2 LS pipes
• 3 Func pipes
• 2 mask pipes
Fujitsu VP2000
• 1 - 2 CPUs

Systolic Arrays (H.T. Kung 1978)
Simplicity, Regularity, Concurrency, Communication
Example: Band matrix multiplication
[Figure: C = A x B for 6 x 6 band matrices. A and B share the same band structure, shown here for A; entries outside the band are zero.]
a11 a12 0   0   0   0
a21 a22 a23 0   0   0
a31 a32 a33 a34 0   0
0   a42 a43 a44 a45 0
0   0   a53 a54 a55 a56
0   0   0   a64 a65 a66
[Figures, pages 17 to 23: successive snapshots of the systolic array at T=0, 1, ..., 6. Elements of A (A11, A12, A21, ...) and of B (B11, B12, B21, ...) enter the array and move through it, meeting in cells to form products such as A11 B11 and A12 B21; the labels C11 and C12 first appear at T=5 and T=6.]
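
The arithmetic carried out by the array in the band-matrix example can be sketched in plain Python. This is not a simulation of the systolic data movement or its timing; it only shows the band-limited product, assuming (as in the example) that A and B have two sub-diagonals and one super-diagonal. All function names are illustrative.

```python
def make_band(n, lower=2, upper=1):
    """Build an n x n test matrix that is nonzero only inside the band."""
    return [[(i * n + j + 1) if -upper <= i - j <= lower else 0
             for j in range(n)] for i in range(n)]


def band_matmul(A, B, lower=2, upper=1):
    """Multiply two band matrices, visiting only in-band elements."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        # Row i of A is nonzero only in columns i-lower .. i+upper.
        for k in range(max(0, i - lower), min(n, i + upper + 1)):
            # Row k of B is nonzero only in columns k-lower .. k+upper.
            for j in range(max(0, k - lower), min(n, k + upper + 1)):
                C[i][j] += A[i][k] * B[k][j]
    return C


def dense_matmul(A, B):
    """Ordinary triple-loop product, used only as a reference."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = make_band(6)
B = make_band(6)
C = band_matmul(A, B)
```

The result C is itself a band matrix, with up to four sub-diagonals and two super-diagonals, so the work per row depends on the band widths rather than on n. That regular, local structure is what makes the problem a good fit for a systolic array.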

WARP: Programmable Systolic Processor
[Kung, CMU 1987]
Complete contrast to the original idea
• not application specific
• not a single VLSI chip
• complex cell (pipelined FP adder, multiplier, FIFOs, RAM, crossbar)
• linear
• asynchronous