Ceg4131 models
-
Upload
anandme07 -
Category
Automotive
-
view
255 -
download
0
description
Transcript of Ceg4131 models
![Page 1: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/1.jpg)
1
Parallel Computer Models
CEG 4131 Computer Architecture III
Miodrag Bolic
![Page 2: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/2.jpg)
2
Overview
• Flynn’s taxonomy• Classification based on the memory arrangement• Classification based on communication• Classification based on the kind of parallelism
– Data-parallel – Function-parallel
![Page 3: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/3.jpg)
3
Flynn’s Taxonomy
– The most universally excepted method of classifying computer systems
– Published in the Proceedings of the IEEE in 1966
– Any computer can be placed in one of 4 broad categories
» SISD: Single instruction stream, single data stream
» SIMD: Single instruction stream, multiple data streams
» MIMD: Multiple instruction streams, multiple data streams
» MISD: Multiple instruction streams, single data stream
![Page 4: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/4.jpg)
4
SISD
Processing element (PE)
Main memory (M)
Instructions
Data
Control Unit PE MemoryPE
IS
IS DS
![Page 5: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/5.jpg)
5
Applications:• Image processing• Matrix manipulations• Sorting
SIMD
![Page 6: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/6.jpg)
6
SIMD Architectures
• Fine-grained– Image processing application– Large number of PEs– Minimum complexity PEs– Programming language is a simple extension of a sequential
language
• Coarse-grained– Each PE is of higher complexity and it is usually built with
commercial devices– Each PE has local memory
![Page 7: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/7.jpg)
7
MIMD
![Page 8: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/8.jpg)
8
MISD
Applications:• Classification • Robot vision
![Page 9: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/9.jpg)
9
Flynn taxonomy
– Advantages of Flynn
» Universally accepted
» Compact Notation
» Easy to classify a system (?)
– Disadvantages of Flynn
» Very coarse-grain differentiation among machine systems
» Comparison of different systems is limited
» Interconnections, I/O, memory not considered in the scheme
![Page 10: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/10.jpg)
10
Classification based on memory arrangement
PE1 PEn
Processors
Interconnectionnetwork
Shared memory
Shared memory - multiprocessors
I/O1
I/OnPE1
Interconnectionnetwork
M1
P1
PEn
Mn
Pn
Message passing - multicomputers
![Page 11: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/11.jpg)
11
Shared-memory multiprocessors
• Uniform Memory Access (UMA)• Non-Uniform Memory Access (NUMA)• Cache-only Memory Architecture (COMA)
• Memory is common to all the processors.• Processors easily communicate by means of
shared variables.
![Page 12: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/12.jpg)
12
The UMA Model
• Tightly-coupled systems (high degree of resource sharing)
• Suitable for general-purpose and time-sharing applications by multiple users.
P1
$
Interconnection network
$
Pn
Mem Mem
![Page 13: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/13.jpg)
13
Symmetric and asymmetric multiprocessors
• Symmetric: - all processors have equal access to all peripheral devices.- all processors are identical.
• Asymmetric: - one processor (master) executes the operating system- other processors may be of different types and may be dedicated to special tasks.
![Page 14: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/14.jpg)
14
The NUMA Model
• The access time varies with the location of the memory word.• Shared memory is distributed to local memories.• All local memories form a global address space accessible by
all processors
P1
$
Interconnection network
$
Pn
Mem Mem
Distributed Memory (NUMA)
Access time: Cache, Local memory, Remote memory
COMA - Cache-only Memory Architecture
![Page 15: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/15.jpg)
15
Distributed memory multicomputers
• Multiple computers- nodes
• Message-passing network
• Local memories are private with its own program and data
• No memory contention so that the number of processors is very large
• The processors are connected by communication lines, and the precise way in which the lines are connected is called the topology of the multicomputer.
• A typical program consists of subtasks residing in all the memories. PE
Interconnectionnetwork
M
PE
M
PE
M
PE
M
PE
M
PE
M
![Page 16: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/16.jpg)
16
Classification based on type of interconnections
• Static networks
• Dynamic networks
![Page 17: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/17.jpg)
17
Interconnection Network [1]
• Mode of Operation (Synchronous vs. Asynchronous)
• Control Strategy (Centralized vs. Decentralized)
• Switching Techniques (Packet switching vs. Circuit switching)
• Topology (Static Vs. Dynamic)
![Page 18: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/18.jpg)
18
Classification based on the kind of parallelism[3]Parallel
architecturesPAs
Data-parallel architectures Function-parallel architectures
Instruction-level
PAs
Thread-level
PAs
Process-levelPAs
ILPS MIMDs
Vectorarchitecture
Associative
architecturearchitectureand neural SIMDs Systolic Pipelined
processorsProcessors)
processorsVLIWs Superscalar Ditributedmemory
(multi-computer)
Sharedmemory(multi-MIMD
DPs
![Page 19: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/19.jpg)
19
References
1. Advanced Computer Architecture and Parallel Processing, by Hesham El-Rewini and Mostafa Abd-El-Barr, John Wiley and Sons, 2005.
2. Advanced Computer Architecture Parallelism, Scalability, Programmability, by K. Hwang, McGraw-Hill 1993.
3. Advanced Computer Architectures – A Design Space Approach by Desco Sima, Terence Fountain and Peter Kascuk, Pearson, 1997.
![Page 20: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/20.jpg)
20
Speedup
• S = Speed(new) / Speed(old)
• S = Work/time(new) / Work/time(old)
• S = time(old) / time(new)
• S = time(before improvement) /
time(after improvement)
![Page 21: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/21.jpg)
21
Speedup
• Time (one CPU): T(1)
• Time (n CPUs): T(n)
• Speedup: S
• S = T(1)/T(n)
![Page 22: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/22.jpg)
22
Amdahl’s Law
The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
![Page 23: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/23.jpg)
23
20 hours
200 miles
A B
Walk 4 miles /hourBike 10 miles / hourCar-1 50 miles / hourCar-2 120 miles / hourCar-3 600 miles /hour
must walk
Example
![Page 24: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/24.jpg)
24
20 hours
200 miles
A B
Walk 4 miles /hour 50 + 20 = 70 hours S = 1Bike 10 miles / hour 20 + 20 = 40 hours S = 1.8Car-1 50 miles / hour 4 + 20 = 24 hours S = 2.9Car-2 120 miles / hour 1.67 + 20 = 21.67 hours S = 3.2Car-3 600 miles /hour 0.33 + 20 = 20.33 hours S = 3.4
must walk
Example
![Page 25: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/25.jpg)
25
Amdahl’s Law (1967)
: The fraction of the program that is naturally serial
• (1- ): The fraction of the program that is naturally parallel
![Page 26: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/26.jpg)
26
S = T(1)/T(N)
T(N) = T(1) + T(1)(1- )
N
S = 1
+ (1- )
N
=N
N + (1- )
![Page 27: Ceg4131 models](https://reader036.fdocuments.us/reader036/viewer/2022081413/548d161bb4795945138b4cd9/html5/thumbnails/27.jpg)
27
Amdahl’s Law