Advanced Computer Architecture (CSL502) - PARALLEL PROCESSING

Page 1:

Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing


PARALLEL PROCESSING

Page 2:

Advanced Computer Architecture (CSL502)

• Evolution of Computer Systems

• Parallelism in Uniprocessor Systems

Page 3:

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes
• The first computers used vacuum tubes for circuitry and magnetic drums for memory, and were often enormous, taking up entire rooms. They were very expensive to operate and, in addition to using a great deal of electricity, generated a lot of heat, which was often the cause of malfunctions.
• First-generation computers relied on machine language to perform operations, and they could only solve one problem at a time. Machine languages are the only languages understood by computers.

Page 4:

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes
• The UNIVAC and ENIAC computers are examples of first-generation computing devices. The UNIVAC was the first commercial computer, delivered to the U.S. Census Bureau in 1951.
• ENIAC is an acronym for Electronic Numerical Integrator And Computer, the world's first operational electronic digital computer, developed by Army Ordnance to compute World War II ballistic firing tables. The ENIAC, weighing 30 tons, using 200 kilowatts of electric power and consisting of 18,000 vacuum tubes, 1,500 relays, and hundreds of thousands of resistors, capacitors, and inductors, was completed in 1945.

Page 5:

Evolution of Computer systems

First Generation - 1940-1956: Vacuum Tubes
• In addition to ballistics, the ENIAC's fields of application included weather prediction, atomic-energy calculations, cosmic-ray studies, thermal ignition, random-number studies, wind-tunnel design, and other scientific uses.
• The ENIAC soon became obsolete as the need arose for faster computing speeds.

Page 6:

Evolution of Computer systems

Second Generation - 1956-1963: Transistors
• Transistors replaced vacuum tubes and ushered in the second generation of computers. A transistor is a device composed of semiconductor material that amplifies a signal or opens or closes a circuit. Invented in 1947 at Bell Labs, transistors have become the key ingredient of all digital circuits, including computers. Today's microprocessors contain tens of millions of microscopic transistors.
• Though the transistor still generated a great deal of heat that subjected the computer to damage, it was a vast improvement over the vacuum tube. Second-generation computers still relied on punched cards for input and printouts for output.

Page 7:

Evolution of Computer systems
Second Generation - 1956-1963: Transistors
• Second-generation computers moved from cryptic binary machine language to symbolic, or assembly, languages, which allowed programmers to specify instructions in words. High-level programming languages were also being developed at this time, such as early versions of COBOL and FORTRAN.
• These were also the first computers that stored their instructions in memory, which moved from magnetic drum to magnetic-core technology.
• The first computers of this generation were developed for the atomic energy industry.

Page 8:

Evolution of Computer systems
Third Generation - 1964-1971: Integrated Circuits
• The development of the integrated circuit was the hallmark of the third generation of computers. Transistors were miniaturized and placed on silicon chips, called semiconductors, which drastically increased the speed and efficiency of computers.
• A chip is a small piece of semiconducting material (usually silicon) on which an integrated circuit is embedded. A typical chip is less than 1/4 square inch and can contain millions of electronic components (transistors). Computers consist of many chips placed on electronic boards called printed circuit boards. There are different types of chips: for example, CPU chips (also called microprocessors) contain an entire processing unit, whereas memory chips contain blank memory.

Page 9:

Evolution of Computer systems

Fourth Generation - 1971-Present: Microprocessors
• The microprocessor brought in the fourth generation of computers, as thousands of integrated circuits were built onto a single silicon chip: a silicon chip that contains a CPU. In the world of personal computers, the terms microprocessor and CPU are used interchangeably. At the heart of all personal computers and most workstations sits a microprocessor. Microprocessors also control the logic of almost all digital devices, from clock radios to fuel-injection systems for automobiles.

Page 10:

Evolution of Computer systems
Fourth Generation - 1971-Present: Microprocessors
• Three basic characteristics differentiate microprocessors:
Instruction Set: the set of instructions that the microprocessor can execute.
Bandwidth: the number of bits processed in a single instruction.
Clock Speed: given in megahertz (MHz), the clock speed determines how many instructions per second the processor can execute.
• For both bandwidth and clock speed, the higher the value, the more powerful the CPU. For example, a 32-bit microprocessor that runs at 50 MHz is more powerful than a 16-bit microprocessor that runs at 25 MHz.
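The bandwidth-times-clock comparison above can be made concrete with a toy figure of merit. This is an illustrative sketch only: it assumes (simplistically) one instruction per clock cycle, which real processors do not guarantee, and is not a real benchmark.

```python
# Rough figure of merit for comparing CPUs: bits moved per second,
# assuming (simplistically) one instruction completes per clock cycle.
# Illustrative toy metric, not a real performance measure.

def rough_throughput_bits_per_sec(bandwidth_bits: int, clock_mhz: float) -> float:
    """Bandwidth (bits per instruction) x clock rate (instructions per second)."""
    return bandwidth_bits * clock_mhz * 1_000_000

cpu_a = rough_throughput_bits_per_sec(32, 50)   # 32-bit CPU at 50 MHz
cpu_b = rough_throughput_bits_per_sec(16, 25)   # 16-bit CPU at 25 MHz
print(cpu_a / cpu_b)  # 4.0: the 32-bit/50 MHz CPU moves 4x the bits per second
```

Under this crude metric the 32-bit, 50 MHz processor comes out four times "more powerful", matching the slide's example.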

Page 11:

Evolution of Computer systems

Fourth Generation - 1971-Present: Microprocessors
• The Intel 4004 chip, developed in 1971, located all the components of the computer - from the central processing unit and memory to input/output controls - on a single chip.
• In 1981 IBM introduced its first computer for the home user, and in 1984 Apple introduced the Macintosh. Microprocessors also moved out of the realm of desktop computers and into many areas of life as more and more everyday products began to use microprocessors.

Page 12:

Evolution of Computer systems

Fifth Generation - Present and Beyond: Artificial Intelligence
• Fifth-generation computing devices, based on artificial intelligence, are still in development, though some applications, such as voice recognition, are in use today.
• Artificial intelligence is the branch of computer science concerned with making computers behave like humans. The term was coined in 1956 by John McCarthy at the Massachusetts Institute of Technology. Artificial intelligence includes:
• Games Playing: programming computers to play games such as chess and checkers

Page 13:

Evolution of Computer systems

Fifth Generation - Present and Beyond: Artificial Intelligence
• Expert Systems: programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms)
• Natural Language: programming computers to understand natural human languages
• Neural Networks: systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains
• Robotics: programming computers to see and hear and react to other sensory stimuli

Page 14:

Evolution of Computer systems Trends Towards Parallel Processing

Page 15:

Evolution of Computer systems Trends Towards Parallel Processing

Page 16:

Evolution of Computer systems Trends Towards Parallel Processing

From an operating-system point of view, computer systems have improved chronologically in four phases:

•Batch processing

•Multiprogramming

•Time sharing

•Multiprocessing

Page 17:

Evolution of Computer systems
Trends Towards Parallel Processing

Parallel processing can be pursued at four programmatic levels:
• Job or program level
• Task or procedure level
• Interinstruction level
• Intrainstruction level
The highest (job) level is handled algorithmically, while the lowest (intrainstruction) level is implemented by hardware means; there is a trade-off between the two. Developments in data communication technology bridge the gap between distributed processing and parallel processing, so distributed processing can be viewed as a form of parallel processing in a special environment.

Page 18:

Parallelism In Uniprocessor Systems Basic Uniprocessor Architecture

Page 19:

Parallelism In Uniprocessor Systems Basic Uniprocessor Architecture

Page 20:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

•Multiplicity of functional units

•Parallelism and pipelining within the CPU

•Overlapped CPU and I/O operations

•Use of hierarchical memory system

•Balancing of subsystem bandwidths

•Multiprogramming and time sharing

Page 21:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiplicity of functional units

Page 22:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Parallelism and pipelining within the CPU
• The ALU contains parallel adders using carry-lookahead and carry-save techniques.
• For multiplication and division, high-speed multiplier recoding and convergence division techniques are used to exploit parallelism and resource sharing.
• Instruction pipelining is used.
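The carry-lookahead idea mentioned above can be sketched in a few lines. This is an illustrative bit-level model, not a hardware description: instead of rippling the carry bit by bit, the adder computes generate (g) and propagate (p) signals for every bit position, from which all carries can be derived directly.

```python
# Sketch of a carry-lookahead adder: compute generate (g) and propagate (p)
# signals per bit, then derive the carries from them. The loop below evaluates
# the carry recurrence sequentially for clarity; in hardware, each carry is
# expanded purely in terms of g, p and c_0, so all carries appear in parallel.

def carry_lookahead_add(a: int, b: int, width: int = 8) -> int:
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]   # g_i = a_i AND b_i
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(width)]   # p_i = a_i OR b_i
    carry = [0] * (width + 1)
    for i in range(width):
        carry[i + 1] = g[i] | (p[i] & carry[i])                   # c_{i+1} = g_i + p_i * c_i
    result = 0
    for i in range(width):
        bit = ((a >> i) & 1) ^ ((b >> i) & 1) ^ carry[i]          # s_i = a_i XOR b_i XOR c_i
        result |= bit << i
    return result & ((1 << width) - 1)                            # wrap at the word width

print(carry_lookahead_add(23, 42))  # 65
```

The point of the g/p decomposition is that the carry chain, the slow part of a ripple adder, becomes a two-level logic function and no longer limits the adder to one bit per gate delay.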

Page 23:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Use of hierarchical memory system

Page 24:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Balancing of subsystem bandwidth
• td > tm > tp, where td, tm, and tp are the device access time, memory cycle time, and processor cycle time, respectively.
• The BANDWIDTH of a system is defined as the number of operations performed per unit time.
• In the case of memory, let W be the number of words delivered per memory cycle tm; then
Bm = W / tm (words/s or bytes/s)
• Memory access conflicts may cause delayed access for some processor requests. In practice, the utilized memory bandwidth Bum is approximately
Bum = Bm / √M
where M is the number of interleaved modules in the memory system.
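The two bandwidth formulas above can be worked through with concrete numbers. The values of W, tm and M below are illustrative assumptions, not figures from the text.

```python
# Worked example of the memory-bandwidth formulas:
#   Bm  = W / tm          (maximum bandwidth)
#   Bum = Bm / sqrt(M)    (utilized bandwidth, degraded by access conflicts)
# All parameter values are made-up illustrations.
from math import sqrt

W = 4            # words delivered per memory cycle (assumed)
t_m = 100e-9     # memory cycle time in seconds (assumed: 100 ns)
M = 16           # number of interleaved memory modules (assumed)

B_m = W / t_m            # ≈ 4.0e7 words/s maximum
B_um = B_m / sqrt(M)     # ≈ 1.0e7 words/s actually utilized

print(B_m, B_um)
```

Note how interleaving helps raw bandwidth but, under this conflict model, only a 1/√M fraction of it is delivered: with 16 modules, a quarter of the peak rate.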

Page 25:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Balancing of subsystem bandwidth

Page 26:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiprogramming and Time Sharing
• In the BATCH PROCESSING approach, once the CPU is allocated to a program it remains allocated, whether an I/O-bound or a CPU-bound part is being run.
• In MULTIPROGRAMMING, when a program's CPU-bound part is over and its I/O-bound part is about to begin, the CPU is taken back from that program and allocated to another program whose CPU-bound part is ready.
• In TIME SHARING, an equal time slot is given to all programs, which execute in round-robin fashion.
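The round-robin policy described above can be sketched as a small scheduler. The process names and burst times are made-up illustrations; each program receives an equal time slice (quantum) in turn until it finishes.

```python
# Minimal sketch of round-robin time sharing: every ready program gets one
# quantum of CPU in turn; an unfinished program rejoins the back of the queue.
from collections import deque

def round_robin(burst_times: dict, quantum: int) -> list:
    """Return the order in which processes occupy the CPU, one quantum each."""
    remaining = dict(burst_times)
    ready = deque(burst_times)            # FIFO ready queue
    schedule = []
    while ready:
        name = ready.popleft()
        schedule.append(name)             # this process runs for one quantum
        remaining[name] -= quantum
        if remaining[name] > 0:
            ready.append(name)            # unfinished: back of the queue
    return schedule

print(round_robin({"P1": 3, "P2": 1, "P3": 2}, quantum=1))
# ['P1', 'P2', 'P3', 'P1', 'P3', 'P1']
```

No program can monopolize the CPU: P1, the longest job, is interleaved with the others instead of running to completion first as it would under batch processing.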

Page 27:

Parallelism In Uniprocessor Systems Parallel Processing Mechanisms

Multiprogramming and Time Sharing

Page 28:

Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing

Page 29:

Advanced Computer Architecture (CSL502)

•Parallel Computer Structures

• Architectural Classification Schemes

Page 30:

Parallel Computer Structures

Parallel computers can be divided into three architectural configurations:
• Pipeline computers
• Array processors
• Multiprocessor systems

Page 31:

Parallel Computer Structures
Pipeline computers
Instruction execution in a digital computer can be divided into four major steps:
• IF (Instruction Fetch): fetch the instruction from main memory
• ID (Instruction Decode): identify the operation to be performed
• OF (Operand Fetch): access the data (if any) on which the operation is to be performed
• EX (Execution): carry out the instruction on the data
In a nonpipelined computer, these four steps must be completed before the next instruction can be issued. In a pipelined computer, successive instructions are executed in overlapped fashion.

Page 32:

Parallel Computer Structures Pipeline computers

Page 33:

Parallel Computer Structures
Pipeline computers
• An instruction cycle consists of multiple pipeline cycles.
• In a pipeline, the operation of all stages is synchronized under a common clock.
• Interface latches are used between adjacent segments to hold the intermediate results.
• Theoretically, a k-stage pipeline processor could be at most k times faster than a nonpipelined processor. However, due to memory conflicts, data dependency, branches, and interrupts, this ideal speedup may not be achieved.
• For some CPU-bound instructions, the execution phase can be further partitioned into a multiple-stage arithmetic logic pipeline, as for floating-point operations.
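The "at most k times faster" claim above can be checked with a few lines of arithmetic. This is the idealized model (one cycle per stage, no stalls): the first instruction takes k cycles to fill the pipeline, and each later instruction completes one cycle after the previous one.

```python
# Idealized k-stage pipeline timing for n instructions (no hazards or stalls).
# Real pipelines lose cycles to memory conflicts, dependencies and branches.

def pipeline_cycles(n: int, k: int) -> int:
    return k + (n - 1)          # fill the pipe (k), then one result per cycle

def nonpipelined_cycles(n: int, k: int) -> int:
    return n * k                # each instruction runs all k stages alone

def speedup(n: int, k: int) -> float:
    return nonpipelined_cycles(n, k) / pipeline_cycles(n, k)

print(speedup(4, 4))       # 16/7 ≈ 2.29 for a short instruction stream
print(speedup(10_000, 4))  # ≈ 4.0: the speedup approaches k as n grows
```

The speedup nk / (k + n - 1) only tends to k in the limit of long instruction streams, which is why the slide calls k an upper bound rather than a guarantee.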

Page 34:

Parallel Computer Structures
Pipeline computers
Some main issues in pipeline design:
• Job sequencing
• Collision prevention
• Congestion control
• Branch handling
• Reconfiguration
• Hazard resolution

Pipeline computers are suitable for VECTOR PROCESSING.

Page 35:

Parallel Computer Structures Pipeline computers

Page 36:

Parallel Computer Structures
Array Computers
• An array processor is a synchronous parallel computer with multiple ALUs, called processing elements (PEs), that can operate in parallel.
• A data-routing mechanism is used among the PEs.
• Scalar and control-type instructions are directly executed in the control unit (CU).
• Each PE consists of an ALU with registers and a local memory.
• PEs are passive devices without instruction-decoding capabilities.

Page 37:

Parallel Computer Structures Array Computers

Page 38:

Parallel Computer Structures
Multiprocessor Systems
• Multiprocessor systems are used to improve throughput, reliability, flexibility, and availability.
• A multiprocessor system contains two or more processors.
• All processors share access to common sets of memory modules, I/O channels, and peripheral devices.
• A single integrated operating system governs everything.
• Multiprocessor hardware organization is determined primarily by the interconnection structure used between the memories and processors. Three different interconnections have been used:
  o Time-shared common bus
  o Crossbar switch network
  o Multiport memories

Page 39:

Parallel Computer Structures Multiprocessor Systems

Page 40:

Parallel Computer Structures
Performance of Parallel Computers
• The theoretical speedup achieved by n identical parallel processors is at most n times that of a single processor.
• In practice this is not achieved, due to:
  o Memory and communication-path conflicts
  o Inefficient algorithms, etc.
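The gap between the ideal n-fold speedup and reality is usually quantified as measured speedup and efficiency. The timing figures below are made-up illustrative values, not measurements from the text.

```python
# Measured speedup and efficiency for an n-processor run.
# Efficiency of 1.0 would mean the ideal n-fold speedup was achieved.

def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    return speedup(t_serial, t_parallel) / n

# Ideally, 8 processors would cut a 64 s job to 8 s. Memory conflicts and
# algorithmic overhead leave it at 11.6 s instead (assumed measurement).
print(speedup(64.0, 11.6))        # ≈ 5.52, not 8
print(efficiency(64.0, 11.6, 8))  # ≈ 0.69
```

An efficiency well below 1.0, as here, is the normal case; it is exactly the effect of the conflicts and algorithmic inefficiencies listed above.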

Page 41:

Parallel Computer Structures Performance of Parallel Computers

Page 42:

Parallel Computer Structures
Data Flow Computers
• Conventional von Neumann machines are called control flow computers: a program counter controls the execution of the program.
• To exploit maximum parallelism in a program, data flow computers were suggested.
• The basic concept is to enable the execution of an instruction whenever its required operands become available.
• Programs for data-driven computations can be represented by data flow graphs.
• The next slide shows the data flow graph for z = (x + y) * 2.
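The firing rule described above ("execute whenever the operands are available") can be sketched as a toy interpreter for the z = (x + y) * 2 graph. The node names and graph encoding are illustrative assumptions; the point is that no program counter orders the steps.

```python
# Toy data flow interpreter for z = (x + y) * 2: each node fires as soon as
# all of its input operands (tokens) are available. Illustrative sketch only.
import operator

# graph: node -> (operation, input names); 'x', 'y' and the literal 2 ('two')
# are the external input tokens of the graph
graph = {
    "sum": (operator.add, ("x", "y")),
    "z":   (operator.mul, ("sum", "two")),
}

def run_dataflow(graph, tokens):
    """Fire every node whose operands are present until nothing new can fire."""
    tokens = dict(tokens)
    fired = True
    while fired:
        fired = False
        for node, (op, inputs) in graph.items():
            if node not in tokens and all(i in tokens for i in inputs):
                tokens[node] = op(*(tokens[i] for i in inputs))  # node fires
                fired = True
    return tokens

result = run_dataflow(graph, {"x": 3, "y": 4, "two": 2})
print(result["z"])  # 14
```

In a larger graph, any nodes whose operands arrive simultaneously could fire in parallel; the sequential loop here is only a simulation of that data-driven behavior.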

Page 43:

Parallel Computer Structures Data Flow Computer

Page 44:

Parallel Computer Structures Data Flow Computer

Page 45:

Parallel Computer Structures
Data Flow Computers
The basic mechanism for the execution of a data flow program:
• Each instruction in a data flow computer is implemented as an activity template.
• Activity templates are stored in the activity store.
• Each activity template has a unique address.
• An activity template's address is entered in the instruction queue when the instruction is ready to execute.
• Instruction fetch and data access are handled by the fetch and update units.
• The operation unit performs the required operation.

Page 46:

Architectural Classification Schemes
Multiplicity of Instruction – Data Streams
• This classification was introduced by Michael J. Flynn.
• According to it, a computer organization is characterized by the multiplicity of hardware provided to service the instruction and data streams. There are four categories:
  o Single instruction stream – single data stream (SISD)
  o Single instruction stream – multiple data stream (SIMD)
  o Multiple instruction stream – single data stream (MISD)
  o Multiple instruction stream – multiple data stream (MIMD)

Page 47:

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

Page 48:

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

Page 49:

Architectural Classification Schemes Multiplicity of Instruction – Data Streams

Page 50:

Architectural Classification Schemes
Serial versus Parallel Processing
• This classification was given by Feng.
• It uses the degree of parallelism to classify various computer architectures.
• The maximum number of bits processed by a computer in unit time is called the maximum parallelism degree P.
• Let Pi be the number of bits processed in the i-th processor cycle, and consider T processor cycles indexed by i = 1, 2, ..., T. The average parallelism degree is
  Pa = (P1 + P2 + ... + PT) / T

Page 51:

Architectural Classification Schemes Serial versus Parallel Processing

• In general, Pi ≤ P.
• The utilization rate of the computer system within T cycles is
  μ = Pa / P = (P1 + P2 + ... + PT) / (T · P)
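Feng's two measures can be computed directly from a trace of per-cycle bit counts. The values of P and the Pi sequence below are made-up illustrations.

```python
# Worked example of Feng's measures:
#   Pa = (sum of Pi) / T     average parallelism degree
#   mu = Pa / P              utilization rate (1.0 = fully utilized)
# Per-cycle bit counts are made-up illustrative values.

P = 32                      # maximum parallelism degree (bits per cycle)
Pi = [32, 16, 32, 8, 32]    # bits actually processed in each of T cycles
T = len(Pi)

Pa = sum(Pi) / T            # average parallelism degree
mu = Pa / P                 # utilization rate

print(Pa)   # 24.0
print(mu)   # 0.75
```

The machine reaches its full width of 32 bits in only three of the five cycles, so the average degree (24) and utilization (75%) both fall below the maximum.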

Page 52:

Architectural Classification Schemes Serial versus Parallel Processing

• The maximum parallelism degree P(C) of a given computer C is the product of the word length n and the bit-slice length m:
  P(C) = n · m
• There are four types of processing methods:
  o Word-serial and bit-serial (WSBS)
  o Word-parallel and bit-serial (WPBS)
  o Word-serial and bit-parallel (WSBP)
  o Word-parallel and bit-parallel (WPBP)
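The four processing methods above follow mechanically from the (n, m) pair: word length n and bit-slice length m (the number of words processed simultaneously). The (n, m) values in the example calls are illustrative assumptions, not machines from the text.

```python
# Classifying a machine by its (n, m) pair in Feng's scheme, and computing
# its maximum parallelism degree P(C) = n * m. Example values are assumed.

def classify(n: int, m: int) -> str:
    word = "Word-serial" if m == 1 else "Word-parallel"
    bit = "bit-serial" if n == 1 else "bit-parallel"
    return f"{word} and {bit}"

def max_parallelism_degree(n: int, m: int) -> int:
    return n * m                       # P(C) = n * m

print(classify(1, 1),   max_parallelism_degree(1, 1))    # WSBS, P = 1
print(classify(1, 256), max_parallelism_degree(1, 256))  # WPBS, P = 256
print(classify(32, 1),  max_parallelism_degree(32, 1))   # WSBP, P = 32
print(classify(32, 16), max_parallelism_degree(32, 16))  # WPBP, P = 512
```

WSBS (n = m = 1) is one bit at a time; WSBP (m = 1, n > 1) is the conventional one-word-at-a-time computer; WPBS and WPBP add word-level parallelism on top.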

Page 53:

Architectural Classification Schemes Serial versus Parallel Processing

Page 54:

Architectural Classification Schemes Parallelism versus Pipelining

It was proposed by Handler. It is based on the degree of parallelism and pipelining at three subsystem levels: the processor control unit, the ALU, and the bit-level circuit.

Page 55:

Architectural Classification Schemes Parallelism versus Pipelining

Page 56:

Architectural Classification Schemes Parallelism versus Pipelining

Page 57:

Architectural Classification Schemes Parallelism versus Pipelining

Page 58:

Architectural Classification Schemes Parallelism versus Pipelining

Page 59:

Advanced Computer Architecture (CSL502)

Unit 1: Introduction To Parallel Processing

Page 60:

Advanced Computer Architecture (CSL502)

Parallel Processing Applications

Page 61:

Introduction To Parallel Processing Parallel Processing Applications

Fast and efficient computing is in high demand in many areas, such as:
• Scientific
• Engineering
• Energy resources
• Medical
• Military
• Artificial intelligence
• Basic research areas

Page 62:

Introduction To Parallel Processing Parallel Processing Applications

Large-scale scientific problem solving involves three interactive disciplines:
• Theories
• Experiments
• Computations

Page 63:

Introduction To Parallel Processing Parallel Processing Applications

Page 64:

Introduction To Parallel Processing Parallel Processing Applications

Computer simulation has several advantages:
• Computer simulations are far cheaper and faster than physical experiments.
• Computers can solve a wider range of problems than scientific laboratory equipment can.
• Computational approaches are limited only by computer speed and memory capacity, while physical experiments have many practical constraints.

Page 65:

Introduction To Parallel Processing Parallel Processing Applications

We can divide parallel processing applications into FOUR categories:
• Predictive Modeling and Simulations
• Engineering Design and Automation
• Energy Resources Exploration
• Medical, Military, and Basic Research

Page 66:

Introduction To Parallel Processing Parallel Processing Applications

Predictive Modeling and Simulations
• Scientists worldwide are concerned with multidimensional modeling of the atmosphere, the earth environment, outer space, and the world economy.
• Predictive modeling is done through extensive computer simulation experiments, which demand computing speeds of 1000 million floating-point operations per second (1000 megaflops) or above.
• FLOPS = floating-point operations per second.

Page 67:

Introduction To Parallel Processing Parallel Processing Applications

Predictive modeling and simulations are required in the following areas:
• Numerical weather forecasting
• Oceanography and astrophysics
• Socioeconomics and government use

Page 68:

Introduction To Parallel Processing Parallel Processing Applications

Numerical weather forecasting

Computations are carried out on a three-dimensional grid that partitions the atmosphere vertically into K levels and horizontally into M intervals of longitude and N intervals of latitude

Using a 270-mile grid (the distance between New York and Washington, D.C.), a 24-hour forecast would need to perform about 100 billion data operations

A 100-megaflops computer needs about 100 minutes for this computation

Page 69:

Introduction To Parallel Processing Parallel Processing Applications

69

Page 70:

70

Introduction To Parallel Processing Parallel Processing Applications

Oceanography and astrophysics

To do a complete simulation of the Pacific Ocean with adequate resolution (a 1° grid) for 50 years would take 1000 hours on a Cyber-205 computer

The formation of the earth from planetesimals in the solar system can be simulated with a high-speed computer

The dynamic range of astrophysics studies may be from billions of years to milliseconds

Interesting problems include the physics of supernovae and the dynamics of galaxies; the Illiac-IV array processor was used for such studies

Page 71:

71

Introduction To Parallel Processing Parallel Processing Applications

Oceanography and astrophysics

Since oceans exchange heat with the atmosphere, a good

understanding of oceans would help in the following areas

Climate predictive analysis

Fishery management

Ocean resource exploration

Coastal dynamics and tides

Oceanography studies use a smaller grid size and a larger time scale than those used for atmospheric studies

Page 72:

72

Introduction To Parallel Processing Parallel Processing Applications

Socioeconomics and government use

Nobel laureate W. W. Leontief (1980) proposed an input-output model of the world economy that performs large-scale matrix operations on a CDC scientific computer. This United Nations-supported world economic simulation suggests how a system of international economic relations featuring partial disarmament could narrow the gap between the rich and the poor.

In the US, the FBI uses large computers for crime control, and the IRS uses a large number of fast mainframes for tax collection and auditing.
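The core of a Leontief input-output model is solving (I - A) x = d for the gross output x, given an inter-sector consumption matrix A and a final-demand vector d. The 2-sector sketch below uses invented coefficients, not figures from Leontief's UN study:

```python
# Leontief input-output model: gross output x satisfies x = A x + d,
# i.e. (I - A) x = d.  A 2x2 illustration solved with Cramer's rule;
# the matrix A and demand d are made-up example numbers.

def leontief_output(A, d):
    """Solve (I - A) x = d for a 2x2 technology matrix A."""
    m = [[1 - A[0][0], -A[0][1]],
         [-A[1][0], 1 - A[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (d[0] * m[1][1] - m[0][1] * d[1]) / det
    x1 = (m[0][0] * d[1] - d[0] * m[1][0]) / det
    return [x0, x1]

A = [[0.2, 0.3],    # sector 0 uses 0.2 of its own output, 0.3 of sector 1's
     [0.1, 0.4]]
d = [100.0, 200.0]  # final demand per sector
x = leontief_output(A, d)
```

The large-scale version of this computation, with many sectors and countries, is what makes a fast matrix-oriented computer necessary.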

Page 73:

73

Introduction To Parallel Processing Parallel Processing Applications

Parallel processing applications can be divided into FOUR categories:

Predictive Modeling and Simulations

Engineering Design and Automation

Energy Resources Exploration

Medical, Military, and Basic Research

Page 74:

74

Introduction To Parallel Processing Parallel Processing Applications

Engineering Design and Automation

Some of the areas where fast computers are used:

Finite-element analysis

Computational aerodynamics

Artificial intelligence and automation

Remote sensing applications

Page 75:

75

Introduction To Parallel Processing Parallel Processing Applications

Parallel processing applications can be divided into FOUR categories:

Predictive Modeling and Simulations

Engineering Design and Automation

Energy Resources Exploration

Medical, Military, and Basic Research

Page 76:

76

Introduction To Parallel Processing Parallel Processing Applications

Energy Resources Exploration

Seismic exploration - used in oil prospecting

Reservoir modeling - modeling of oil fields

Plasma fusion power - used in nuclear fusion research

Nuclear reactor safety

Page 77:

77

Introduction To Parallel Processing Parallel Processing Applications

Parallel processing applications can be divided into FOUR categories:

Predictive Modeling and Simulations

Engineering Design and Automation

Energy Resources Exploration

Medical, Military, and Basic Research

Page 78:

78

Introduction To Parallel Processing Parallel Processing Applications

Medical, Military, and Basic Research

Computer-assisted tomography - the human body can be modeled by CAT scanning

Genetic engineering - biological systems can be simulated on supercomputers. A highly pipelined machine called the Cytocomputer was developed at the Environmental Research Institute of Michigan for biomedical image processing.

Weapons research and defence

Page 79:

Advanced Computer Architecture (CSL502)

Unit 1: Memory and Input-Output Subsystem

79

Page 80:

Advanced Computer Architecture (CSL502)

Hierarchical Memory Structure

Virtual Memory System

Memory Allocation and Management

Cache Memories and Management

Input-Output Subsystems

80

Page 81:

81

Hierarchical Memory Structure Memory Hierarchy

The design objective of hierarchical memory, in both parallel processing systems and multiprogrammed uniprocessor systems, is to match the processor speed to the rate of information transfer (the bandwidth) of the memory at the lowest level, at a reasonable cost.

In multiprocessor systems, concurrent memory requests frequently arrive at memory at the same level of the hierarchy. If two or more requests are directed to the same section of memory at the same level, a conflict is said to occur, which can degrade the performance of the system.

To avoid conflicts, the memory at a given level is partitioned into several modules so that some degree of concurrent access can be achieved.

Page 82:

82

Hierarchical Memory Structure Memory Hierarchy

Memories in the hierarchy can be classified on the basis of:

Accessing method:
Random Access Memory (RAM) - the access time ta of a memory word is independent of its location
Sequential Access Memory (SAM) - information is accessed serially
Direct Access Storage Device (DASD) - rotational devices made of magnetic materials, where any block of information can be accessed directly

Speed or access time - in the memory hierarchy, the highest level has the fastest memory and the lowest level the slowest

Primary - example: RAM
Secondary - example: DASD

Page 83:

Hierarchical Memory Structure Memory Hierarchy

83

Page 84:

Hierarchical Memory Structure Memory Hierarchy

84

CCD = Charge-Coupled Devices

Page 85:

85

Hierarchical Memory Structure Memory Hierarchy Example- Three level memory hierarchy

Page 86:

86

Hierarchical Memory Structure Memory Hierarchy

The processor usually references an item in memory by providing the address of that item.

A memory hierarchy is usually organized so that the address space at level i is a subset of that at level i+1. Address Ak at level i is not necessarily address Ak at level i+1, but any information at level i may also exist at level i+1, and some of the information at level i may be more up to date than that at level i+1.

These different copies of the same data create a DATA CONSISTENCY or COHERENCE problem between adjacent levels.

The data consistency problem may also exist between local memories or caches when two cooperating processes, executing concurrently or on separate processors, interact via one or more shared variables.

Page 87:

87

Hierarchical Memory Structure Memory Hierarchy

[Figure: two processes P1 and P2 sharing a variable X = 1]

In modeling the performance of a hierarchical memory, the HIT RATIO (H) is used: the probability of finding the requested information in the memory at a given level.

H depends upon the granularity of information transfer, the capacity, the management strategy, and other factors.

The hit ratio (success function) may be written as H(s), where s = memory size. The miss ratio is F(s) = 1 - H(s).

The access frequency at level i, the relative number of successful accesses to level i, is

hi = H(si) - H(si-1)
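The access-frequency relation can be sketched directly from a cumulative hit-ratio function, taking H(s0) = 0. The three H values below are illustrative (e.g., cache, main memory, disk), not measured figures:

```python
# Access frequency per level: h_i = H(s_i) - H(s_{i-1}), with H(s_0) = 0.
# H is given as the cumulative hit ratios H(s_1)..H(s_n); H(s_n) = 1.

def access_frequencies(H):
    """Differences of the cumulative hit-ratio list; the h_i sum to H(s_n)."""
    prev = 0.0
    freqs = []
    for h in H:
        freqs.append(h - prev)
        prev = h
    return freqs

H = [0.90, 0.98, 1.0]      # illustrative cumulative hit ratios
h = access_frequencies(H)  # roughly [0.90, 0.08, 0.02]
```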

Page 88:

88

Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy

The design of an n-level memory hierarchy is a tradeoff between performance and cost. Performance depends on:

Program behavior with respect to memory references

The access time and memory size of each level

The granularity (block size) transferred and the management policy

The design of the processor-memory interconnection network

Two performance measures are used: effective memory access time and processor utilization.

Page 89:

89

Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy

The effective access time Ti from the processor to the ith level of the memory hierarchy is the sum of the individual average access times tk of each level from k = 1 to i:

Ti = Σ (k=1 to i) tk

The effective access time for each memory reference in an n-level memory hierarchy is

T = Σ (i=1 to n) hi · Ti

which is

T = Σ (i=1 to n) [H(sn) - H(si-1)] · ti

Page 90:

90

Hierarchical Memory Structure Memory Hierarchy Optimization of Memory Hierarchy

All data are available at level n, thus H(sn) = 1, which gives

T = Σ (i=1 to n) [1 - H(si-1)] · ti = Σ (i=1 to n) F(si-1) · ti

The total cost of the memory system is

C = Σ (i=1 to n) c(ti) · si

where c(ti) is the cost per byte of memory at level i and si is the size at level i.

A typical memory-hierarchy design problem involves

min T = Σ (i=1 to n) F(si-1) · ti

subject to the constraint C <= C0, where si > 0 and ti > 0, for i = 1, 2, ..., n
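The effective-access-time and cost formulas can be evaluated numerically. The sizes, access times, and per-byte costs below are made-up illustrative figures for a three-level hierarchy, not data from the text:

```python
# T = sum_i [1 - H(s_{i-1})] * t_i  and  C = sum_i c(t_i) * s_i
# for an n-level hierarchy.  All numbers are invented for illustration.

def effective_access_time(H, t):
    """Each level's time is weighted by the miss ratio of all faster levels."""
    T, H_prev = 0.0, 0.0        # H(s_0) = 0: nothing resides "above" level 1
    for Hi, ti in zip(H, t):
        T += (1.0 - H_prev) * ti
        H_prev = Hi
    return T

def total_cost(c, s):
    return sum(ci * si for ci, si in zip(c, s))

H = [0.95, 0.999, 1.0]       # cumulative hit ratios, H(s_n) = 1
t = [10e-9, 100e-9, 10e-3]   # per-level access times in seconds
c = [1e-2, 1e-4, 1e-7]       # cost per byte at each level
s = [64e3, 8e6, 1e9]         # size of each level in bytes
T = effective_access_time(H, t)  # dominated by the rare misses to disk
C = total_cost(c, s)
```

Even a 0.1% miss rate to the slowest level dominates T, which is the quantitative reason the optimization trades level sizes against the cost constraint C <= C0.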

Page 91:

91

Hierarchical Memory Structure Addressing Schemes for Main Memory

Main memory is partitioned into several independent modules, and addresses are distributed across these modules. This scheme is called interleaving; interleaving addresses among M modules is called M-way interleaving.

There are two methods of interleaving:

High-order interleaving: the high-order m bits select the module, while the remaining n-m bits select the address within the module

Low-order interleaving: the low-order m bits select the module, while the remaining n-m bits select the address within the module
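The two decompositions can be sketched with bit operations; the address width n = 8 and m = 2 (M = 4 modules) below are arbitrary example values:

```python
# High-order vs low-order M-way interleaving of an n-bit address
# across M = 2**m modules.  n = 8 and m = 2 are illustrative.

def high_order(addr, n, m):
    """Top m bits pick the module; low n-m bits are the in-module address."""
    return addr >> (n - m), addr & ((1 << (n - m)) - 1)

def low_order(addr, n, m):
    """Bottom m bits pick the module; high n-m bits are the in-module address."""
    return addr & ((1 << m) - 1), addr >> m

# Consecutive addresses stay in one module under high-order interleaving,
# but rotate through successive modules under low-order interleaving.
hi_mods = [high_order(a, 8, 2)[0] for a in range(4)]  # [0, 0, 0, 0]
lo_mods = [low_order(a, 8, 2)[0] for a in range(4)]   # [0, 1, 2, 3]
```

This is why low-order interleaving is the choice for overlapping accesses to sequential addresses, while high-order interleaving eases memory expansion and fault isolation.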

Page 92:

Hierarchical Memory Structure Addressing Schemes for Main Memory

92

High order bits

Page 93:

Hierarchical Memory Structure Addressing Schemes for Main Memory

93

Page 94:

94

Virtual Memory System The Concept of Virtual Memory

Using the virtual memory (VM) concept, a program whose size is larger than the available free memory space can be executed.

In the VM concept, a program is divided into pages (equal-sized parts), which are loaded into memory one by one as demanded by the CPU.

Memory management is required in the following phases:

Program structure and design: the compiler assigns names (unique identifiers) while translating the program modules from the programming language into modules of machine code.

A linker then combines these modules of unique identifiers.

The composite is translated by a loader into main memory locations.

The set of unique identifiers defines the virtual space or name space; the set of main memory locations allocated to the program defines the physical memory space.

The last phase is dynamic memory management, required during the execution of the program.

Page 95:

95

Virtual Memory System The Concept of Virtual Memory

Let the name space Vj generated by the jth program running on a processor consist of a set of n unique identifiers:

Vj = {0, 1, ..., n-1}

Let the memory space allocated to the program in execution have m locations:

M = {0, 1, ..., m-1}

Since the allocated memory space may vary with program execution, m is a function of time. At any time t and for each referenced name x ϵ Vj there is an address map

fj(t) : Vj -> M ∪ {Ф}

The function fj(t) is defined by

fj(x, t) = y, if at time t item x is in location y
fj(x, t) = Ф, if at time t item x is missing from M

When an item is missing, a fault handler takes the following actions:

A placement policy selects a location in memory where the fetched item will be placed

If memory is full, a replacement policy selects item(s) to remove

A fetch policy decides when an item is to be fetched from lower memory
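The address map and its fault handler can be sketched as a small simulation. The choice of policies here, demand fetch, next-free placement, and FIFO replacement, is an illustrative assumption, as is the memory size m = 2:

```python
# Sketch of the address map f_j(t): V_j -> M ∪ {Φ} with demand fetching,
# next-free placement, and FIFO replacement.  Sizes are invented.
from collections import OrderedDict

PHI = None  # Φ: the referenced item is not resident in M

class AddressMap:
    def __init__(self, m):
        self.m = m                  # number of memory locations
        self.map = OrderedDict()    # item -> location, in load order

    def f(self, x):
        """Return x's location, or Φ on a fault (the item is then loaded)."""
        if x in self.map:
            return self.map[x]
        # fault handler: placement = next free slot, or FIFO replacement
        if len(self.map) < self.m:
            loc = len(self.map)
        else:
            _, loc = self.map.popitem(last=False)  # evict oldest resident
        self.map[x] = loc
        return PHI                  # caller observes the fault

mem = AddressMap(m=2)
# 0 faults, 1 faults, 0 hits, 2 evicts 0 (oldest), so 0 faults again:
faults = [mem.f(x) is PHI for x in [0, 1, 0, 2, 0]]
```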

Page 96:

96

Virtual Memory System The Concept of Virtual Memory

Program Locality: due to the looping, sequential, and block-formatted control structures inherent in the grouping of instructions and data in a program, the CPU's reference-generation pattern is predictable. This property is called locality of reference.

There are three types of locality:

Temporal - there is a tendency for a process to reference, in the near future, the elements of the reference string referenced in the recent past. This is due to loops, temporary variables, or process blocks.

Spatial - there is a tendency for a process to make references to a portion of the virtual address space in the neighborhood of the last reference.

Working set (W) - if we consider a hypothetical time window ∆ moving across the virtual time axis, it can be seen that only a subset of the virtual address space is needed during any time interval of the history of a process. The subset of the virtual space referenced during the interval (t, t + ∆) is called the working set W(t, ∆).

∆ is a critical parameter for optimizing the working set of the process.
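A working-set computation can be sketched over a reference string. This uses the common backward-window form (pages referenced in the last ∆ references up to time t); the reference string and ∆ are invented examples:

```python
# Working set over a reference string r(1)..r(T): the distinct pages
# referenced in the window of the last `delta` references ending at
# virtual time t.  The string and delta are illustrative.

def working_set(refs, t, delta):
    """Pages referenced at virtual times t-delta+1 .. t (1-indexed times)."""
    start = max(0, t - delta)
    return set(refs[start:t])   # refs[k-1] is the page at time k

refs = [1, 2, 1, 3, 2, 2, 4, 1]          # r(1)..r(8)
W = working_set(refs, t=6, delta=3)      # times 4, 5, 6 reference pages 3, 2
```

A larger ∆ grows the working set toward the whole address space; a smaller ∆ shrinks it toward the single last page, which is the tradeoff the text calls critical.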

Page 97:

Virtual Memory System The Concept of Virtual Memory

97

Page 98:

98

Virtual Memory System The Concept of Virtual Memory

Program Relocation: during program execution, the processor generates logical addresses, which are mapped into physical addresses in main memory.

Address mapping done when the program is initially loaded is called static relocation; address mapping done during execution is called dynamic relocation.

Static relocation makes it difficult for processes to share information that is modifiable during execution.

One technique for dynamic relocation is the use of a relocation (base) register. A program may be loaded initially using static relocation, after which it may be displaced within memory and the contents of the relocation register adjusted to reflect the displacement. Two or more processes may share programs by using different relocation registers.

Page 99:

Virtual Memory System Paged Memory System

99

DIRECT MAPPING

(This method needs two memory accesses, so it is slow)

C = changed bit

Page 100:

100

Virtual Memory System Paged Memory System

In this scheme, the virtual space is partitioned into pages and memory is partitioned into page frames.

Each virtual address contains a virtual page number ip (mapped) and a displacement iw (unmapped).

The address map consists of a page table (PT), which contains the base address of the frame in memory, if it exists.

The simplest page table contains one entry for each possible virtual page.

There is one page table for each process, created in main memory at the initiation of the process.

The PTBR (page table base register) in each processor contains the base address of the page table of the process currently running on the processor.
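The ip/iw split and frame lookup can be sketched with bit operations. The 10-bit displacement (1 KB pages) and the page-table contents below are illustrative assumptions:

```python
# Page-table translation sketch: split the virtual address into the
# virtual page number i_p and displacement i_w, look up the frame in
# the page table, and concatenate.  1 KB pages; table contents invented.

PAGE_BITS = 10  # 1 KB page => 10-bit displacement

def translate(va, page_table):
    ip = va >> PAGE_BITS               # mapped part: virtual page number
    iw = va & ((1 << PAGE_BITS) - 1)   # unmapped part: displacement
    frame = page_table.get(ip)
    if frame is None:                  # no frame: page fault
        raise KeyError("page fault on virtual page %d" % ip)
    return (frame << PAGE_BITS) | iw   # physical address

pt = {0: 5, 1: 2}                      # virtual page -> page frame
pa = translate(0x4A7, pt)              # page 1, offset 0xA7 -> frame 2
```

Because the table itself lives in main memory, a reference costs one access to the page table plus one to the data, which is the two-access slowness noted for direct mapping.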

Page 101:

101

Virtual Memory System Paged Memory System Technique of maintaining multiple virtual address space

Page 102:

102

Virtual Memory System Paged Memory System Technique of maintaining multiple virtual address spaces

In a multiprogrammed processor, the page map contains the virtual page number (ip), a process identification, the RWX protection bits, a modified bit (C), and the page frame address (PFA) in shared memory.

The process identification of the currently running process is held in the current process register (CPR) of the processor.

In this scheme, the virtual page number of the virtual address (VA) is associatively compared with all page map entries (PMEs) that have the same process identification as the currently running process. If there is a match, the page frame number is retrieved and the displacement is concatenated with it to form the physical address. If there is no match, a page fault interrupt occurs, which locates the page.

Page 103:

103

Virtual Memory System Paged Memory System

Problems with a pure paged memory system:

It is inefficient if the virtual space is large. For example, with a 32-bit VA and a 1K page size, the page address is 22 bits, which needs 2^22 page table entries. For an 8 MB main memory there are 2^23 / 2^10 = 2^13 page frames, so each PTE has a 13-bit page frame field.

There is no mechanism for a reasonable implementation of sharing.

Internal fragmentation: the last page may have unused space.

Table fragmentation: main memory occupied by page tables is unavailable for virtual pages.
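The size arithmetic in the example above can be checked directly:

```python
# Check the slide's page-table arithmetic: 32-bit virtual address,
# 1 KB pages, 8 MB main memory.

va_bits, page_bits = 32, 10                # 1 KB = 2**10 bytes
pte_count = 2 ** (va_bits - page_bits)     # 2**22 page-table entries
frames = (8 * 2**20) // 2**page_bits       # 2**23 / 2**10 = 2**13 frames
frame_field_bits = frames.bit_length() - 1 # 13-bit page-frame field
```

Four million entries per process, just to map an 8 MB memory, is the inefficiency that motivates paged segmentation.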

Page 104:

104

Virtual Memory System Paged Memory System Example- Address & page table entry formats of VAX-11/780 virtual mem.(VM)

Page 105:

105

Virtual Memory System Paged Memory System Example-Partition of virtual address space of VAX-11/780 virtual mem.(VM)

Page 106:

106

Virtual Memory System Paged Memory System Example-Region addressing scheme of VAX-11/780

P0 for program’s region page table

P1 for control’s region page table

Page 107:

Virtual Memory System Paged Memory System

107

Example- VAX-11/780

Page 108:

Virtual Memory System Paged Memory System

108

Example- VAX-11/780

Page 109:

109

Virtual Memory System Segmented Memory System

Block-structured HLL programs have a high degree of modularity (e.g., C). Modules are compiled to produce machine code in a logical space, which is then loaded, linked, and executed.

A set of logically related contiguous data elements is called a segment.

Segments are allowed to grow and shrink almost arbitrarily, unlike pages.

Segmentation is a technique for managing virtual space allocation, whereas paging is a concept for managing the physical space.

An element in a segment is referenced by the segment name-element name pair (<s>, [i]).

During program execution, the segment name <s> is translated into a segment address by the OS, and the element name [i] is a displacement within the segment.

A program consists of a set of linked segments, where links are created as a result of procedure segment calls within the program segments.

Page 110:

110

Virtual Memory System Segmented Memory System Example- Segmentation was used in Burroughs B5500

Each process has a segment table (ST) pointed to by the STBR. The address field contains the base address of the segment in main memory.

Page 111:

111

Virtual Memory System Segmented Memory System Example- Segmentation was used in Burroughs B5500

Page 112:

112

Virtual Memory System Segmented Memory System

When a segment <s> is initially referenced in a process, its segment number is not yet established; in this case an entry must be created in the ST.

A global table, the active segment table (AST), is searched to determine whether the segment is active in memory. If it is, the base address and its attributes are returned from the AST, and an entry is made in the AST to indicate that the process is using the segment. If the entry is not present in the AST, a file directory search is initiated and appropriate entries are made in the AST and ST.

A known segment table is associated with each process, containing entries for the set of segments known to the process.

Page 113:

Virtual Memory System Paged Segmentation Memory System

113

Page 114:

Virtual Memory System Paged Segmentation Memory System

114

Page 115:

Advanced Computer Architecture (CSL502)

Unit 1: Memory and Input-Output Subsystem

115

Page 116:

Advanced Computer Architecture (CSL502)

Hierarchical Memory Structure

Virtual Memory System

Memory Allocation and Management

Cache Memories and Management

Input-Output Subsystems

116

Page 117:

117

Memory Allocation & Management Classification of Memory Policies

Two policies, fixed and variable partitioning, are used for allocating memory pages to active processes.

If the resident-set size zi(t) is fixed for all t during which process Pi is active, then the size vector Z(t) is constant during any interval in which the set of active processes is fixed; this is called fixed partitioning.

In variable partitioning, the partition vector Z(t) varies with time.

The advantage of fixed partitioning is its low implementation overhead, but memory utilization is degraded.

Besides being fixed or variable, a memory policy can be global or local. A local policy involves only the resident set of the faulting process; a global policy considers the history of the resident sets of all active processes in making a decision.

Page 118:

118

Memory Allocation & Management Classification of Memory Policies

When a page fault occurs, one of two memory-fetching policies is used to fetch the pages of a process: demand prefetching and demand fetching. In demand prefetching, a number of pages, including the faulting page, are fetched in anticipation of the process's future requirements. In demand fetching, only the page referenced is fetched on a miss.

The ith process's behavior is described in terms of its reference string, which is a sequence

Ri(T) = ri(1) ri(2) ... ri(T)

where ri(k) is the number of the page containing the virtual address referenced by process Pi at time k, and k = 1, 2, ..., T measures the execution time or virtual time.
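The difference between the two fetch policies can be sketched by counting faults over a reference string. The string, the unbounded resident set (no replacement), and the prefetch rule (fetch the faulting page plus the next sequential page) are all simplifying assumptions for illustration:

```python
# Demand fetching vs demand prefetching over a reference string
# R = r(1)..r(T).  Resident set is unbounded (no replacement) and the
# prefetch policy fetches `prefetch` sequentially-next pages as well;
# both simplifications are illustrative only.

def faults(refs, prefetch=0):
    resident, n = set(), 0
    for r in refs:
        if r not in resident:
            n += 1
            # fetch the faulting page plus `prefetch` following pages
            resident.update(range(r, r + prefetch + 1))
    return n

R = [0, 1, 2, 5, 6, 3, 4]        # invented reference string
demand = faults(R)               # every first touch faults
prefetched = faults(R, prefetch=1)
```

Prefetching trades fewer faults for the risk of fetching pages the process never touches, which is why the choice of policy matters for memory utilization.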

Page 119:

Memory Allocation & Management Optimal Load Control

In a multiprogramming environment, memory is used dynamically. The number of active processes (the degree of multiprogramming) in a parallel processor system is usually greater than the number of processors, so that switching among processes can be done. This capability requires that the memory be able to hold the pages of the active processes in order to reduce context-switching time. Multiprogramming improves concurrency in the use of all system resources, but the degree of multiprogramming should be varied dynamically to maintain both low system overhead and a high degree of concurrency.

Page 120:

120

A multiprogrammed multiprocessing virtual memory system model

Memory Allocation & Management Optimal Load Control

The network has two portions: (a) the active network, which contains the processors, memory, and file memory; and (b) the passive network, which contains the process queue and the policies for admitting new processes to active status. A process is active if it is in the active network, where it is eligible to receive processing and to have pages in main memory. Each active process is either waiting or in service.

Page 121:

Memory Allocation & Management Optimal Load Control

121

Page 122:

Memory Allocation & Management Memory Management Policies

122

Page 123:

123

Memory Allocation & Management Cache Memories and Management

Characteristics of Cache

It consists of two parts: the cache directory and the RAM.

The memory portion is partitioned into a number of equal-sized blocks called block frames.

The directory, implemented as some form of associative memory, consists of block address tags and some control bits, such as a "dirty" bit, a "valid" bit, and protection bits.

The address tag contains the block address of the block that is currently in the frame.
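The directory structure can be sketched as frames carrying a tag plus valid and dirty bits, with a lookup that associatively compares the block address against all valid tags. The fully associative organization and the sizes are illustrative assumptions:

```python
# Cache directory sketch: each block frame holds an address tag plus
# valid and dirty control bits.  Fully associative; sizes invented.

class Frame:
    def __init__(self):
        self.valid = False   # does this frame hold a live block?
        self.dirty = False   # has the block been modified since load?
        self.tag = None      # block address of the resident block
        self.data = None

def lookup(frames, block_addr):
    """Associatively compare the block address against all valid tags."""
    for f in frames:
        if f.valid and f.tag == block_addr:
            return f         # cache hit
    return None              # cache miss

frames = [Frame() for _ in range(4)]
frames[2].valid, frames[2].tag, frames[2].data = True, 0x1A, b"block"
hit = lookup(frames, 0x1A)
miss = lookup(frames, 0x2B)
```

On a miss with a dirty victim, the dirty bit tells the controller the block must be written back to main memory before the frame is reused, which the fetch flowchart below walks through.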

Page 124:

Memory Allocation & Management Cache Memories and Management

124

Simplified flowchart of cache operation for fetch