CMPT 250 Computer Architecture
Instructor: Yuzhuang
Assembly Lines
An assembly line is a manufacturing process in which parts are added to a product in a sequential manner, using optimally planned logistics, to create a finished product much faster than handcrafting methods.
The Ford Motor Company built the world’s first assembly line between 1908 and 1915.
This pipeline made the Ford Model T affordable and brought high wages to Ford workers.
Some Pictures of the Ford 1913 Assembly Line
A Calculation
Consider assembling a car. Assume the assembly has three steps: install the engine, install the hood, and install the wheel.
One car takes 35 minutes. Three cars take 105 minutes if only one car can be worked on at a time.
Figure: install the hood (5 minutes), install the engine (20 minutes), install the wheel (10 minutes).
A Calculation contd.
What if we have three workers, one for each step?
Ideally, a finished car then comes off the line every 20 minutes, the length of the slowest step.
Figure: timeline of the 1st, 2nd, and 3rd cars passing through the hood, engine, and wheel steps, with time marks at 5, 25, 35, 45, 55, 65, and 75 minutes; the three cars are finished at 35, 55, and 75 minutes.
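The calculation above can be checked with a small simulation. This is a sketch under stated assumptions: it takes the figure to mean that the hood, engine, and wheel steps last 5, 20, and 10 minutes, and it assumes a car may wait between stations when the next worker is still busy. The function names are hypothetical.

```python
# Sketch of the car-assembly calculation. Assumptions: step times of
# 5 (hood), 20 (engine), and 10 (wheel) minutes, and cars may wait
# between stations while the next worker is busy.

def sequential_total(step_minutes, cars):
    """Total time when only one car is worked on at a time."""
    return sum(step_minutes) * cars


def pipelined_finish_times(step_minutes, cars):
    """Finish time of each car when every step has its own worker."""
    free_at = [0] * len(step_minutes)  # time at which each worker is next free
    finish_times = []
    for _ in range(cars):
        t = 0  # all cars are ready to start at time 0
        for step, minutes in enumerate(step_minutes):
            # A car starts a step once it has left the previous step
            # and this step's worker has finished the previous car.
            t = max(t, free_at[step]) + minutes
            free_at[step] = t
        finish_times.append(t)
    return finish_times


steps = [5, 20, 10]  # hood, engine, wheel
print(sequential_total(steps, 3))        # 105
print(pipelined_finish_times(steps, 3))  # [35, 55, 75]
```

After the first car, one car finishes every 20 minutes: the slowest step sets the rate, which is why pipeline stages should be of nearly equal length.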
Pipeline Design
Divide the process into stages of roughly equal length.
The stages are separated by registers, which provide temporary storage for data passing through the pipeline. These registers are called pipeline platforms.
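A Pipelined Datapath
Conventional: delays of 0.6, 0.6, 0.2, 0.8, and 0.2 ns; total delay 2.4 ns; clock rate 1/(2.4 ns) ≈ 416.7 MHz.
Pipelined: delays of 0.6, 0.6, 0.2, 0.2, 0.8, 0.2, and 0.2 ns, grouped into pipeline stages; the clock period is that of the slowest stage, 1 ns, giving a clock rate of 1 GHz.
Figure: the conventional and pipelined datapaths annotated with these delays.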
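The clock-rate arithmetic can be written out as a short sketch. It uses only the numbers on the slide: the conventional period is the sum of all delays along the path, and the pipelined period (1 ns) is taken as given rather than recomputed, since the slide does not spell out how the delays are grouped into stages.

```python
def clock_rate_mhz(period_ns):
    """Clock rate in MHz for a clock period given in nanoseconds."""
    return 1000.0 / period_ns


# Conventional datapath: one clock period must cover the whole path.
conventional_period = sum([0.6, 0.6, 0.2, 0.8, 0.2])  # 2.4 ns
# Pipelined datapath: the period is that of the slowest stage (1 ns on the slide).
pipelined_period = 1.0

print(round(conventional_period, 1))                  # 2.4
print(round(clock_rate_mhz(conventional_period), 1))  # 416.7
print(clock_rate_mhz(pipelined_period))               # 1000.0
```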
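D Latch
The D latch eliminates the undesirable undefined state of the SR latch by ensuring that S and R are never 1 at the same time.
Figure: D latch with data input D, control input C, and outputs Q and its complement.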
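A gate-level sketch of the idea, assuming one common construction: a NOR-based SR latch whose inputs are gated as S = D·C and R = D'·C (the slide's figure may use NAND gates instead). Because S and R are derived from D and its complement, they can never both be 1, which the assertion checks.

```python
def nor(a, b):
    return int(not (a or b))


def sr_latch(s, r, q, q_bar):
    """Let a NOR-based SR latch settle, starting from state (q, q_bar)."""
    for _ in range(4):  # a few passes are enough for the feedback loop to settle
        q_new = nor(r, q_bar)
        q_bar_new = nor(s, q_new)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar


def d_latch(d, c, q, q_bar):
    """D latch: Q follows D while C = 1 and holds while C = 0."""
    s = d & c          # S = D·C
    r = (1 - d) & c    # R = D'·C
    assert not (s and r), "S and R are never both 1"
    return sr_latch(s, r, q, q_bar)


state = (0, 1)
state = d_latch(1, 1, *state)  # C = 1: Q follows D
print(state)                   # (1, 0)
state = d_latch(0, 0, *state)  # C = 0: the latch holds
print(state)                   # (1, 0)
state = d_latch(0, 1, *state)  # C = 1: Q follows D again
print(state)                   # (0, 1)
```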
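Negative-Edge-Triggered D Flip-Flop
The 1s-catching behaviour of the master-slave SR flip-flop is eliminated, since S and R cannot both be 0 in a D flip-flop: while C = 1, the master is always either set or reset, following D.
Figure: negative-edge-triggered D flip-flop built from a master latch (inputs D, C) driving a slave SR latch (inputs S, C, R).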
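A behavioural sketch of the master-slave arrangement, assuming both halves can be modelled as simple latches (the class and method names are hypothetical). The master is open while C = 1 and the slave while C = 0, so the output only picks up the value D had when C fell. A brief 1 on D while C = 1 is therefore not "caught", because the master simply follows D back to 0.

```python
def d_latch(d, c, q):
    """Behavioural D latch: transparent (Q follows D) while C = 1, holds while C = 0."""
    return d if c else q


class NegEdgeDFlipFlop:
    """Master-slave D flip-flop: the output changes only after C falls from 1 to 0."""

    def __init__(self):
        self.master = 0
        self.q = 0

    def step(self, d, c):
        self.master = d_latch(d, c, self.master)      # master open while C = 1
        self.q = d_latch(self.master, 1 - c, self.q)  # slave open while C = 0
        return self.q


ff = NegEdgeDFlipFlop()
# A short 1-pulse on D while C = 1 is not caught: D is back to 0 before C falls.
for d, c in [(0, 0), (1, 1), (0, 1), (0, 0)]:
    ff.step(d, c)
print(ff.q)  # 0
# D = 1 when C falls: the flip-flop stores 1.
for d, c in [(1, 1), (1, 0)]:
    ff.step(d, c)
print(ff.q)  # 1
```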
Assume no data hazards.
How much can we gain?
For 7 operations: conventional, 7 × 2.4 ns = 16.8 ns; pipelined, 9 clock cycles × 1 ns = 9 ns.
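A sketch of the comparison. The 9 cycles for 7 operations match the standard formula for an ideal k-stage pipeline, (k + n - 1) cycles for n operations, with k = 3; that stage count is our inference from the numbers, not something stated on the slide.

```python
def conventional_time_ns(n, period_ns):
    """Each operation uses the full, unpipelined datapath."""
    return n * period_ns


def pipelined_time_ns(n, stages, period_ns):
    """Ideal pipeline (no hazards): `stages` cycles for the first operation,
    then one more cycle for each remaining operation."""
    return (stages + n - 1) * period_ns


conv = conventional_time_ns(7, 2.4)  # about 16.8 ns
pipe = pipelined_time_ns(7, 3, 1.0)  # 9.0 ns
print(round(conv, 1), pipe, round(conv / pipe, 2))  # 16.8 9.0 1.87
```

For long instruction sequences the fill time becomes negligible and the speedup approaches the ratio of the clock periods, 2.4 ns / 1 ns = 2.4.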
Assume no data and control hazards.
Pipeline contd.
In the first four clock cycles, the pipeline is filling.
In the next four clock cycles, all stages of the pipeline are active. The pipeline is fully utilized.
In the last three clock cycles, not all stages of the pipeline are active, since the pipeline is emptying.
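The filling, full, and emptying phases can be seen by counting the busy stages in each clock cycle. This is a minimal sketch of an ideal pipeline (no hazards, one new instruction entering per cycle); the stage and instruction counts in the usage line are hypothetical.

```python
def active_stages(stages, n):
    """Number of busy stages in each clock cycle of an ideal pipeline
    (no hazards, one new instruction entering per cycle)."""
    counts = []
    for cycle in range(stages + n - 1):
        # Instruction i (0-based) is in stage (cycle - i) during this cycle.
        counts.append(sum(1 for i in range(n) if 0 <= cycle - i < stages))
    return counts


print(active_stages(4, 5))  # [1, 2, 3, 4, 4, 3, 2, 1]
```

The rising part of the list is the pipeline filling, the flat part is full utilization, and the falling part is the pipeline emptying.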
The Reduced Instruction Set Computer (RISC)
The goal of a RISC architecture is high throughput and fast execution. To achieve these goals, memory accesses are avoided wherever possible.
A RISC architecture has the following properties:
- Memory accesses are restricted to load and store instructions, and data-manipulation instructions are register-to-register.
- Addressing modes are limited in number.
- Instruction formats are all of the same length.
- Instructions perform elementary operations.
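A toy interpreter illustrating the load/store property: only LD and ST touch memory, while ADD and MUL operate on registers only. The mnemonics, operand order, and symbolic memory addresses are hypothetical choices for this sketch, not the course ISA.

```python
def run(program, memory, num_regs=8):
    """Run a tiny load/store program; only LD and ST access memory."""
    regs = [0] * num_regs
    for op, *args in program:
        if op == "LD":     # LD Rd, addr    : Rd <- M[addr]
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "ST":   # ST addr, Rs    : M[addr] <- Rs
            addr, rs = args
            memory[addr] = regs[rs]
        elif op == "ADD":  # ADD Rd, Ra, Rb : Rd <- Ra + Rb (registers only)
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "MUL":  # MUL Rd, Ra, Rb : Rd <- Ra * Rb (registers only)
            rd, ra, rb = args
            regs[rd] = regs[ra] * regs[rb]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


mem = {"A": 2, "B": 3, "C": 4, "D": 5}
program = [
    ("LD", 1, "A"), ("LD", 2, "B"), ("ADD", 1, 1, 2),  # R1 <- A + B
    ("LD", 2, "C"), ("LD", 3, "D"), ("ADD", 2, 2, 3),  # R2 <- C + D
    ("MUL", 1, 1, 2),                                  # R1 <- R1 * R2
    ("ST", "X", 1),                                    # M[X] <- R1
]
run(program, mem)
print(mem["X"])  # 45
```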
A RISC Instruction Set Architecture
32 registers, R0 through R31. R0 is a special register that always stores the value zero.
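A sketch of such a register file, assuming 32-bit registers (as in the datapath) and assuming that writes to R0 are simply discarded so that reads of R0 always return zero. The class and method names are hypothetical.

```python
class RegisterFile:
    """32 registers R0-R31 of 32 bits; R0 always reads as zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return 0 if n == 0 else self._regs[n]

    def write(self, n, value):
        if n != 0:  # assumption: writes to R0 are discarded
            self._regs[n] = value & 0xFFFFFFFF  # keep 32 bits


rf = RegisterFile()
rf.write(0, 123)
rf.write(5, 7)
print(rf.read(0), rf.read(5))  # 0 7
```

A constant zero source is handy: for example, adding R0 to a register copies that register without needing a separate move instruction.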
Datapath Organization
The new datapath has 32 32-bit registers, so the register address inputs are five bits wide (2^5 = 32).
The single-bit-position shifter is replaced with a barrel shifter, which permits shifting by multiple positions (the shift amount SH).
In the function unit, the ALU is expanded to 32 bits.
The constant unit performs zero fill for CS = 0 and sign extension for CS = 1.
MUX A is added to provide a path from the updated PC, PC-1, for implementation of the JML (jump and link) instruction.
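A sketch of two of these units. The constant width is a hypothetical parameter (15 bits by default here), and the barrel shifter is modelled as a logical shift by 0 to 31 positions done in a single step; the function names are also hypothetical.

```python
MASK32 = 0xFFFFFFFF


def constant_unit(imm, cs, width=15):
    """Extend a `width`-bit constant to 32 bits (width is an assumption).
    cs = 0: zero fill; cs = 1: sign extension."""
    imm &= (1 << width) - 1
    if cs == 1 and imm & (1 << (width - 1)):
        imm |= MASK32 & ~((1 << width) - 1)  # copy the sign bit into the upper bits
    return imm


def barrel_shift(value, sh, left):
    """Logical shift of a 32-bit value by sh positions (0-31) in one step."""
    sh &= 31
    if left:
        return (value << sh) & MASK32
    return (value & MASK32) >> sh


print(hex(constant_unit(0x7FFF, 0)))                 # 0x7fff
print(hex(constant_unit(0x7FFF, 1)))                 # 0xffffffff
print(hex(barrel_shift(0x1, 31, left=True)))         # 0x80000000
print(hex(barrel_shift(0x80000000, 4, left=False)))  # 0x8000000
```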
Datapath Organization contd.
An additional input is added to MUX D to implement the Set if Less Than (SLT) instruction. This input is N XOR V: it is 1 when N is 1 and V is 0, or when N is 0 and V is 1.
A final difference is that the register file is no longer edge-triggered and is no longer part of a pipeline platform at the end of the write-back (WB) stage.
Instead, data written into the register file during the first half of a clock cycle can be read in the second half of the same cycle. This is called a read-after-write register file.
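The SLT rule can be checked with a sketch that computes the N and V flags of the 32-bit subtraction a - b and returns N XOR V. The overflow test used here is the standard one for subtraction: the operands have different signs and the result's sign differs from that of a.

```python
MASK32 = 0xFFFFFFFF


def slt(a, b):
    """Set if Less Than: 1 if a < b as signed 32-bit integers, else 0,
    computed from the flags of the subtraction a - b."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    n = diff >> 31                    # N: sign bit of a - b
    v = ((a ^ b) & (a ^ diff)) >> 31  # V: signed overflow of a - b
    return n ^ v                      # 1 when exactly one of N, V is 1


print(slt(-5, 3), slt(3, -5), slt(-2**31, 1))  # 1 0 1
```

The last case shows why V is needed: -2^31 - 1 overflows to a positive result (N = 0), yet -2^31 is less than 1.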
Control Organization
SH is added to the IR, CS is added to the instruction decoder, and MD is expanded to two bits.
MUX C selects among three different sources for the next value of the PC.
BrA is formed from the sum of the updated PC value for the branch instruction and the target offset.
RAA is used for the register jump.
BS, PS, and Z are used to select the next PC value.
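A sketch of next-PC selection among PC + 1, BrA, and RAA. The encoding of BS and the meaning of PS below are hypothetical, chosen only to illustrate how BS picks the kind of transfer while PS and Z decide whether a conditional branch is taken; the actual control codes come from the course's control tables.

```python
def next_pc(pc_plus_1, bra, raa, bs, ps, z):
    """Choose the next PC (MUX C) from PC + 1, BrA, or RAA.

    Hypothetical BS encoding, for illustration only:
      bs = 0: no branch           -> PC + 1
      bs = 1: conditional branch  -> BrA if (Z XOR PS) == 1, else PC + 1
      bs = 2: unconditional jump  -> BrA
      bs = 3: register jump       -> RAA
    With PS = 0 the branch is taken when Z = 1; with PS = 1, when Z = 0.
    """
    if bs == 0:
        return pc_plus_1
    if bs == 1:
        return bra if (z ^ ps) else pc_plus_1
    if bs == 2:
        return bra
    if bs == 3:
        return raa
    raise ValueError("bs must be 0-3")


print(next_pc(11, 40, 99, bs=1, ps=0, z=1))  # 40 (branch on zero, Z = 1)
print(next_pc(11, 40, 99, bs=1, ps=0, z=0))  # 11 (not taken)
print(next_pc(11, 40, 99, bs=3, ps=0, z=0))  # 99 (register jump)
```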
Control Organization contd.
To determine the control codes, the CPU is viewed much as the single-cycle CPU was.
However, it is important to examine the timing carefully to be sure that various parts of the register transfer statement take place in the right stage of the pipeline.
Note that BrA and RAA are obtained in the EX stage.
More on Instruction Set Architecture
The format of an instruction is depicted in a rectangular box symbolizing the bits of the binary instruction.
The bits are divided into groups called fields:
- An opcode field.
- An address field.
- A mode field, which specifies the way the address field is to be interpreted.
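A sketch of packing and unpacking the three fields with shifts and masks. The 16-bit layout and the field widths are hypothetical, picked only for illustration; a real instruction set fixes its own format.

```python
# Hypothetical 16-bit format (field widths chosen only for illustration):
#   | opcode: 4 bits | mode: 3 bits | address: 9 bits |
OPCODE_BITS, MODE_BITS, ADDR_BITS = 4, 3, 9


def encode(opcode, mode, address):
    """Pack the three fields into one instruction word."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= mode < (1 << MODE_BITS)
    assert 0 <= address < (1 << ADDR_BITS)
    return (opcode << (MODE_BITS + ADDR_BITS)) | (mode << ADDR_BITS) | address


def decode(word):
    """Split an instruction word back into (opcode, mode, address)."""
    address = word & ((1 << ADDR_BITS) - 1)
    mode = (word >> ADDR_BITS) & ((1 << MODE_BITS) - 1)
    opcode = word >> (MODE_BITS + ADDR_BITS)
    return opcode, mode, address


word = encode(0b1010, 0b011, 500)
print(f"{word:016b}", decode(word))  # 1010011111110100 (10, 3, 500)
```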
Operand Addressing
To illustrate the influence of the number of operands on computer programs, we will evaluate the arithmetic statement X = (A + B)(C + D).
Three-address instructions:
ADD T1, A, B    M[T1] <- M[A] + M[B]
ADD T2, C, D    M[T2] <- M[C] + M[D]
MUL X, T1, T2   M[X] <- M[T1] * M[T2]
Or, using registers:
ADD R1, A, B    R1 <- M[A] + M[B]
ADD R2, C, D    R2 <- M[C] + M[D]
MUL X, R1, R2   M[X] <- R1 * R2
Operand Addressing contd.
Two-address instructions:
MOVE T1, A   M[T1] <- M[A]
ADD T1, B    M[T1] <- M[T1] + M[B]
MOVE X, C    M[X] <- M[C]
ADD X, D     M[X] <- M[X] + M[D]
MUL X, T1    M[X] <- M[X] * M[T1]
One-address instructions:
LD A     ACC <- M[A]
ADD B    ACC <- ACC + M[B]
ST X     M[X] <- ACC
LD C     ACC <- M[C]
ADD D    ACC <- ACC + M[D]
MUL X    ACC <- ACC * M[X]
ST X     M[X] <- ACC
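Both programs can be traced with two tiny interpreters, sketched under the assumption that memory is a dictionary keyed by symbolic names and that the accumulator starts at 0 (the function names are hypothetical). Running them with A = 2, B = 3, C = 4, D = 5 should leave X = (2 + 3)(4 + 5) = 45 in both cases.

```python
def run_two_address(program, mem):
    """Two-address form: the first operand is both a source and the destination."""
    for op, dst, src in program:
        if op == "MOVE":
            mem[dst] = mem[src]
        elif op == "ADD":
            mem[dst] = mem[dst] + mem[src]
        elif op == "MUL":
            mem[dst] = mem[dst] * mem[src]
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem


def run_one_address(program, mem):
    """One-address form: the accumulator ACC is the implied second operand."""
    acc = 0
    for op, addr in program:
        if op == "LD":
            acc = mem[addr]
        elif op == "ST":
            mem[addr] = acc
        elif op == "ADD":
            acc = acc + mem[addr]
        elif op == "MUL":
            acc = acc * mem[addr]
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem


two = [("MOVE", "T1", "A"), ("ADD", "T1", "B"), ("MOVE", "X", "C"),
       ("ADD", "X", "D"), ("MUL", "X", "T1")]
one = [("LD", "A"), ("ADD", "B"), ("ST", "X"), ("LD", "C"),
       ("ADD", "D"), ("MUL", "X"), ("ST", "X")]

print(run_two_address(two, {"A": 2, "B": 3, "C": 4, "D": 5})["X"])  # 45
print(run_one_address(one, {"A": 2, "B": 3, "C": 4, "D": 5})["X"])  # 45
```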
Zero-Address Instructions
We use a stack. The top of the stack is referred to as TOS, and the word below it as TOS-1.
PUSH A   TOS <- M[A]
PUSH B   TOS <- M[B]
ADD      TOS <- TOS + TOS-1
PUSH C   TOS <- M[C]
PUSH D   TOS <- M[D]
ADD      TOS <- TOS + TOS-1
MUL      TOS <- TOS * TOS-1
POP X    M[X] <- TOS
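A sketch of a stack machine running the zero-address program, assuming a Python list as the stack and a dictionary as memory. ADD and MUL pop TOS and TOS-1 and push the result, so their operands are implied by the stack rather than named in the instruction.

```python
def run_stack(program, mem):
    """Zero-address machine: arithmetic operands are implied by the stack."""
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(mem[arg[0]])
        elif op == "POP":
            mem[arg[0]] = stack.pop()
        elif op in ("ADD", "MUL"):
            tos = stack.pop()    # TOS
            below = stack.pop()  # TOS-1
            stack.append(tos + below if op == "ADD" else tos * below)
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem


program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
           ("MUL",), ("POP", "X")]
print(run_stack(program, {"A": 2, "B": 3, "C": 4, "D": 5})["X"])  # 45
```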
Addressing Modes
The addressing mode of an instruction specifies a rule for interpreting or modifying the address field of the instruction.
The address of the operand produced by such a rule is called the effective address. Addressing modes serve two purposes: to give programming flexibility to the user, and to reduce the number of bits in the address fields of the instruction.
Addressing Modes contd.
Implied Mode: the operand is specified implicitly in the opcode, e.g. ADD in a stack computer.
Immediate Mode: the operand is given in the instruction itself, e.g. LDI R0, 3.
Register and Register-Indirect Modes
Register Mode: the address field specifies a register.
Register-Indirect Mode: the address field specifies a register whose content gives the address of the operand in memory.
Auto Increment/Decrement Mode: like register-indirect mode, but the register is also incremented or decremented, e.g. ADD (R1)+, 3   M[R1] <- M[R1] + 3, R1 <- R1 + 1
Addressing Modes contd.
Direct Addressing Mode: the address field of the instruction gives the address of the operand in memory.
Indirect Addressing Mode: the address field of the instruction gives the address at which the effective address is stored in memory.
Relative Addressing Mode: effective address = address part of the instruction + PC.
Addressing Modes contd.
Index Addressing Mode: the content of an index register is added to the address part of the instruction to obtain the effective address. This is useful, for example, for accessing arrays.
The index register may be a special CPU register or simply a register in a register file.
Base-Register Mode: the contents of a base register are added to the address part of the instruction to obtain the effective address.
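Addressing Modes Examples
Opcode: Load to ACC. PC = 250, R1 = 400.
Figure: memory contents. The two-word instruction occupies address 250 (opcode and mode) and address 251 (the address field ADRS, or the operand NBR, = 500); the next instruction starts at 252. Memory also holds M[400] = 700, M[500] = 800, M[752] = 600, M[800] = 300, and M[900] = 200.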
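Addressing Modes Examples contd.

| Addressing mode | Mnemonic | Register transfer | Effective address | Contents of ACC |
|---|---|---|---|---|
| Direct | LDA ADRS | ACC <- M[ADRS] | 500 | 800 |
| Immediate | LDA #NBR | ACC <- NBR | 251 | 500 |
| Indirect | LDA [ADRS] | ACC <- M[M[ADRS]] | 800 | 300 |
| Relative | LDA $ADRS | ACC <- M[ADRS + PC] | 752 | 600 |
| Index | LDA ADRS(R1) | ACC <- M[ADRS + R1] | 900 | 200 |
| Register | LDA R1 | ACC <- R1 | ----- | 400 |
| Register-Indirect | LDA (R1) | ACC <- M[R1] | 400 | 700 |

In relative mode, the PC has already advanced past the two-word instruction to 252 when the effective address is computed, so the effective address is 500 + 252 = 752.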
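The table can be reproduced with a short sketch. The memory contents are the ones implied by the example (the address field at 251 holds 500, and M[400], M[500], M[752], M[800], M[900] hold 700, 800, 600, 300, 200); the mode names and the function name are hypothetical. The function returns the effective address and the value loaded into ACC.

```python
MEMORY = {251: 500, 400: 700, 500: 800, 752: 600, 800: 300, 900: 200}


def load_acc(mode, memory, pc=250, r1=400):
    """Return (effective address, new ACC) for a two-word 'Load to ACC' at pc."""
    adrs = memory[pc + 1]  # second word: address field ADRS (or operand NBR)
    pc_after = pc + 2      # PC points past the instruction when EA is computed
    if mode == "direct":
        ea = adrs
    elif mode == "immediate":
        return pc + 1, adrs  # the operand itself is the word at 251
    elif mode == "indirect":
        ea = memory[adrs]
    elif mode == "relative":
        ea = adrs + pc_after
    elif mode == "index":
        ea = adrs + r1
    elif mode == "register":
        return None, r1      # no memory access, no effective address
    elif mode == "register-indirect":
        ea = r1
    else:
        raise ValueError(f"unknown mode {mode}")
    return ea, memory[ea]


print(load_acc("relative", MEMORY))  # (752, 600)
```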
CISC Architecture
The goal of a CISC architecture is to match more closely the operations used in programming languages and to provide instructions that facilitate compact programs and conserve memory.
A purely CISC architecture has the following properties:
- Memory access is directly available to most types of instructions.
- Addressing modes are substantial in number.
- Instruction formats are of different lengths.
- Instructions perform both elementary and complex operations.
THANKS!