ECE 172 Digital Systems Chapter 3 Registers - Computer Action...
![Page 1: ECE 172 Digital Systems Chapter 3 Registers - Computer Action …web.cecs.pdx.edu/~herb/ece172s18/l03_registers.pdf · level of abstraction for AL higher than for RTL l Instead, RTL](https://reader033.fdocuments.us/reader033/viewer/2022042103/5e80a117f6fbc21794598102/html5/thumbnails/1.jpg)
ECE 172 Digital Systems
Chapter 3 Registers
Herbert G. Mayer, PSU
Status 7/12/2018
Syllabus
• Definitions, Introduction
• Register Transfer & RTL
• Register Shift Operations
• Register Windows
• Vector Registers
• Score Board
• Zero Register Architecture
• Register Dependencies
• Actual Register Sets
• Bibliography
Definitions
Topic: Register; AKA machine register; AKA processor register
• A machine register is an ISA-visible system resource holding data of a specific size; its data can be accessed and processed fast. The data size is generally one word
• Getting data from memory into a register is referred to as loading; moving bits from a register to memory is called storing
• Generally, information in registers can be processed faster than any other data in a digital system
• Register size is dictated by the computer architecture; e.g. a 64-bit architecture has registers holding 64 bits of information, which may be data, addresses, or other
• A CPU may have many registers (Itanium) or few (x86)
• Registers are identified via an index, i.e. their name
Definitions
• A register is a CPU resource holding operands for computation; different operands at different times
• The register is a key resource in digital systems: it stores information while powered on; register content is volatile
• A CPU may have 0 (yes, zero) or more user-visible registers
• A register operand may be the source for a computation
• Or the destination, holding the result after the operation
• A register may be both source and destination, in which case the computation overwrites the original source operand
• Type of architecture with registers: general-purpose register architecture, GPRA
• Not all computer architectures have registers
• For example, stack machines hold operands on top of a stack; the stack grows and shrinks; such a machine has 0 user-visible registers!
Definitions
• Early architectures, AKA von Neumann machines or Princeton architecture, had but 1 register, known as the accumulator
• AKA single-accumulator architecture, SAA!
• Note: architectural registers of a CPU are user visible
• However, real HW implementations of any defined architecture may provide hidden registers
• For example, today's implementations of the ancient Intel x86 architecture have many internal registers that are not user visible
• They are only available to the running HW, which may manage them to speed up execution
• Also, register windows on SPARC may have more internal (i.e. not visible) registers to speed up execution by performing register saving and restoring at calls and returns; not detailed here
Definitions
Von Neumann Architecture, © teach-ict
Introduction
• Since the accumulator on an SAA was the sole source and target of operations, instructions never needed to name that unique register explicitly: it was always implied!
• Modern architectures have multiple registers, e.g. for integer ops, floating-point ops, program counter, status, segment registers, stack addresses, etc.
• Early architectures were register-starved, e.g. Intel x86; yet hidden registers in modern versions of x86 alleviate the slowness due to register shortage!
• The program status register holds the ALU status of the last op; it is indirectly visible via branch conditions, e.g. branch_if_zero, branch_if_less, etc.
• Recent architectures (e.g. Itanium) have a large number of visible registers and also a large number of hidden (not user visible) registers, to speed up execution through hidden register renaming
Introduction
• An architecture ought ☺ to have many registers!
• Yet accessing any such HW resource requires naming a register via its index, AKA addressing it
• The number of bits used in an instruction to address a register is ⌈log2(number-of-regs)⌉; more registers thus increase object code size! Where is the optimum?
• One way to reduce the number of bits: partition registers into different classes: integer, float, branch, status
• Desirable for execution speed: ideally all data reside in registers! Yet this is not feasible!
• The data set is far larger than the total size of the register file, hence this ideal is impossible
• Architectural solution via memory hierarchy: registers, slower registers, cache, slower cache, memory, etc.
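The register-addressing cost named above can be checked with a few lines of Python. This is only a sketch; the function names register_index_bits and operand_field_bits are made up for illustration, and the three-operand format is an assumption, not something the slides prescribe.

```python
def register_index_bits(num_regs: int) -> int:
    """Bits needed to name one of num_regs registers: ceil(log2(num_regs))."""
    if num_regs < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding
    return (num_regs - 1).bit_length()


def operand_field_bits(num_regs: int, operands: int = 3) -> int:
    """Bits spent on register names in an instruction with `operands` register fields."""
    return operands * register_index_bits(num_regs)


for n in (8, 32, 128):
    print(n, "registers:", register_index_bits(n), "bits per operand,",
          operand_field_bits(n), "bits for 3 operands")
```

Going from 32 to 128 registers raises the cost from 5 to 7 bits per operand, i.e. 6 more bits in every three-operand instruction.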
Introduction
[Figure: Memory hierarchy: registers & other HW modules holding data]
Introduction
[Figure: Another, common view of memory hierarchy: registers at top]
Logical Register
• A logical register (LR) is a hypothetical machine resource to hold data as operands for computation, addressing, indexing, decision making, etc.
• Convenient model for discussing architecture
• Logical registers are used as an abstract design tool to explore or refine a computer architecture
• Or to propose code sequences for simulation etc.
• An LR doesn't suffer ☺ from physical constraints, such as slowness, limits on data size, or number of units
• As the eventual result of the computer design process, an LR may end up defining key attributes of a to-be-defined actual register of the digital system being built
Physical Register
• A physical or processor register (PR) is a machine resource holding data as operands for addressing, computing, decision making, etc.
• Each PR has a unique name, a specific width, defined data types, and a set of operations
• The width (number of bits) of a PR is defined by the architecture of the digital system for which the PR is a resource
• Or the width of a PR may be defined by the maximum precision of the data it is ever expected to compute
• Frequently, these two precisions are identical; e.g. on a 32-bit architecture the maximum numeric precision for integer or floating-point data was also 32 bits
• But they may differ; e.g. during the evolution of 32-bit architectures, wider numeric precisions such as 64-bit integer or float data became commonplace
Physical Register
• The actual number of PRs dictates the required number of bits in instructions that specify register source and destination of register operations
• The old Intel x86 architecture has 4 general-purpose data registers (AX, BX, CX, DX) and 4 dedicated pointer and index registers (SP, BP, SI, DI), besides its segment registers; 8 total! Not a typo!
• The SPARC architecture has 32 visible physical registers; yet it has a large number of hidden registers, available to allow smooth use of the circular register window when the limit of 32 is exceeded
• After consuming all 32 (registers 0 .. 31), the count of the next register restarts at 0; yet the old register 0 must be saved
• Physical registers are actually built from flip-flops; thus they are clocked and hold data between clock pulses
• PRs need a reset (or clear) function at the start of computation
Physical Register
[Figure: three sample registers. r1: generic complete register, bits unspecified. r2: some 64-bit register, bit indices 63 .. 0. r3: 32-bit, 4-byte register with byte3 (bits 31 .. 24), byte2 (bits 23 .. 16), byte1 (bits 15 .. 8), byte0 (bits 7 .. 0)]
• Sample registers with specified length: r2 has 64 bits; r3 only 32 bits, i.e. 4 bytes
• Register r3 shows bit indices right to left and byte addresses right to left within one word: Little Endian
• Else byte addresses increase left to right: Big Endian
• Generic register r1 has unspecified length
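The byte numbering of r3 can be tried out in Python, whose built-in int.to_bytes takes the byte order as an argument. This is a small sketch; the sample value 0x0A0B0C0D and the helper name byte_of are chosen only for illustration.

```python
def byte_of(value: int, k: int) -> int:
    """Return byte k of a register value, byte 0 being bits 7 .. 0 as in r3."""
    return (value >> (8 * k)) & 0xFF


r3 = 0x0A0B0C0D                               # byte3 = 0x0A ... byte0 = 0x0D

little = r3.to_bytes(4, byteorder="little")   # byte0 at the lowest address
big = r3.to_bytes(4, byteorder="big")         # byte3 at the lowest address

print([hex(b) for b in little])
print([hex(b) for b in big])
```

In the little-endian layout the byte at address k is exactly byte_of(r3, k); in the big-endian layout the order is reversed.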
Register Transfer &
Register Transfer Language
Register Transfer Language RTL
• Register transfer language (RTL) is not a general-purpose programming language (PL): a PL offers higher-level abstractions to discuss a computing environment
• RTL is not an assembly language (AL): executable operations of an AL map directly onto CPU instructions; the level of abstraction of an AL is higher than that of RTL
• Instead, RTL is a low-level language used to define digital systems, including key processor components; RTL specifies operations on registers or between multiple registers
• RTL shares certain operations with high-level PLs or ALs, e.g. transferring bits, copying bits, specifying bit ranges, zeroing bit fields, shifting bits, etc.
• . . . where source or destination or both are registers
Register Transfer Language
• The reason for the continued use of HDLs is logic synthesis
• An HDL description of a system can be written in an intermediate language, AKA RTL
• Logic synthesis tools can convert a HW description into an interconnection of simple components that implement such a circuit!
• . . . and can transform the RTL specification of a circuit in an HDL into an equivalent netlist
• Optimized netlist with storage elements and with combinational logic
• The netlist can be mapped into an actual IC layout
• Becoming the basis for IC manufacturing
Register Micro Operations
Micro operations (micro-ops) are low-level (primitive) operations executed in digital systems, involving register operands, such as:
1. Register to register transfer micro-ops, moving bits from one register to another
2. Arithmetic micro-ops, performing arithmetic on numeric data in registers
3. Logical micro-ops, performing bit manipulations on non-numeric data in registers
4. Shift micro-ops, moving bits or bit fields inside a register
5. Setting micro-ops, clearing to 0 or setting to 1 the selected bit fields (or all bits) inside a register
Register Micro Operations
RTL allows various addressing modes, needed per operand location; permissible due to the power of digital system HW versus the more restrictive SW environment:
1. Immediate: the literal is part of the instruction itself, e.g. r1 ← #9
2. Implied: operand on a stack architecture, e.g. add refers to the 2 top operands stack[top] and stack[top-1]
3. Register to register: natural instruction on a GPR architecture, assigning r1 the value of r2, e.g. r1 ← r2
4. Memory indirect, case 1: operand is in memory, memory address 1234 is an immediate operand, e.g. r1 ← mem[ #1234 ]
5. Register indirect, case 2: operand is in memory, address is in another register, e.g. r1 ← mem[ r2 ]
6. Indexed: operand is in memory, address is in a register plus numeric offset n, e.g. r1 ← mem[ r2 + n ]
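The operand-fetching modes listed above (all except the implied stack mode) can be mimicked in Python. This is only an illustrative sketch: registers and memory are modeled as dictionaries, and the function name fetch_operand and the mode strings are invented here, not taken from any real RTL tool.

```python
def fetch_operand(mode, operand, regs, mem, offset=0):
    """Return the value that would be moved into the destination register."""
    if mode == "immediate":            # r1 <- #9
        return operand
    if mode == "register":             # r1 <- r2
        return regs[operand]
    if mode == "memory":               # r1 <- mem[ #1234 ]
        return mem[operand]
    if mode == "register_indirect":    # r1 <- mem[ r2 ]
        return mem[regs[operand]]
    if mode == "indexed":              # r1 <- mem[ r2 + n ]
        return mem[regs[operand] + offset]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"r1": 0, "r2": 1234}
mem = {1234: 42, 1238: 7}

regs["r1"] = fetch_operand("indexed", "r2", regs, mem, offset=4)
print(regs["r1"])
```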
Register Transfer
• The abstract view of a register transfer is akin to assignment in SW, except that RTL operands specify registers, as opposed to general program objects, e.g.:
    r1 = r2;                                     -- no predicate used: assign unconditionally!
    if ( predicate_p1 == true ) then r1 = r2;    -- with predicate!
• In RTL more tersely expressed as:
    r1 ← r2          -- no predicate: assign register to register!
    p1: r1 ← r2      -- with predicate, meaning: if p1 is true
• Digital circuits have one grand power ☺ other tools often lack: parallelism!
    p2: r3 ← r4, and r5 ← r6
• Multiple register transfers are performed simultaneously, provided predicate p2 holds!
• In preparation for register transfer, we review flip-flops, specifically D flip-flops; registers are built from arrays of flip-flops, one flip-flop per register bit
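The difference between simultaneous register transfers and sequential SW assignments shows up when two registers exchange contents. The Python sketch below models one clock edge under the assumption that all sources are sampled before any destination is written; the name clocked_transfer is hypothetical.

```python
def clocked_transfer(regs, transfers, predicate=True):
    """Apply all (dst, src) transfers at once if the predicate holds.

    Every source is read from the old register state first, which models
    flip-flops all capturing their inputs on the same clock edge.
    """
    if not predicate:
        return dict(regs)                              # nothing changes
    sampled = {dst: regs[src] for dst, src in transfers}
    new_regs = dict(regs)
    new_regs.update(sampled)
    return new_regs


regs = {"r1": 1, "r2": 2}
# p: r1 <- r2, r2 <- r1   swaps both registers in one step, no temporary needed
swapped = clocked_transfer(regs, [("r1", "r2"), ("r2", "r1")], predicate=True)
print(swapped)
```

Written as two sequential assignments, the same pair of transfers would leave both registers holding the value 2.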
Register Shift Operations
Register Shift Op
• Register shift operations (AKA shifts) move the bits of a source register a defined number of bit positions into a destination register
• Source and destination may be different registers, or could be the same
• Depending on the shift type, there are side effects in addition to the destination register change:
  • Bits are discarded into a large bit bucket ☺
  • 0-bits, 1-bits, or sign-bits are pulled in
  • Flags are set (e.g. sign, overflow, zero, etc.)
• A so-called left shift moves the bits of a register toward the most significant bit
• A right shift moves register bits toward the least significant bit, AKA the right hand side
Register Shift Op
• If the bits in a register are viewed as a string of 0s and 1s, a shift operation may lose bits without side effect
  • AKA logical view of bit values
  • Shift right may pull in zeros on the left hand side (high bit index)
• Or the shift can be arithmetic; then the sign bit value is to be considered:
  • Shift right extends (pulls in) the sign bit on the left hand side
  • Shift left by 1 may cause overflow in two's complement if the leftmost 2 bits differ before the left shift! Sign change!
• For arithmetic shifts, it is convenient to view a left shift by 1 position as multiply by 2
• Or a right shift as divide by 2: must extend the sign!
• Sample pseudo shift operations below:
Register Shift Op
Pseudo shift instructions, not selected from any real computer architecture:
r1 ← lsl r1      -- logical shift left 1 bit, same reg
r1 ← lsr r1      -- logical shift right 1 bit, same reg
r1 ← lsl r2      -- logical shift left 1 bit, different reg
r1 ← lsr r2      -- logical shift right 1 bit
r1 ← asl r2      -- arithmetic shift left 1 bit
r1 ← asr r2      -- arithmetic shift right 1 bit
r1 ← 2 lsl r2    -- logical shift left 2 bits
r1 ← 3 lsr r2    -- logical shift right 3 bits
r1 ← 4 asl r2    -- arithmetic shift left 4 bits
r1 ← 5 asr r2    -- arithmetic shift right 5 bits
r1 ← rotl r2     -- rotate r2 left 1 bit, result in r1
Register Shift Op 8-bits
Register Shift Op 8-bits
Operation          Pseudo op          r2 before    r1 after
Right logical      r1 ← lsr r2        1011,0111    0101,1011
Logical shift right 1 position: leftmost bit ‘1’ in r2 is not interpreted as sign bit, thus a ‘0’ is pulled in from the left

Operation          Pseudo op          r2 before    r1 after
Right arithmetic   r1 ← asr r2        1011,0111    1101,1011
Arithmetic shift right 1 position: leftmost bit ‘1’ in r2 is “interpreted” by HW as sign bit, thus a ‘1’ is pulled in, i.e. sign-extended

Operation          Pseudo op          r2 before    r1 after
Left logical       r1 ← 2 lsl r2      1011,0111    1101,1100
Logical shift left 2 positions: the left 2 bits go into the bit bucket, and from the right side two ‘0’s are pulled in
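The three 8-bit samples above can be reproduced in Python. Python integers are unbounded, so this sketch masks results to an assumed register width of 8 bits; the function names simply mirror the pseudo ops and belong to no real instruction set.

```python
WIDTH = 8                      # assumed register width in bits
MASK = (1 << WIDTH) - 1        # 0xFF


def lsl(x, n=1):
    """Logical shift left: high bits are lost, '0's enter on the right."""
    return (x << n) & MASK


def lsr(x, n=1):
    """Logical shift right: low bits are lost, '0's enter on the left."""
    return (x & MASK) >> n


def asr(x, n=1):
    """Arithmetic shift right: the sign bit is replicated on the left."""
    x &= MASK
    n = min(n, WIDTH)
    result = x >> n
    if x >> (WIDTH - 1):                        # sign bit set: pull in '1's
        result |= (MASK << (WIDTH - n)) & MASK
    return result


def asl(x):
    """Arithmetic shift left by 1: returns (result, overflow flag)."""
    x &= MASK
    result = lsl(x, 1)
    overflow = bool(((x ^ result) >> (WIDTH - 1)) & 1)   # sign bit changed
    return result, overflow


r2 = 0b10110111
print(format(lsr(r2), "08b"), format(asr(r2), "08b"), format(lsl(r2, 2), "08b"))
```

The asl helper flags overflow exactly when the two leftmost bits of the source differ, i.e. when the shift changes the sign.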
Register Shift Arithmetic
• Arithmetic shift left asl pulls in ‘0’s from the right side
• Views bits as signed two’s complement numbers
• Sign bit change causes overflow; to be flagged?
[Figure: arithmetic shift left, asl: high order sign bit lost; low order input: ‘0’s]
[Figure: arithmetic shift right, asr: sign bit is extended; low order bits lost]
• Arithmetic shift right asr extends the sign bit
• In contrast, lsr pulls in ‘0’s from the left hand side
Register Shift Logical
• Logical shift left views bits as a string, not as a signed number
• Hence lsl pulls in ‘0’s from the right hand side
[Figure: logical shift left, lsl: high order bits lost; low order input: ‘0’s]
[Figure: logical shift right, lsr: high order input: ‘0’s; low order bits lost]
• Logical shift right views bits as a string, not as a signed number
• Hence lsr also pulls in ‘0’s, from the left hand side
Register Shift Circular
• Circular shift left sees a bit string, not a signed number
• No bits are lost during csl; the leftmost bits are pulled in on the right
• Circular shift right sees a bit string, not a signed number
• No bits are lost during csr; the rightmost bits are pulled in on the left
[Figure: circular shift left, csl: high order bits move to the low end; low order bits move to the left]
[Figure: circular shift right, csr: high order bits move to the right; low order bits move to the high end]
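Circular shifts can be sketched in Python in the same spirit, again for an assumed 8-bit register. The names csl and csr follow the slide labels; they are not real instructions.

```python
WIDTH = 8                      # assumed register width in bits
MASK = (1 << WIDTH) - 1


def csl(x, n=1):
    """Circular shift left: bits leaving on the left re-enter on the right."""
    x &= MASK
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK


def csr(x, n=1):
    """Circular shift right: bits leaving on the right re-enter on the left."""
    x &= MASK
    n %= WIDTH
    return ((x >> n) | (x << (WIDTH - n))) & MASK


r2 = 0b10110111
print(format(csl(r2), "08b"), format(csr(r2), "08b"))
```

Since no bits are lost, csr undoes csl for the same shift count.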
Register Windows
Register Window
• The SPARC architecture popularized register windows
• All visible registers in set 0 are indexed 0_0 .. (n-1)_0, the subscript naming the set
• When another register is needed after registers 0 .. n-1 of set 0 have been used, restart with register 0 again, now named 0_1, acting as if register 0_0 were free
• But if 0_0 is actually still needed, let HW transparently back it up in a hidden register file, while execution continues
• Later it is restored, when the overflowing register 0_1 is no longer needed; thus 0_0 comes back to life
• This requires more physical registers, many of them invisible, and thus more control HW; but it does not extend the register index bits in an instruction beyond log2( n ), with n being the number of visible registers: no code bloat! ☺
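The cyclic naming with transparent backup can be sketched as a toy model in Python. It assumes registers are claimed and released in strict last-in, first-out order (as at calls and returns) and is not a description of real SPARC hardware; the class CyclicRegisterFile and its methods are invented for illustration.

```python
class CyclicRegisterFile:
    """n visible register names, reused cyclically, with a hidden backup file."""

    def __init__(self, n):
        self.n = n
        self.visible = [0] * n
        self.hidden = []          # backups of overwritten registers (LIFO)
        self.next_index = 0
        self.allocated = 0        # number of live allocations

    def allocate(self, value):
        """Claim the next register name; back up its old content if still live."""
        idx = self.next_index
        if self.allocated >= self.n:              # name is being reused
            self.hidden.append(self.visible[idx])
        self.visible[idx] = value
        self.next_index = (idx + 1) % self.n
        self.allocated += 1
        return idx

    def release(self):
        """Free the most recent allocation; restore a backed-up value if any."""
        if self.allocated == 0:
            raise RuntimeError("no register allocated")
        idx = (self.next_index - 1) % self.n
        self.allocated -= 1
        if self.allocated >= self.n:              # this allocation had a backup
            self.visible[idx] = self.hidden.pop()
        self.next_index = idx
        return idx


rf = CyclicRegisterFile(4)
for v in (10, 20, 30, 40):
    rf.allocate(v)
idx = rf.allocate(50)      # wraps around: name 0 is reused, 10 goes to hidden file
print(idx, rf.visible, rf.hidden)
rf.release()               # 10 comes back to life in register 0
print(rf.visible, rf.hidden)
```

The instruction encoding only ever sees indices 0 .. n-1, which is why the extra hidden registers cost no additional index bits.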
Register Window
[Figure: Cyclic register naming]
Vector Registers
Vector Register Architecture VRA
• Registers on a VRA are implemented as a HW array of functionally identical registers, named vr_i[j], i = 0 .. n-1 and j = 0 .. m-1, AKA vector registers
• A VRA may also have scalar registers, named r0, r1, etc.
• Vector registers vr_i[*] can each load/store blocks of contiguous data
• Still in sequence, but overlapped; the number of clocks to complete a full vector load/store depends on the bus width
• Vector registers perform multiple identical operations on contiguous blocks of operands
Vector Register Architecture
A VRA operates sequentially, but processes n ≥ 1 vector operands in n registers simultaneously: faster than n sequential, scalar ops!
Vector Register Architecture
[Figure: Graph shows parallel data processing in one single operation, using multiple registers]
Vector Register Architecture
• Otherwise operations look similar to GPR architecture
• Sample vector operations, assuming 64-unit ops:
ldv    vr1, mem_i           -- loads 64 memory locs from [mem+i], i = 0..63
stv    vr2, mem_j           -- stores vr2[0..63] in 64 contig. locs
vadd   vr1, vr2, vr3        -- register-register vector add
cvaddf r0, vr1, vr2, vr3    -- semantics: condition via bit in r0
-- sequential equivalent:
for i = 0 to 63 do
    if bit i in r0 = 1 then
        vr1[i] = vr2[i] + vr3[i]
    else -- bit must be 0
        -- do not move corresponding bits into vr1[i]
    end if
end for

-- parallel syntax equivalent:
forall i = 0 to 63 doparallel -- parallel semantics
    if bit i in r0 = 1 then
        vr1[i] = vr2[i] + vr3[i]
    end if
end parallel for
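The conditional vector add can be modeled in a few lines of Python. This is a sequential sketch of the semantics only: vector registers are plain lists, r0 is an integer bit mask, and the function name cvaddf is borrowed from the pseudo instruction above.

```python
def cvaddf(r0, vr1, vr2, vr3):
    """Return the new vr1: element i becomes vr2[i] + vr3[i] where bit i of r0 is 1.

    Elements whose mask bit is 0 keep their old vr1 value.
    """
    return [
        vr2[i] + vr3[i] if (r0 >> i) & 1 else vr1[i]
        for i in range(len(vr1))
    ]


vr1 = [0, 0, 0, 0]
vr2 = [1, 2, 3, 4]
vr3 = [10, 20, 30, 40]
print(cvaddf(0b0101, vr1, vr2, vr3))   # only elements 0 and 2 are updated
```

On real vector hardware all element additions would proceed in parallel; the list comprehension merely yields the same end result.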
Score Board
Score Board
• The score-board sb[*] supports out of order execution
• It is not user visible, hence not accessible to the programmer! EE students need to know ☺
• Instead, the score-board sb[*] is an array of HW programmable bits, or single-bit registers named sb[*], each identified by an index; not visible in the ISA! Owned by the processor HW!
• The score-board manages the actual HW registers
• It is a single-bit HW array sb[]
• Every bit sb[i] is associated with one of the real, specific registers: the one identified by index i, e.g. ri
Score Board
• Association by index: sb[i] belongs to register ri
• Only if score-board bit sb[i] = 0 does register ri hold valid data; else must wait! Do not access!
• Likewise, a load into register ri may proceed only if sb[i] = 0
• Or we can say: if sb[i] = 0, then register ri is currently NOT in the process of being written
• If bit i is set, i.e. if sb[i] = 1, then register ri is reserved, i.e. it is off limits for the moment; HW must wait until sb[i] = 0
• Initially all registers are free to use, i.e. all bits are set to sb[i] = 0
Score Board
• Execution constraints, assume:
    rd ← rs op rt
• If either sb[s] or sb[t] is set: → RAW dependence, hence HW stalls the computation; it waits until both rs and rt are available, i.e. until sb[s] = 0 and sb[t] = 0
• If sb[d] is set: → WAW dependence, hence HW stalls the write; it waits until rd has been used; the processor or even SW (the compiler) can sometimes decide to use another register, known to be free, instead of rd
• Else, if none of the 3 registers is in use, i.e. if all score-board entries s, t, and d are 0, then HW can dispatch the instruction immediately
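The dispatch rule for rd ← rs op rt can be written down as a short Python check. This is a simplified sketch assuming one score-board bit per register and a list sb of 0/1 values; the function names try_dispatch and write_back are made up for illustration.

```python
def try_dispatch(sb, d, s, t):
    """Decide whether rd <- rs op rt may be dispatched; reserve rd if so."""
    if sb[s] or sb[t]:
        return "stall: RAW"      # a source register is still being written
    if sb[d]:
        return "stall: WAW"      # the destination is still reserved
    sb[d] = 1                    # reserve rd until the result is written
    return "dispatch"


def write_back(sb, d):
    """The result has arrived in rd: release the register."""
    sb[d] = 0


sb = [0] * 8                       # initially all registers are free
print(try_dispatch(sb, 1, 2, 3))   # r1 <- r2 op r3
print(try_dispatch(sb, 4, 1, 5))   # r4 <- r1 op r5 needs r1, which is pending
write_back(sb, 1)
print(try_dispatch(sb, 4, 1, 5))   # now r1 holds valid data
```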
Score Board & ooo Execution
To allow out of order (ooo) execution, by using any available ri and rj:
1. For uses (AKA references), HW may take any register i whose sb[i] is 0
2. For definitions (AKA assignments), HW may set any register j whose sb[j] is 0
3. Independent of the original order in which the source program was written, i.e. possibly ooo
4. Provided that in the end all ISA-visible registers hold the intended, programmed results
Score Board & ooo Execution
• Out of order execution (ooo), AKA dynamic execution
• CDC supercomputers broke a complex instruction (e.g. FP divide) into a semantically equivalent sequence of simpler FP sub-operations
• Each of which could be executed very swiftly
• On a pipelined architecture, numerous sub-operations or multiple instructions are live and make progress in various phases of completion
  • First introduced on the CDC 6600 in the mid-1960s
  • IBM 360/91 in the late 1960s, with Tomasulo’s genuine ooo algorithm
  • IBM POWER1 μP in 1990
  • Intel x86 family, since 1995 on the Pentium Pro®
Score Board & ooo Execution
l Multiple sub-operations progress simultaneously, yet in any order, that’s why we name it: ooo
l As long as the retiring order is logically equivalent to sequential operation of original instruction sequence
l Detail of ooo execution paradigm:
1. Fetch next instruction i
2. Dispatch i to instruction queue, AKA reservation station
3. Then i waits in queue until input operands are available
4. When available, then i can leave queue, and run possibly even before earlier, older instructions
5. i is issued to appropriate functional unit for execution
6. Results are queued up, to preserve original order
7. Once older instructions have written back results to register file rn, then i’s result is written back to rd; this is called the retire stage, with rd being instruction i’s destination register, i.e. holding the result
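Steps 6 and 7 can be illustrated with a small Python sketch: instructions may finish executing in any order, but each one retires only after all older instructions have retired. The function name `retire_cycles` and the sample completion cycles are assumptions for this example; the sketch also assumes that several instructions may retire in the same cycle.

```python
def retire_cycles(completion_cycle):
    """Return the cycle in which each instruction retires.

    completion_cycle[i] is the cycle in which instruction i finishes
    executing; instructions are listed in original program order.
    An instruction retires once it and all older ones are complete.
    """
    retired = []
    latest = 0
    for done in completion_cycle:
        latest = max(latest, done)   # wait for every older instruction
        retired.append(latest)
    return retired


# Hypothetical completion cycles for instructions 0..4 (program order).
completion = [5, 2, 3, 9, 4]

# Order in which the instructions finish executing: out of order.
finish_order = sorted(range(len(completion)), key=lambda i: completion[i])
print(finish_order)                 # [1, 2, 4, 0, 3]

# Retire cycles never decrease, so results are posted in program order.
print(retire_cycles(completion))    # [5, 5, 5, 9, 9]
```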
Zero Register Architecture
Zero Registers?
l Zero register architectures are known as stack machines
l Semi-tongue-in-cheek claim is: “Registers are not needed for computing!”
l As long as a stack is available (a stack just being a policy of accessing main memory), all computations can be done without registers
l Technically correct!
l Yet all such computations will be slow, as operations are completed solely with memory operands!
l Speed gap is several decimal orders of magnitude, to the heavy loss of the stack machine, i.e. to the loss of the zero register architecture!
Code For Stack Architecture
l Solution? Implement a few top of stack elements via HW shadow registers ⇒ Cache
l Let us compare equivalent code sequences with and without consideration of a cache
l The top-of-stack register “tos” points to the last (topmost) valid word on physical stack
l Two hidden shadow registers may hold 0, 1, or 2 true top of stack words
l Top of stack cache counter tcc specifies number of shadow registers actually used
l Thus tos plus tcc jointly specify the true top of stack
Abstract Stack Architecture
[Figure: abstract stack architecture, drawn twice; labels: stack with tos and free space, 2 tos registers, tcc = 0, 1, 2]
Code For Stack Architecture
l Timings for push, pushlit, add, pop operations depend on top of stack cache
l Operations in shadow registers are fast, typically 1 cycle; includes register access and the operation
l Generally, memory access adds numerous cycles
l To track dynamic changes of the stack, use some defined policy, say try to keep top 50% full
l Table below refines timings for stack with transparent shadow registers
l For example, pushing element mem[ x ] into top of stack cache, we arbitrarily define this requires 2 cycles; due to the memory fetch
l Note: 2 cycles for memory access, highly idealized! In reality more likely multiple tens of cycles!
Code For Stack Architecture
operation    Cycles   tcc before   tcc after   tos change   comment
add          1        tcc = 2      tcc = 1     no change
add          1+2      tcc = 1      tcc = 1     tos--        underflow?
add          1+2+2    tcc = 0      tcc = 1     tos -= 2     underflow?
push x       2        tcc = 0,1    tcc++       no change    tcc update in parallel
push x       2+2      tcc = 2      tcc = 2     tos++        overflow?
pushlit #3   1        tcc = 0,1    tcc++       no change
pushlit #3   1+2      tcc = 2      tcc = 2     tos++        overflow?
pop y        2        tcc = 1,2    tcc--       no change
pop y        2+2      tcc = 0      tcc = 0     tos--        underflow?
Code For Stack Architecture
l Code emission for source snippet on SA:
a + b * c ^ ( d + e * f ^ g )
l Let + and * be commutative, conventional language rule
l Architecture here has 2 shadow registers
l Assembly language programmer (or HLL compiler) exploits this
l Assume initially empty 2-word cache
Code For Stack Architecture
#    1: Left-to-Right   cycles 1   2: Exploit Cache   cycles 2
1    push a             2          push f             2
2    push b             2          push g             2
3    push c             4          expo               1
4    push d             4          push e             2
5    push e             4          mult               1
6    push f             4          push d             2
7    push g             4          add                1
8    expo               1          push c             2
9    mult               3          expo               1
10   add                3          push b             2
11   expo               3          mult               1
12   mult               3          push a             2
13   add                3          add                1
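The cycle counts of both code sequences can be recomputed from the timing table with a short Python sketch. It is an illustrative cost model only: the names `step` and `total_cycles` are made up for this example, mult and expo are assumed to cost the same as add, and pushlit and pop are not modeled.

```python
MEM = 2            # idealized cycles per memory access, as in the table
SHADOW_REGS = 2    # size of the top-of-stack cache

def step(op, tcc):
    """Return (cycles, tcc after) for one operation, given tcc before."""
    if op.startswith("push "):
        if tcc < SHADOW_REGS:
            return MEM, tcc + 1          # fetch operand into a free shadow reg
        return MEM + MEM, tcc            # cache full: spill one word, then fetch
    if op in ("add", "mult", "expo"):    # assumption: all binary ops cost like add
        missing = 2 - tcc                # operands that must come from memory
        return 1 + MEM * missing, 1      # the result stays in one shadow reg
    raise ValueError("operation not modeled: " + op)

def total_cycles(ops):
    """Sum the cycles of a code sequence, starting with an empty cache."""
    tcc, cycles = 0, 0
    for op in ops:
        cost, tcc = step(op, tcc)
        cycles += cost
    return cycles

left_to_right = ["push a", "push b", "push c", "push d", "push e",
                 "push f", "push g", "expo", "mult", "add",
                 "expo", "mult", "add"]
exploit_cache = ["push f", "push g", "expo", "push e", "mult",
                 "push d", "add", "push c", "expo", "push b",
                 "mult", "push a", "add"]

print(total_cycles(left_to_right))   # 40
print(total_cycles(exploit_cache))   # 20
```

The model only counts cycles; it does not check that the two sequences compute the same value.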
Code For Stack Architecture
l Brute-force code emission costs 40 cycles; i.e. failing to take advantage of tcc knowledge
l Code emission with shadow register consideration costs 20 cycles
l True penalty for memory access is worse in practice
l Tremendous speed-up always possible when fixing system with severe flaws ☺
l Return on investment for 2 registers is double the original performance!
l Such strong speedup is an indicator that the starting architecture was poor in the first place!
l Stack Machine can be fast, if purity of top-of-stack memory-access is relaxed for performance
l Indexing, looping, indirection, call/return etc. are not addressed here
Register Dependencies
Register Dependencies
l Inter-instruction dependencies, in EE parlance also known as dependences, arise between registers or memory locations being defined (AKA assigned, or written) and used (AKA read, or referenced)
l One instruction computes a result into a register (or memory); another instruction needs that result from that same register (or same memory location)
l Or, one instruction uses a register; and after use the same register is newly recomputed (written)
l Dependences force sequential execution; otherwise the result would be unpredictable
l In the examples that follow, op is any arithmetic/logical operation
Register Dependencies
True-Dependence, AKA Data Dependence (synonymous):
r3 ← r1 op r2    1: Write r3, op is some arithmetic opcode
r5 ← r3 op r4    2: Read r3 after Write, RAW
Anti-Dependence, not a true dependence; can be parallelized under the right condition:
r3 ← r1 op r2    1: Read r1
r1 ← r5 op r4    2: Write r1 after Read, WAR
Output Dependence, similar to Anti-Dependence, is not a true dependence:
r3 ← r1 op r2    1: Write r3
r5 ← r3 op r4    2: Read r3
r3 ← r6 op r7    3: Write r3 after Write, WAW, with a use in between
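The three cases can be told apart mechanically by comparing the destination and source registers of two instructions. The Python sketch below is only an illustration; the tuple encoding (dest, src1, src2) and the function name `classify` are assumptions for this example.

```python
def classify(first, second):
    """Classify the register dependences from `first` to `second`.

    Each instruction is a tuple (dest, src1, src2); `first` precedes
    `second` in program order. Returns a list of dependence kinds.
    """
    dest1, *srcs1 = first
    dest2, *srcs2 = second
    kinds = []
    if dest1 in srcs2:
        kinds.append("RAW")   # true dependence: second reads what first wrote
    if dest2 in srcs1:
        kinds.append("WAR")   # anti-dependence: second overwrites a source of first
    if dest1 == dest2:
        kinds.append("WAW")   # output dependence: both write the same register
    return kinds


# The three examples from the slide:
print(classify(("r3", "r1", "r2"), ("r5", "r3", "r4")))   # ['RAW']
print(classify(("r3", "r1", "r2"), ("r1", "r5", "r4")))   # ['WAR']
print(classify(("r3", "r1", "r2"), ("r3", "r6", "r7")))   # ['WAW']
```

This pairwise check ignores instructions in between; a register that is redefined between the two instructions needs the more careful bookkeeping of a full sequence analysis.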
Register Dependencies
Control Dependence:
// ri, i = 1..4 come in “live”
if ( condition1 ) {
r3 = r1 op r2;
}else{ // ← see the jump here?
r5 = r3 op r4;
} // end if
write( r3 );
Register Renaming
l Only data dependence is a real dependence, hence called true dependence
l Other dependences are artifacts of insufficient resources, generally insufficient registers
l This means: if additional registers were available, then replacing some of these conflicting registers with other registers could make the conflict (dependence) disappear!
l Anti- and Output-Dependences are indeed such false dependences
Register Renaming
Original Code:
-- r2, r3, r5, r6, r7 come in “live” from code before
-- r1, r4 are not “live”, don’t have initial values
-- r1, r2, r3, r4, r5, r6, r7 must go out “live”
L1: r1 ← r2 op r3
L2: r4 ← r1 op r5
L3: r1 ← r3 op r6
L4: r3 ← r1 op r7
Initial Dependences: which dependences exist between Lx and Ly, for x, y = 1..4? The answer follows
Register Renaming
Original Code:
L1: r1 ← r2 op r3
L2: r4 ← r1 op r5
L3: r1 ← r3 op r6
L4: r3 ← r1 op r7
Initial Dependences: numerous!!
L1, L2 true-Dep with r1
L1, L3 output-Dep with r1
L1, L4 anti-Dep with r3
L3, L4 true-Dep with r1
L2, L3 anti-Dep with r1
L3, L4 anti-Dep with r3
Register Renaming
l What could be changed and improved for better performance, if we had additional registers?
l Hidden or real (visible architecture) registers could be advantageous!
l Compute and use other temporaries via other registers to reduce dependences!
l May at times allow a higher degree of parallelism, due to lower degree of dependence
l More parallelism ⇒ faster execution!
l Register renaming conducted by HW; invisible to assembly programmer or compiler
Register Renaming
Original Code:               New Code, added regs, with r30 instead of r3:
L1: r1 ← r2 op r3            r10 ← r2 op r30   -- r30 instead
L2: r4 ← r1 op r5            r4 ← r10 op r5    -- r10 instead
L3: r1 ← r3 op r6            r1 ← r30 op r6
L4: r3 ← r1 op r7            r3 ← r1 op r7
Dependences before:          Dependences after:
L1, L2 true-Dep with r1      L1, L2 true-Dep with r10
L1, L3 output-Dep with r1    L3, L4 true-Dep with r1
L1, L4 anti-Dep with r3      // ri, i = 1..7 are “live”
L3, L4 true-Dep with r1
L2, L3 anti-Dep with r1
L3, L4 anti-Dep with r3
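Both dependence lists can be checked with a small Python sketch that walks the code in program order and tracks, per register, the most recent writer and the readers since that write. The function name `dependences` and the tuple encoding (label, dest, src1, src2) are assumptions made for this example; it is not how renaming HW works, only a way to count the conflicts.

```python
def dependences(code):
    """Return the set of (earlier, later, kind, register) dependences.

    code is a list of (label, dest, src1, src2) in program order.
    """
    last_writer = {}    # register -> label of its most recent writer
    readers = {}        # register -> labels that read it since that write
    deps = set()
    for label, dest, *srcs in code:
        for reg in srcs:                       # reads happen first
            if reg in last_writer:
                deps.add((last_writer[reg], label, "true", reg))
        for reg in set(srcs):
            readers.setdefault(reg, []).append(label)
        for reader in readers.get(dest, []):   # then the write
            if reader != label:
                deps.add((reader, label, "anti", dest))
        if dest in last_writer:
            deps.add((last_writer[dest], label, "output", dest))
        last_writer[dest] = label
        readers[dest] = []
    return deps


original = [("L1", "r1", "r2", "r3"), ("L2", "r4", "r1", "r5"),
            ("L3", "r1", "r3", "r6"), ("L4", "r3", "r1", "r7")]
renamed = [("L1", "r10", "r2", "r30"), ("L2", "r4", "r10", "r5"),
           ("L3", "r1", "r30", "r6"), ("L4", "r3", "r1", "r7")]

for dep in sorted(dependences(original)):
    print(dep)
print(len(dependences(original)), "dependences before renaming")   # 6
print(len(dependences(renamed)), "dependences after renaming")     # 2
```

With the renamed code only the two true dependences remain, so L1/L2 and L3/L4 form two independent pairs.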
Register Renaming
• With these additional, renamed regs, the new code could execute in half the time!
• First: Compute into free/hidden reg r10 instead of r1, but needs additional register r10; no time penalty!
• Also: Compute in preceding code into r30 instead of r3, if r30 available; also no time penalty!
• Then all 7 regs are live afterwards: r1, r3, r4, plus the non-modified ones! E.g. r2 came in live, must go out live!
• While r10 and r30 are don’t cares afterwards; free to use again by HW
Actual Register Set
Architecture Examples
Intel x86 Registers
Intel x86 Registers 32-bit
l Intel x86 is infamous for being register-starved!
l Need for object code compatibility extended life of x86 architecture beyond anyone’s imagination
Intel x86 Registers 32-bit
Intel x86 Registers 32-bit
l Intel x86 has mmx and xmm registers
l Can be used as array of 8-bit, 16-bit, 32-bit, etc. sub registers
l The xmm registers belong to SSE (Streaming SIMD Extensions)
Intel x86 Registers 64-bit
Itanium™ Registers
Itanium Registers
l Intel’s newer 64-bit Itanium™ processor has 128 general registers (GR), 128 floating-point registers (FR), 64 single-bit predicate registers (PR), 8 branch registers (BR), 128 application registers (AR)
l Also, there are Performance Monitor Data registers (PMD), processor identifiers (CPUID), a Current Frame Marker register (CFM), user mask (UM), and an instruction pointer register (IP)
l GRs, BRs, ARs, CPUIDs, IP, and PMDs are 64 bits wide; FRs are 82 bits wide
l PRs are 1 bit wide, while the UM holds 6 and the CFM 38 bits; depicted below:
Itanium Register File
GR             FR             PR        BR           AR
gr0   63…0     fr0   81…0     pr0  0    br0 63…0     ar0   KR0
gr1   63…0     fr1   81…0     pr1  0    br1 63…0     . . .
gr2   63…0     fr2   81…0     pr2  0    br2 63…0     ar7   KR7
gr3   63…0     fr3   81…0     pr3  0    br3 63…0     . . .
gr4   63…0     fr4   81…0     pr4  0    br4 63…0     ar16  RSC
gr5   63…0     fr5   81…0     pr5  0    br5 63…0     ar17  BSP
. . .          . . .          . . .     br6 63…0     ar18  BSPSTORE
gr16  63…0     fr16  81…0     pr10 0    br7 63…0     ar19  RNAT
. . .          . . .          . . .                  . . .
. . .          . . .          . . .     ip  63…0     ar21  FCR
gr126 63…0     fr126 81…0     pr62 0                 . . .
gr127 63…0     fr127 81…0     pr63 0    cfm 37…0     ar30  FDR
                                        User Mask    ar32  CCV
CPUID                                   um  5…0      ar36  UNAT
cpuid0 63…0    PMD                                   ar40  FPSR
cpuid1 63…0    pmd0 63…0                             ar44  ITC
. . .          pmd1 63…0                             ar65  LC
cpuidn 63…0    . . .                                 ar66  EC
               pmdm 63…0                             ar127
Itanium Register GR
l The 128 GR registers are the common workhorses during computation
l They contain integer values being computed
l It is possible to use these integer values as machine addresses, thus GRs can be used as pointers in load- and store-operations
l All machine instructions can refer to these registers, for reading and writing values
l In addition to the 64 data bits, each GR has an associated NAT bit, which stands for Not A Thing
l NAT is 1, if the associated register has not been initialized with valid data
Itanium Register GR
l NATs support speculation
l For example, if a speculative load is issued but aborted, before the value arrives in its destined GR, the NAT state records that fact
l Enables integrity of the machine’s exception process
l There are 2 groups of GR registers:
l The first 32, GR0 through GR31, are visible to all software, and are used to hold globally computed, intermediate values
l However, GR0 is read-only, providing the constant 0, 64 bits long
Itanium Register GR
l The next 96 registers, GR32 to GR127, are used to implement a small but frequently used portion of the top of the run-time stack; i.e. work like a special-purpose top-of-stack cache
l These stack registers are made available to SW by allocation of a register stack frame, and include between 0 and 96 registers
l Registers not used from this subset are inaccessible to general SW
l The stack frame portion implemented via GRs is further partitioned into subsections, one meant to hold local registers, the other output registers, i.e. results of the current function call
Itanium Predicate Registers PR
l Execution of most IPF (Itanium Processor Family) instructions can be predicated by a PR (predicate register)
l Value 1 in the PR means: the operation can be completed normally
l PR value 0 means the result will not be posted (committed), even if it has been computed already. I.e. there will be no stores and no impact on any AR of the machine
l An exception, i.e. an instruction that cannot be predicated, is the loop operation
Itanium Predicate Registers
l The PRs are also partitioned into 2 sections:
l PR0 through PR15 are static PRs
l The other 48 are so called rotating PRs
l PR0 is an exceptional register, it can only be read, and its value is always 1, meaning, the predicate is true; thus PR0 denotes unconditional execution
l The remaining 48 PRs are used to hold stage predicates, used during software pipelining
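The effect of a predicate on a single instruction can be sketched in Python. This is a simplified model, not IPF semantics in full: the function name `predicated_add`, the list-based register files `gr` and `pr`, and the chosen register numbers are assumptions for this example.

```python
def predicated_add(gr, pr, qp, dest, src1, src2):
    """Model of a predicated add: gr[dest] = gr[src1] + gr[src2] if pr[qp].

    qp is the index of the qualifying predicate register.
    """
    result = gr[src1] + gr[src2]   # the sum may already have been computed
    if pr[qp] == 1:
        gr[dest] = result          # predicate true: result is posted
    # predicate false: result is dropped, no register changes


gr = [0] * 128        # general registers
pr = [0] * 64         # predicate registers
pr[0] = 1             # PR0 always reads as 1: unconditional execution
gr[4], gr[5] = 10, 32

predicated_add(gr, pr, 6, 8, 4, 5)   # pr[6] is 0: no effect
print(gr[8])                         # 0
predicated_add(gr, pr, 0, 8, 4, 5)   # pr[0] is 1: executes
print(gr[8])                         # 42
```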
Itanium Branch Registers BR
l IPF instructions are grouped in bundles, which are 16-byte aligned byte sequences holding executable code. Hence their rightmost 4 address bits will always be 0 due to alignment; these 4 address bits don’t need to be stored explicitly
l Execution of an indirect branch requires an explicit operand
l On the Itanium architecture this operand is a branch register; a branch register BR holds the branch destination
l The machine then loads the value of the referenced BR into the IP register and execution continues from there; IP stands for Instruction Pointer
l Executing branch-related instructions is about the only way to directly affect the value in the instruction pointer, the register that holds the address of the next bundle to be executed
Current Frame Marker Register CFM
Note: Frame Marker is often referred to as Stack Frame, and its fixed portion as the Stack Marker
l Each function has a specific stack frame associated with it, which is created at function invocation; it is cleared at function return
l If all the relevant data of a function’s stack frame do fit, they are placed in the stack of general registers; else the overflowing data must reside in memory
l Either way, the current frame marker (CFM) holds the frame marker for the function that is currently active
l Generally, most functions have small stack frames
Current Frame Marker Register CFM
Layout of the CFM:
bits     37 .. 32   31 .. 25   24 .. 18   17 .. 14   13 .. 7   6 .. 0
field    rrb.pr     rrb.fr     rrb.gr     sor        sol       sof
Meaning of Bits in CFM:
Name     Bit Field   Meaning
sof      0..6        Total size of stack frame
sol      7..13       Size of local part of stack frame, in words
sor      14..17      Size of rotating portion of stack frame. The number of the rotating registers is 8 times the sor value
rrb.gr   18..24      Register rename base for GRs
rrb.fr   25..31      Register rename base for FRs
rrb.pr   32..37      Register rename base for PRs
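The bit layout translates directly into shift-and-mask code. The Python sketch below unpacks a 38-bit CFM value into its fields; the dictionary `CFM_FIELDS`, the function name `decode_cfm`, and the sample value are assumptions for this example.

```python
# Field name -> (lowest bit, width in bits), taken from the layout table.
CFM_FIELDS = {
    "sof":    (0, 7),
    "sol":    (7, 7),
    "sor":    (14, 4),
    "rrb.gr": (18, 7),
    "rrb.fr": (25, 7),
    "rrb.pr": (32, 6),
}

def decode_cfm(cfm):
    """Split a CFM value into a dictionary of its bit fields."""
    return {name: (cfm >> low) & ((1 << width) - 1)
            for name, (low, width) in CFM_FIELDS.items()}


# Hypothetical frame: 12 registers in total, 5 of them local, sor = 1.
cfm = (1 << 14) | (5 << 7) | 12
fields = decode_cfm(cfm)
print(fields["sof"], fields["sol"], fields["sor"])   # 12 5 1
print(8 * fields["sor"])                             # 8 rotating registers
```

The field widths add up to 38 bits, matching the stated width of the CFM.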
Itanium Application Registers AR
Application Registers – t.b.d.:
Register      Mnemonic     Description of register
ar0 – ar7     KR0 – KR7    Kernel registers 0 .. 7
ar8 – ar15                 Reserved
ar16          t.b.d.
Itanium Instruction Pointer IP
l IPF instructions are fetched in units of bundles: chunks of 16 bytes, or 128 bits
l Bundles are stored bundle-aligned
l The ip addresses 18,446,744,073,709,551,616 different bytes (aligned at bundle addresses)
l The rightmost 4 bits of the ip thus will always be zero, due to the bundle-alignment
l Hence these 4 bits don’t need to be stored on the microprocessor silicon
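The alignment argument can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names `stored_ip` and `full_ip` and the sample address are assumptions for this example, not a description of the actual silicon.

```python
BUNDLE_BYTES = 16     # one bundle is 16 bytes = 128 bits
OFFSET_BITS = 4       # log2(16): address bits that are always 0

# A 64-bit ip can name 2**64 different byte addresses.
print(2 ** 64)        # 18446744073709551616

def stored_ip(address):
    """Drop the 4 low-order zero bits of a bundle-aligned address."""
    if address % BUNDLE_BYTES != 0:
        raise ValueError("address is not bundle-aligned")
    return address >> OFFSET_BITS

def full_ip(stored):
    """Rebuild the 64-bit address from the 60 stored bits."""
    return stored << OFFSET_BITS


addr = 0x4000_0000_0000_1230          # hypothetical bundle address
print(hex(stored_ip(addr)))           # 0x400000000000123
print(full_ip(stored_ip(addr)) == addr)   # True
```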
Performance Monitor Data Register
l These are architecture-provided resources that record the use of HW modules
l Contents are read-only for SW
l But contrary to the performance monitor registers on Intel Pentium architectures, they are user visible on Itanium
Alpha Registers
Alpha Registers
l On MP Alpha system, each processor has its own, full complement of architecture registers
l The pc register always addresses the next instruction in 4-byte aligned instruction stream
l The pc is 64-bits wide, yet the rightmost 2 bits are implied 0 and not explicitly stored, due to the 4-byte instruction alignment
l Alpha has 32 integer registers, each 64 bits wide, conventionally named R0 .. R31
l R31 has special meaning: R31 always supplies integer 0 as a source operand
l Clearly, R31 is not writeable
l Exceptions are not raised, when R31 is specified as a destination for a load!
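The behavior of R31 can be modeled with a tiny register-file class in Python. This is an illustrative sketch only; the class name `IntRegisterFile` and its methods are assumptions for this example.

```python
MASK64 = (1 << 64) - 1    # registers are 64 bits wide

class IntRegisterFile:
    """32 integer registers R0 .. R31, where R31 is hardwired to 0."""
    ZERO = 31

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        # R31 always supplies integer 0 as a source operand.
        return 0 if n == self.ZERO else self._regs[n]

    def write(self, n, value):
        # A write to R31 is silently dropped; no exception is raised.
        if n != self.ZERO:
            self._regs[n] = value & MASK64


regs = IntRegisterFile()
regs.write(5, 1234)
regs.write(31, 9999)      # e.g. a load with R31 as destination
print(regs.read(5))       # 1234
print(regs.read(31))      # 0
```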
Alpha Registers
l Alpha has 32 floating-point registers, named F0 .. F31
l Each float register is 64 bits wide
l Register F31 always holds the true 0.0 floating-point value as a constant, cannot be written
l Note: An exception is not signaled for a load, specifying F31 as destination!
l Float instructions computing single-precision data –only 32-bits wide– still write all 64 bits of their respective floating point destination register, sign-extended!
Alpha Registers
l Alpha has 2 special registers, named lock-registers, LR0 and LR1; not further explained here
l Process Cycle Counter (PCC) register consists of two 32-bit fields; usable for performance monitoring
l Low order 32 bits (31..0), known as PCC_CNT, are used as interval timer: an unsigned wrapping counter, tracking number of nanoseconds of an event
l High order 32 bits (63..32), known as PCC_OFF, are operating-system dependent
l Suggested: use as cycle counter for process, thread
l PCC read by special RPCC instruction; for OS support
l FPCR (64-bit Floating Point Control Register) used in IEEE 754 format; else FPCR is not visible; among others, sets one rounding mode of four
Alpha Registers
IBM 370 Registers
IBM 370 Registers
l IBM’s 370 ancient mainframe architecture preceded x86, had regular and relatively rich register set
l Various formats: half-word, word, extended formats
Bibliography
1. Morris M. Mano, et al.: Logic and Computer Design Fundamentals, Pearson 5th Edition, ISBN 978-0-13-376063-7
2. Shen, John Paul, and Mikko H. Lipasti: Modern Processor Design, Fundamentals of Superscalar Processors, McGraw Hill, © 2005
3. Nilsson, James W., and Susan A. Riedel: Electric Circuits, © 2015 Pearson Education Inc., ISBN 13: 9780-13-376003-3
4. Sparc: https://en.wikipedia.org/wiki/SPARC
5. http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3158.html