Microprogrammed Control Unit Design

©M. Balakrishnan: Only for limited circulation for the Ethiopian class

CHAPTER 6: Microprogrammed Control Unit Design
6.1. Introduction
6.2. Terminology
6.3. Microprogrammed Control: Advantages and Disadvantages
6.4. Components of Microprogrammed Control Unit
6.5. Clock Period Computation
6.6. Microprogrammed Control Example
6.7. Generic Microsequencer
6.8. Microprogram Optimization: Definition
6.9. Microinstruction Encoding Example
6.10. Microprogram Optimization Algorithm
6.11. Complex Microprogramming Structures


6.1. Introduction
Microprogramming is an alternative to the hardwired approach for designing the control part of a system. The state machine implementations shown in chapter 4 are hardwired implementations. The key benefit of microprogramming is its flexibility: design changes can be made without redesigning or rewiring the system. This makes it extremely useful during the design and development phase, when frequent modifications are inevitable.

Figure 6.1 Control-Data Interaction

The control part and the data part (also referred to as the data path) are the two major components of any system. These parts interact via two sets of signals: the status signals from the data part to the control part, and the control signals from the control part to the data part. The data part contains the resources for storing data values and performing operations (transformations) on them, whereas the control part sequences these operations and data transfers in a specific order to implement the algorithm. It does so by issuing the control signals in a specific order. As in most algorithms, the sequence of operations is not independent of the data values, i.e. it depends on the results of intermediate operations on specific data. The results required for such flow control are conveyed by the data part to the control part through the status signals. These signals typically capture a result in binary form, indicating whether a particular condition (e.g. a > b) is true or false.

This clear separation of the design into control and data represents, in some sense, an “extreme” design approach. In most complex designs, one finds a less “ideal” situation, with the control part also performing some data operations and the data part also implementing some “control steps”. An experienced designer may handle this easily, but for a student learning to design we strongly feel that the distinction between control and data should be strictly enforced. In the initial stages, this ensures faster development through a systematic debug methodology. Over time, the designer learns to “consciously” integrate data operations in control and control steps in data units for efficiency reasons.

An algorithm today is best captured in what is referred to as an RTL description. RTL stands for register transfer level and primarily involves primitive operations and transfers of data between registers or storage elements. These are typically operations that can be performed in one step (or simply one cycle) in the hardware. For this to happen, every such primitive operation in the RTL description must be supported by a combinational unit in the data path that can implement it. A more detailed description and examples of RTL follow in the next two chapters.



RTL descriptions typically do not separate control and data parts. From such an RTL description, the control can be synthesized through one of the following options:
• Extract FSM and synthesize using “hardwired” logic
• Extract a microprogram and synthesize microprogrammed control
• Extract FSM (Moore machine) and synthesize using microprogrammed control

In an unstructured manual design approach one can also use what is referred to as an “ad hoc controller”, which does not go through a structured intermediate step of FSM or microprogramming.

6.2. Terminology
Before we discuss microprogramming in detail, we define the associated terminology. The text box titled Figure 6.2 presents the most important terms related to microprogramming.

The name microprogram comes from the level at which this program exists. In the case of a processor or microcontroller, it is at a level even lower than the machine level, i.e. it is internal to the machine and represents the internal control of the machine. Normally this level is not visible to the user. A set of microinstructions that implements the control, both by providing the control signals and by testing the condition/status signals to take control flow decisions, is referred to as a microprogram.

Microinstructions are the components of a microprogram. They may be written either directly in binary form or in a symbolic form with a one-to-one mapping to the binary form. Each microinstruction represents a single control step and carries the signals for activating all the data units required for carrying out the operations in that step. For the purposes of this book, we always associate one clock period with each control step, which means each microinstruction takes one clock period to execute. Each microinstruction consists of a number of control signals that activate different components in the data part to perform these operations concurrently.

Each of the indivisible unit operations is referred to as a micro-operation; it is the most primitive controllable operation in the micro-architecture of the design. A microinstruction is nothing but a set of micro-operations that are executed concurrently in a specific step. Not all micro-operations can be executed concurrently: each of them requires some resources, and contention for the same resource prevents concurrency between a pair of micro-operations. For example, in a data part with an ALU, taking operands from two registers connected to the ALU input ports, adding them and storing the result in another register connected to the ALU output port is a micro-operation. Such a micro-operation cannot be activated concurrently with another micro-operation that uses the same ALU for performing subtraction. It should be clear that a single micro-operation may require multiple control signals. Apart from the function select of the ALU, the “add” micro-operation defined above may require control signals for routing the register operands to the ALU input ports, for routing the ALU output to the result register input port, and for latching the result register.
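The resource contention rule described above can be sketched in a few lines of Python. This is only an illustrative model: the micro-operation names and the resource sets attached to them are hypothetical and do not come from any data part in this chapter. Only the shared functional unit and the destination register are listed as resources, on the assumption that reading the same source register from two micro-operations causes no conflict.

```python
# Hypothetical micro-operations, each mapped to the resources it occupies
# for one control step (the functional unit used and the register written).
MICRO_OPS = {
    "add_R1_R2_to_R3": {"ALU", "R3"},
    "sub_R1_R2_to_R4": {"ALU", "R4"},
    "load_R5_from_input": {"R5"},
}


def conflicts(op_a, op_b):
    """Two micro-operations conflict when they need a common resource."""
    return bool(MICRO_OPS[op_a] & MICRO_OPS[op_b])


def is_valid_microinstruction(ops):
    """A microinstruction is valid if no pair of its micro-operations conflicts."""
    ops = list(ops)
    return not any(
        conflicts(a, b) for i, a in enumerate(ops) for b in ops[i + 1:]
    )
```

Under these assumptions the add and subtract micro-operations cannot share a microinstruction because both need the ALU, whereas either of them can be combined with the register load.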


Figure 6.2 Definition of terms related to Microprogramming

To understand the functionality of the remaining terms, refer to figure 6.3, which shows a block diagram of a typical microprogrammed system. The microsequencer is a key component that continuously generates the sequence of addresses for the control ROM and is thus also referred to as the “next address generator”. The microsequencer typically contains a register called the microprogram counter (µPC), whose role in relation to the microprogram is very similar to the role of the program counter (PC) in relation to the program in a processor. The sequence of addresses generated in some sense encapsulates the control flow of the algorithm, and thus the microsequencer has to respond to the status signals originating from the data part.

The control/microprogram ROM stores the microprogram in the form of a sequence of microinstructions. Each bit of a microprogram word corresponds either to a control signal required to activate the data part or to a signal that helps generate the next address. The set of bits (a word of the microprogram) fetched simultaneously constitutes a microinstruction and corresponds to the control signals that are activated concurrently. A larger microprogram width usually means a larger and more concurrent data part. The vertical size of the microprogram in some sense reflects the complexity of the algorithm being implemented. The word ROM is used in a “legacy” sense; today the microprogram is very often stored in a RAM so that the control can be easily modified. Still, by and large, the microprogram does not change (at least not rapidly) while the module is executing, and thus the behavior is more like that of a ROM.

Microprogram: A set of microinstructions which represents the control for implementing a specific algorithm on a given data part.

Microinstruction: Each microinstruction consists of a set of control signals which correspond to a set of micro-operations which are activated concurrently in one control step.

Micro-operations: Each micro-operation is an indivisible register transfer operation which may require one or more control signals to be activated.

Microinstruction format: Microinstruction format refers to the specific organization of control signals in the form of microinstruction fields and their bit positions that defines the microinstruction.

Microsequencer: The role of the microsequencer is to generate a sequence of addresses for the microprogram ROM taking into account the status signals originating from the data part. These addresses are used to fetch the next microinstruction and thus the microsequencer is also known as the next address generator.

Control/Microprogram ROM: This is the memory unit where the sets of control signals which constitute the microinstructions, and consequently the microprogram, are stored.

Microprogram register: The microprogram register (also called the microinstruction register) acts as an interface between the control and data parts. It contains the current microinstruction that is being executed by the data part in a given control step/clock cycle.


The microinstruction, once fetched, needs to be stored, and a register named the microinstruction register is used for that. The width of this register is the same as the microprogram width. Readers familiar with processor architecture can relate this to the instruction register, which typically stores the opcode and other related control information for machine program execution. This register in some sense allows the execution of the current microinstruction to overlap with the generation of the next microinstruction (status generation + next address generation + next microinstruction access). Introducing more registers can help overlap more of these steps and thus achieve a higher degree of pipelining. It is a synchronous design, with both the control part and the data part executing in one clock cycle. The clock is not shown explicitly in the diagram, but the register is clocked and works on an active edge (rising or falling). Further, it is assumed that all registers in the data part that could affect the status, and thus the microinstruction through the microsequencer, also operate on the same edge.

Figure 6.3 Block Diagram of a Microprogrammed Unit

6.3. Microprogrammed Control: Advantages & Disadvantages
Microprogrammed control evolved in the early 1960s and was very popular for implementing digital systems, including mainframe computers. The main reason for its popularity was its flexibility, which allowed the control to be updated easily without the need to redesign the interconnections or printed circuit boards. In the case of computers, this layer of microprogramming enabled the micro-architecture to be cleanly isolated from the instruction set. The same instruction set could then be used for processors implemented using multiple generations of semiconductor devices, or even by multiple vendors supporting very different micro-architectures, through instruction set emulation. The main advantage is that the system software and tools like compilers can be shared across multiple hardware platforms. The longevity of the IBM 360 instruction set owes considerably to this feature, which is referred to as “emulation”. We now enumerate specific advantages and disadvantages of the microprogrammed approach vis-à-vis the hardwired or random logic approach.



Advantages:
• Structured and flexible design: The design is extremely flexible, as any change requires only a modification in the contents of the ROM. The control part hardware structure is more or less fixed, as opposed to a random logic design where even a simple change, like a control output shifted from one state to another, may require a change of interconnections or even logic gates.

• Testing sequences can be easily incorporated: Testing a system, both for faults in implementation and for faults developing over time due to component failure, requires considerable effort and cost. In microprogrammed systems all the components are controlled from the control ROM, so it is relatively easy to develop test sequences that activate various components in a controlled manner to test the functioning of the complete system. The cost is limited to the additional ROM words storing these test microinstructions. Random logic control, on the other hand, would require considerable additional circuitry, possibly including fresh interconnections, to achieve the same results.

• Easy to document and debug: Documenting a design is key to achieving longevity. Documentation enables engineers other than the designers to both upgrade and maintain the system. A system with microprogrammed control is much easier to document and maintain, as the basic functionality is captured in the microinstruction format and the algorithm in the microprogram. Both are relatively easy to document, test and modify vis-à-vis a complex control circuit in the form of a large number of gates. The difficulty of understanding the design grows slowly with the size of the microprogram, whereas the same cannot be said about hardwired control.

Disadvantages:
• Expensive especially for small designs: For simple designs, microprogrammed control can be overkill, as components like the ROM and the microsequencer can add substantially to the cost.

• Slower than random logic: A key reason why microprogramming is not used in many “production” systems is that it can be much slower than random logic in terms of clock period. This is because large components like the ROM and the microsequencer add to the delay and lengthen the clock period.

• Higher power consumption: Memory components tend to consume more power than custom logic because of their full address decoding. Though a detailed quantitative study in this area is missing, microprogrammed control is expected to consume more power than hardwired custom logic.

• Unsuitable for supporting pipelining: Microprogramming, just like classical assembly or high-level language programming, is primarily a sequential control programming paradigm. As almost all processors today support instruction pipelining ranging from three to even ten plus stages, microprogrammed control has become unsuitable, at least for implementing the control of processors. Such deep pipelining also implies a high degree of concurrency, with multiple stages operating in a relatively distributed manner (e.g. the need to stall). Such designs cannot be efficiently implemented using a centralized controller.

• HDLs hide logic complexity: Increasingly, designs start by capturing the specifications in terms of HDL (Verilog/VHDL) descriptions, either as an FSM for the control part or as RTL that integrates the data and control parts. From this description, the synthesis tools take over and automatically generate the hardwired or random logic. Thus, the designer never sees the underlying logic complexity. Further, maintenance and upgrading of the design are also done using only the source HDL descriptions.

These disadvantages explain the relative unpopularity of microprogramming in the design of modern processors, which employ pipelining to increase processing throughput. Most ASICs still use “sequential” control based on FSMs, but as they almost always start from HDL descriptions, which hide the logic complexity from the designer, the use of microprogramming has declined considerably. On the other hand, we still include microprogramming and microprogrammed control unit design in this text, as it gives a unique insight into the control part by giving it a “structure”. In some designs, variants of this approach still appear even though the control may not be completely microprogrammed.

6.4. Components of Microprogrammed Control Unit
In this section we describe the major components of a microprogrammed control unit. We not only identify the components but also define their roles and their generic sizes, to give a better understanding of a microprogrammed control unit.

Figure 6.4 Block Diagram of a “Sized” Microprogrammed Control Unit

Figure 6.4 is essentially the same block diagram as figure 6.3 but with one important difference. Now the sizes of all the interconnections have been symbolically labeled. First we start with the microinstruction format. A microinstruction contains a set of control signals that activate various data part components simultaneously and also help define the next microinstruction to be executed. Microinstruction format refers to the mapping of individual microinstruction bits to the control signals. In some sense this refers to encoding of control signals in the microinstruction into various fields. More details and also options on encoding are discussed in section 6.9.



A single microinstruction consists of various blocks of signals, each requiring a different number of bits depending on the design. The data part in figure 6.4 is assumed to require “m” bits as control signals, and that is one field. The microsequencer is capable of performing a number of control actions, primarily towards selecting the next address. More details of the microsequencer follow in section 6.7. These actions are encoded in “k” bits and referred to as the sequencer control. The number of status lines originating from the data part that can influence the next address is “S”. In any one microinstruction, one of these signals is selected by the multiplexer. This multiplexer is controlled by “s” select lines (where s = ⌈log2 S⌉), which form the third field. Finally, the next address field, which is “n” bits wide, specifies the branch address in case of a branch sequencer action. All these add up to “w” bits in the microinstruction format, as shown in figure 6.5 and table 6.1. Though figure 6.5 and table 6.1 define only four broad fields, there are usually many sub-fields under these fields, especially in the data part control field.

Field No.       1    2    3    4
Size (bits)     m    k    s    n

Figure 6.5 Microinstruction Format (to be read in conjunction with table 6.1)

Field No.   Field name                           Size (bits)   Remarks on size
1           Data Part Control Signals            m             Complexity of data part
2           Sequencer Control / Action Select    k             Options for next address selection
3           Status Control Select                s             s = ⌈log2 S⌉
4           Branch Address (for jumps etc.)      n             n = ⌈log2 N⌉
            Total microinstruction width         w             w = m + k + s + n

Table 6.1 Generic Microinstruction Fields

Once the microinstruction fields are defined, the sizes of the various control unit components can be described in terms of these symbolic sizes. Table 6.2 describes all the major components.

Component           Remark on size
Data part           m control inputs and S status outputs
µ-Sequencer         k + n + 1 inputs, n outputs
Status MUX          S status inputs, s select lines, 1 output (s ≥ log2 S)
µ-pgm ROM           N X w bits (n ≥ log2 N, where n is the number of address inputs)
µ-instr-register    w = m + k + s + n

Table 6.2 Components and their Sizes

The data part clearly has “m” control inputs and “S” status outputs. The value “m” in some sense reflects the concurrency, and thus indirectly the complexity, of the data part. Please note that the “m” control signals have nothing to do with the data inputs or primary inputs to the system. In some specific cases, the role of a primary input is only to control the machine, e.g. a “start” signal coming from an external source triggers the start of the algorithm. In such cases, for the sake of generality, we assume that the external input is received by the data part and passes through it to the control part as a status signal. This is shown in figure 6.6.

Figure 6.6 Primary Input as a status signal

The microsequencer (µ-Sequencer) has “k + n + 1” inputs, corresponding to the “k” sequencer instruction bits, the “n” branch address bits and the 1 selected status line from the status MUX. It has “n” outputs, the next address bits going to the microprogram ROM. The status MUX is used to select one of the “S” status lines coming from the data part using the “s” status select lines coming from the microinstruction. The microprogram ROM (µ-pgm ROM) or control ROM is of size N X w, where N is the total number of microinstructions, i.e. the length of the microprogram. This corresponds to the complexity of the algorithm in terms of the number of “states”. The microinstruction register (µ-instr-register) is w bits wide, which is the sum of the widths of all the fields.

While mapping the control flow onto the microprogrammed control, it is generally transformed into one in which each state has only 2 outgoing edges (i.e. one out of two possible destinations). This makes implementation much easier: after executing a given microinstruction, the control either proceeds to the next microinstruction in sequence or to some other microinstruction specified by the branch address. This decision is made by the microsequencer based on the status line selected by the status MUX. More complex multi-way branches are also possible in microprogrammed control, but they require special treatment which is beyond the scope of this book.
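The size relations of tables 6.1 and 6.2 can be checked with a small calculation. The sketch below is only an illustration; the function name `field_widths` and the returned keys are our own choices, and the ROM is assumed to hold a full 2^n words. The numbers in the usage note are those of the GCD example developed in section 6.6 (9 data part control signals, 2 sequencer control bits, 3 status signals, 7 microinstructions).

```python
def field_widths(m, k, num_status, num_microinstructions):
    """Return field widths and ROM size for a generic microinstruction format.

    m: number of data part control signals
    k: number of sequencer control bits
    num_status: S, the number of status lines into the status MUX
    num_microinstructions: N, the length of the microprogram
    """
    # Integer form of ceil(log2(x)) for x >= 1, avoiding floating point.
    s = (num_status - 1).bit_length()
    n = (num_microinstructions - 1).bit_length()
    w = m + k + s + n
    rom_words = 2 ** n  # assumption: a ROM with n address inputs has 2^n words
    return {"m": m, "k": k, "s": s, "n": n, "w": w,
            "rom_words": rom_words, "rom_bits": rom_words * w}
```

For the GCD example, `field_widths(9, 2, 3, 7)` gives s = 2, n = 3 and w = 16, which corresponds to an 8 X 16 control ROM.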



6.5. Clock Period Computation
Clock period is an important contributor to the performance of any synchronous digital system. This is because the total time (T) taken in executing an algorithm is given by

T = tclk X nclk …(6.1)

where tclk is the clock period and nclk is the number of clock cycles. The minimum clock period is constrained by the longest delay path from any register output (or primary input) to any register input (or primary output). We now compute the clock period for the microprogrammed control. For that we first enumerate the relevant delays of the components.

tdp: Maximum delay in data part
tstatus: Maximum delay for status generation in data part
tsta_mux: Delay of status multiplexer
tseq: Microsequencer delay
tROM: Microprogram ROM delay
treg: Register delay (includes setup time)

tdp is the delay of the longest path in the data part, whereas tstatus is the maximum delay of the paths generating the status signals.

Figure 6.7 Clock period computation: Identified delay paths

Figure 6.7 identifies five paths that can constrain the clock period. Of these five, three (P3, P4 and P5) are dominated by P2, as they contain only a subset of the delays involved in P2. Thus the clock period is bounded by the larger of the P1 and P2 delays.

tclk > max { tdp , tstatus + tsta_mux + tseq + tROM + treg } …(6.2)
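Equations 6.1 and 6.2 can be turned into a small calculation. The following is a minimal sketch; the function names are ours, and the delay values used in the usage note are hypothetical numbers (in nanoseconds) chosen only to show the ROM-dominated path P2 exceeding the data part path P1.

```python
def min_clock_period(t_dp, t_status, t_sta_mux, t_seq, t_rom, t_reg):
    """Bound on the clock period from equation 6.2 (all delays in the same unit)."""
    p1 = t_dp                                          # path P1: data part only
    p2 = t_status + t_sta_mux + t_seq + t_rom + t_reg  # path P2: status to µ-reg
    return max(p1, p2)


def execution_time(t_clk, n_clk):
    """Total execution time from equation 6.1."""
    return t_clk * n_clk
```

With the assumed delays t_dp = 12, t_status = 4, t_sta_mux = 1, t_seq = 2, t_rom = 10 and t_reg = 1, path P2 adds up to 18 and sets the clock period, so an algorithm that needs 100 clock cycles takes 1800 time units.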



Microprogrammed control generally compares poorly with hardwired logic because the ROM delay is significantly higher than that of the other components. This implies that P2 dominates the other delays and becomes the constraint that lengthens the clock period and thus limits the performance. The normal approach to removing a constraint originating from a long delay path is to introduce registers that break the path and in some sense “pipeline” the activities happening in that path. This directly decreases the clock period but has the potential of increasing the number of clock cycles as well as the complexity of the control. Thus such transformations have to be based on a comprehensive analysis.

Figure 6.8(a) A Two Stage Pipeline in the Control Path

Figure 6.8(b) A Three Stage Pipeline in the Control Path



Figures 6.8(a) and 6.8(b) show the same path broken into two stages and three stages respectively. Equation 6.2 becomes equation 6.3 when one register is introduced between the microsequencer and the microprogram ROM, and equation 6.4 when another register is introduced to latch the status before the status multiplexer. Please note that all these registers are assumed to be clocked by the same clock. Figure 6.8(a) isolates the control ROM delay, and figure 6.8(b) further isolates tstatus from the rest of the path delays.

tclk > max { tdp , tstatus + tsta_mux + tseq + treg, tROM } …(6.3)

tclk > max { tdp , tsta_mux + tseq + treg, tROM, tstatus } …(6.4)

As already pointed out, this can have an impact on the number of clock cycles. Readers who are familiar with instruction level pipelining in computer architecture can easily correlate this situation with that. In the case of figure 6.8(a), any jump is delayed by one cycle, and it may not always be possible to get a condition evaluated one cycle before performing a branch. The alternative is to insert a “Nop” microinstruction, which increases the number of clock cycles needed to execute the algorithm. In the case of figure 6.8(b), the status or branch condition has to be generated two cycles before the actual branch can be effected. In many situations this is not possible, as the data on which the status is generated may be available only just before the branch, so again “Nop” microinstructions may have to be introduced.

The discussion above shows that pipelining can be used to decrease the clock period but may also increase the number of clock cycles. Therefore, to get optimal performance, a careful decision has to be made regarding how many stages of pipelining are suitable, such that the benefit of clock period reduction is not overtaken by the loss due to the increase in the number of clock cycles.
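The trade-off between a shorter clock period and extra “Nop” cycles can be made concrete with a rough calculation based on equations 6.2, 6.3 and 6.4. The sketch below uses our own function names; the delay values and the cycle counts (100 cycles unpipelined, 20 and 45 extra Nop cycles for the two pipelined versions) are hypothetical figures chosen purely for illustration.

```python
def clock_unpipelined(d):
    """Equation 6.2: no extra register in the control path."""
    return max(d["dp"],
               d["status"] + d["sta_mux"] + d["seq"] + d["rom"] + d["reg"])


def clock_two_stage(d):
    """Equation 6.3: register between the microsequencer and the ROM."""
    return max(d["dp"],
               d["status"] + d["sta_mux"] + d["seq"] + d["reg"],
               d["rom"])


def clock_three_stage(d):
    """Equation 6.4: additional register latching the status signals."""
    return max(d["dp"],
               d["sta_mux"] + d["seq"] + d["reg"],
               d["rom"],
               d["status"])


# Hypothetical delays (same time unit throughout).
DELAYS = {"dp": 12, "status": 4, "sta_mux": 1, "seq": 2, "rom": 10, "reg": 1}

# Hypothetical cycle counts: pipelining adds Nop microinstructions.
TOTAL_TIME = {
    "unpipelined": clock_unpipelined(DELAYS) * 100,
    "two_stage": clock_two_stage(DELAYS) * (100 + 20),
    "three_stage": clock_three_stage(DELAYS) * (100 + 45),
}
```

With these assumed numbers the clock period drops from 18 to 12 with one pipeline register, but the second register brings no further reduction because the data part delay now dominates. The two-stage version therefore gives the lowest total time, while the three-stage version only pays for its additional Nop cycles.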

6.6. Microprogrammed Control Example
In the previous sections, we introduced the terminology as well as the components. To firm up the discussion and to improve the reader’s understanding, in this section we present a simple case study of a microprogrammed control implementation. The general design steps are:

1. Design the data part and identify the control and status signals
2. Identify control flow requirements to complete control-data interface and the block diagram
3. Generate the symbolic microprogram
4. Finalize the microinstruction format
5. Design the microsequencer
6. Generate the binary microcode or control ROM contents

Figure 6.9 Design Steps in Microprogrammed Design

We will illustrate this design process with an example of designing the microprogrammed control unit of a GCD computer.


Figure 6.10 Block Diagram of a 16-bit GCD Computer

Figure 6.10 shows the block diagram of the GCD computer. We clearly identify all the inputs and outputs along with their data widths. The computer is intended to find the GCD of two 16-bit positive integers, i.e. X ≥ 1 and Y ≥ 1. The output is also a 16-bit positive integer, i.e. Z ≥ 1. The module starts computation once the “start” signal arrives, by first accepting the inputs and then executing the algorithm in figure 6.11. Once the computation finishes, the end of computation signal “eoc” becomes valid. This can be used both by the module supplying the inputs (to know when to give the next set of inputs) and by the module reading the result.

begin
s:  wait till (start = 1);
    input x,y;
    eoc := 0;
    while (x != y) do
        if (x>y) then x := x-y
        else y := y-x
        endif;
    endwhile;
    z := x;
    eoc := 1;
    goto s;
end;

Figure 6.11 GCD Algorithm

Figure 6.11 gives the “pseudocode” of the algorithm. It is not written in any specific language but describes the sequence of steps required to compute the GCD by repeatedly subtracting the smaller number from the larger number and replacing the larger number by the result of the subtraction.

Step 1: Design of the Data Part with associated Control and Status Signals
From the algorithm we derive the data part shown in figure 6.12 for our GCD computer. All registers and flip-flops are clocked with the same clock, and without loss of generality we assume they trigger at the rising edge of the clock. The data part shown in figure 6.12 is derived by systematically analyzing the operations required by the algorithm and instantiating components for supporting those operations. The computations (arithmetic and logical) required in the algorithm need to be supported by corresponding ALUs, variables require storage elements to keep their values, and the data transfers required for performing the operations need to be supported by interconnections among these ALUs, the storage elements and the primary inputs and outputs. Following this approach, we instantiate a 16-bit comparator and a 16-bit subtractor for the arithmetic operations, and three registers R1, R2 and R3 for storing the initial as well as intermediate values of X and Y and the final value of the result Z respectively. A primary input “start” and a flip-flop for “eoc”, which can be set and reset, are also instantiated. On working out the required data transfers, we find that some component inputs need to be driven from more than one source, and in each such case we also instantiate a multiplexer.
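As a software reference against which the data part and the controller can later be checked, the algorithm of figure 6.11 can be sketched in Python. This is only a behavioural model under our own naming; the “start”/“eoc” handshake and the 16-bit width of the hardware are not modelled.

```python
def gcd_by_subtraction(x, y):
    """GCD of two positive integers by repeated subtraction (figure 6.11)."""
    if x < 1 or y < 1:
        raise ValueError("both inputs must be positive integers")
    while x != y:
        if x > y:
            x = x - y  # the comparator reports gt: replace x by x - y
        else:
            y = y - x  # otherwise: replace y by y - x
    return x           # corresponds to z := x
```

The loop uses exactly the two operations that the data part has to support: a comparison (equal, greater than) and a subtraction whose operands can be swapped.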

Figure 6.12 Data part of the GCD Computer

From the data part of figure 6.12, we identify the control and status signals required by the controller. Figure 6.13 lists the control and status signals on the left.

Control Signals: { S_R1, S_R2, S_OP1, S_OP2, L_R1, L_R2, L_R3, C_eoc, P_eoc }
Status Signals: { eq, gt, start }



Figure 6.13 Data part of the GCD Computer annotated with Control and Status Signals

Step 2: Identify control flow requirements to complete control-data interface and the block diagram
The next step in our process is to identify the control flow requirements of the algorithm. These can be identified directly from the algorithm. But as we have already introduced the FSM as a method of capturing control, we draw the FSM just to illustrate the control flow as distinct from the data computation requirements. Figure 6.14 shows the state machine, with S0 as the initial state in which the machine waits for the “start” signal to begin computing the GCD. In state S1, the initialization as well as the data input is performed. In state S2 the comparison is carried out, followed by different actions depending first on whether the values are equal and then on which value is greater. Thus the algorithm can be executed in seven states. The data operations associated with the various states are not indicated in the diagram, as we do not intend to synthesize the control from this FSM.



Figure 6.14 FSM diagram of the GCD computer.
(States S0 to S6; the transitions are labelled with the conditions start’, start, eq’, eq, gt and gt’.)

Now it is easy to identify all the control flow requirements of the algorithm, as the state machine is also available as a reference. There are three types of control flow that need to be supported in our microprogrammed control.

• Next microinstruction (e.g. S1->S2): goto M[i+1], where i is the present instruction number
• Conditional branch (e.g. S0->S1): if (cond) then goto M[j] else goto M[i+1], where j is the branch instruction number and i is the present instruction number
• Unconditional branch (e.g. S6->S0): goto M[j], where j is the branch instruction number

Figure 6.15 Control Flow requirements

From these control flow requirements, we arrive at the corresponding microsequencer instructions. These instructions in some sense define the range of actions that needs to be supported by the microsequencer.

• Next or continue (cont)
• Conditional jump (cjmp)
• Unconditional jump (jmp)


To implement these microsequencer instructions we will need the following control part signals:
• seq_ins (2 bits) to specify one of the three microsequencer instructions {cont, cjmp, jmp}
• cond_sel (2 bits) to choose between the three status or condition inputs {eq, gt, start’}; this field is labeled c_sel in the microprogram and in the microinstruction format
• br_adr (3 bits) to specify the address (range 0..6) of the microinstruction to branch to.
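The three control flows and the field widths listed above can be sketched in a few lines of Python. This is only an illustrative behavioral model: the names `SeqIns`, `next_address` and `field_width` are assumptions introduced here and are not part of the GCD design itself.

```python
from enum import Enum


class SeqIns(Enum):
    """The three sequencer instructions required by the GCD control flow."""
    CONT = "cont"   # next or continue
    CJMP = "cjmp"   # conditional jump
    JMP = "jmp"     # unconditional jump


def next_address(seq_ins: SeqIns, i: int, j: int = 0, cond: bool = False) -> int:
    """Return the next microinstruction number.

    i is the present instruction number, j is the branch instruction number,
    cond is the value of the selected status signal (used by cjmp only).
    """
    if seq_ins is SeqIns.CONT:
        return i + 1
    if seq_ins is SeqIns.JMP:
        return j
    if seq_ins is SeqIns.CJMP:
        return j if cond else i + 1
    raise ValueError(f"unknown sequencer instruction: {seq_ins!r}")


def field_width(num_codes: int) -> int:
    """Smallest number of bits that can distinguish num_codes different codes."""
    return max(1, (num_codes - 1).bit_length())


if __name__ == "__main__":
    print(field_width(3))  # seq_ins: three instructions
    print(field_width(3))  # cond_sel: three condition inputs
    print(field_width(7))  # br_adr: addresses 0..6
```

Computing the widths from the counts (three instructions, three conditions, seven addresses) reproduces the 2, 2 and 3 bits quoted in the list.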

At this stage we redraw figure 6.4 and label all the line widths as per the requirements of the GCD control and data part implementation.

Figure 6.16 GCD Control block diagram labeled with line widths

[Figure 6.16 blocks: µ-sequencer, µ-program ROM (8X16), µ-register, Data Part; line-width annotations: 3; 9, 16, 3; 2, 3; 2; 1.]

Step 3: Generate the symbolic microprogram

Now we are in a position to write out a symbolic microprogram for our GCD computer. Note that each step in our microprogram corresponds to a state in the state diagram of figure 6.14, but that may not always be the case. The microprogram in table 6.3 is written in three columns. The first corresponds to the microinstruction label or address, the second corresponds to the data operation and the third corresponds to the control operation. The corresponding register transfer is also indicated in the second column for reference.

Table 6.3 Symbolic Microprogram for the GCD algorithm

Label | Data operation | Control operation
M0 | | seq_ins = cjmp, c_sel = start’, br_adr = M0
M1 | S_R1 = “X”, L_R1 (R1 <= “X”); S_R2 = “Y”, L_R2 (R2 <= “Y”); C_eoc (eoc <= “0”) | seq_ins = cont
M2 | (Compare R1 and R2) | seq_ins = cjmp, c_sel = eq, br_adr = M6
M3 | (Compare R1 and R2) | seq_ins = cjmp, c_sel = gt, br_adr = M5
M4 | S_op1 = “R2”, S_op2 = “R1”, L_R2 (R2 <= R2 - R1) | seq_ins = jmp, br_adr = M2
M5 | S_op1 = “R1”, S_op2 = “R2”, L_R1 (R1 <= R1 - R2) | seq_ins = jmp, br_adr = M2
M6 | L_R3 (R3 <= R1); P_eoc (eoc <= “1”) | seq_ins = jmp, br_adr = M0

In fact, the data operation “Compare R1 and R2” is performed by the comparator all the time; only its results are used, at the end of microinstructions M2 and M3. No control signals are required for performing the comparison. Similarly, the subtraction is performed all the time, and control signals are required only to select the appropriate inputs and to latch the result in the appropriate register. Now we are in a position to work out some of the finer details of our microprogrammed control, which include the microinstruction format, the microsequencer design and the binary microcode.

Step 4: Designing the Microinstruction Format

Binary microcode corresponds to the control ROM contents. For this we first need to freeze the microinstruction format, which specifies the control signal associated with each of the bits. Designing an optimal microinstruction format was an active area of research in the seventies, and we discuss some simple techniques in sections 6.8 to 6.10. For the purpose of this case study, we assume a simple format that places all control signals “horizontally”, giving a format whose width equals the number of bits needed to activate the data part as well as the control part. The 16-bit microinstruction format is shown in figure 6.17.
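A horizontal format of this kind amounts to concatenating fixed-width fields. The sketch below packs and unpacks a 16-bit word, assuming the field order of figure 6.17 with the leftmost field in the most significant bits; the bit ordering, the choice of packing don't-care bits as 0, and the names `FIELDS`, `pack` and `unpack` are assumptions made for illustration. The jmp code 01 used in the example is the one chosen for the sequencer in Step 5.

```python
# (field name, width in bits), leftmost field first.
# Order assumed from figure 6.17: nine 1-bit data part signals, then the
# three control part fields.
FIELDS = [
    ("S_R1", 1), ("S_R2", 1), ("L_R1", 1), ("L_R2", 1), ("L_R3", 1),
    ("S_OP1", 1), ("S_OP2", 1), ("P_eoc", 1), ("C_eoc", 1),
    ("seq_ins", 2), ("c_sel", 2), ("br_adr", 3),
]


def pack(values: dict) -> int:
    """Pack field values into one microinstruction word.

    Fields missing from `values` (inactive or don't care) are packed as 0,
    which is an assumption of this sketch.
    """
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word = (word << width) | value
    return word


def unpack(word: int) -> dict:
    """Split a microinstruction word back into its named fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


if __name__ == "__main__":
    # M6: R3 <= R1; eoc <= "1"; jmp M0
    m6 = pack({"L_R3": 1, "P_eoc": 1, "seq_ins": 0b01, "br_adr": 0})
    print(format(m6, "016b"))
```

The widths in `FIELDS` add up to 16, which is the ROM word width used in this case study.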

Figure 6.17 16-bit Microinstruction Format for the GCD Microprogrammed Control

[Figure 6.17 fields, left to right: S_R1, S_R2, L_R1, L_R2, L_R3, S_OP1, S_OP2, P_eoc, C_eoc (data part control signals, 9 bits); seq_ins, c_sel, br_adr (control part signals, 7 bits).]

Figure 6.17 shows nine data part control signals of one bit each, followed by the three control part signals (seq_ins, c_sel and br_adr), which are 2 bits, 2 bits and 3 bits wide respectively.

Step 5: Design the Microsequencer

The role of the microsequencer is to generate the next address. This address is used to fetch the microinstruction from the control ROM, much as the contents of the program counter are used to fetch an instruction from the program memory. Thus the register/counter that stores the current microinstruction address is called the microprogram counter (or µPC). An important characteristic of a microsequencer is the range of instructions it supports, referred to as sequencer instructions (or seq_ins). Normally, there is a choice between picking a standard microsequencer from the library and designing a custom one for the application. We take up the design of custom microsequencers from generic ones in section 6.7. For the GCD computer, we design a simple microsequencer which meets the design requirements. The microsequencer is expected to support three sequencer instructions. These need to be encoded in 2 bits, as we have assigned 2 bits to seq_ins in our microinstruction format. As we are designing our own microsequencer, the encoding is completely in our control. Table 6.4 specifies the encoding for the three sequencer instructions. As 2 bits give four combinations or codes, we have used two of these to encode the conditional jump. Please note that the condition itself is selected using another control signal field (c_sel) in the microinstruction.

Table 6.4 Microsequencer Instruction Encoding

Sequencer instruction (seq_ins) | Encoding
CONT | 0 0
JMP | 0 1
CJMP | 1 X

Now we are in a position to draw the block diagram of the microsequencer. The heart of any microsequencer is the next address multiplexer, which generates the address for the microprogram ROM. The complexity of the microsequencer can be judged from the number of options this multiplexer has. The µPC is always clocked (though the clock is not shown in figure 6.18); it stores the currently generated address in its register and also outputs the next address (current address + 1) as an option for the multiplexer. In this case the only other address is the branch address (br_adr), which is one of the fields in the microinstruction format and is stored in the ROM. Figure 6.18 shows the microsequencer for the GCD computer.

Figure 6.18 Microsequencer Design

[Figure 6.18 labels: µPC (REG with +1 incrementer); next address MUX with inputs 0 (µPC + 1) and 1 (br_adr); Select Logic driven by seq_ins (2 bits) and cond; output next_adr (next address); address lines are 3 bits wide.]


To complete the microsequencer design, the only remaining task is to design the select logic. The truth table of the select logic is shown in table 6.5, with an explanation for each of the four rows.
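The select logic is small enough to check exhaustively. The sketch below compares a sum-of-products form, S = I1’.I0 + I1.C, against the behavior required by the Table 6.4 encoding (CONT = 00, JMP = 01, CJMP = 1X), where S = 1 selects br_adr and S = 0 selects µPC + 1. The function names are assumptions introduced for illustration.

```python
from itertools import product


def na_sel(i1: int, i0: int, c: int) -> int:
    """Next address select as a sum of products: S = I1'.I0 + I1.C."""
    return ((1 - i1) & i0) | (i1 & c)


def required_select(i1: int, i0: int, c: int) -> int:
    """Select value required by the sequencer instruction encoding."""
    if i1 == 1:
        # cjmp (1X): branch only when the selected condition is true
        return c
    # cont (00) selects µPC + 1, jmp (01) selects br_adr
    return i0


def select_logic_is_correct() -> bool:
    """Compare the expression with the required behavior for all 8 inputs."""
    return all(
        na_sel(i1, i0, c) == required_select(i1, i0, c)
        for i1, i0, c in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(select_logic_is_correct())
```

Because I0 is a don't care when I1 = 1, the check covers both codes that were assigned to the conditional jump.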

Table 6.5 Truth Table for the Microsequencer Next Address Selection Logic

seq_ins (I1 I0) | cond (C) | na_sel (S) | seq_ins, next_adr and comments
0 0 | X | 0 | cont, next_adr <= µpc + 1
0 1 | X | 1 | jmp, next_adr <= br_adr
1 X | 0 | 0 | cjmp, next_adr <= µpc + 1; condition fails
1 X | 1 | 1 | cjmp, next_adr <= br_adr; condition passes

The logic for na_sel (or S) can be expressed as S = I1’.I0 + I1.C. With this implementation of the next address select logic, the design of the microsequencer is complete for the GCD Computer example.

Step 6: Generate the Binary Microcode

The last step in generating the microprogrammed control is the generation of the binary microcode, i.e. the contents of the control ROM. For the GCD example a 7X16 binary matrix needs to be generated. For this we use the symbolic microprogram enumerated in step 3 and the microinstruction format described in step 4, along with the encoding of the various fields, which relates the control bit values to the various operations. If a component is inactive in a control step, the microinstruction contains a default value for the corresponding control signal. It is significant that the default values are of two types. For control signals like a multiplexer select, the default value is X (don't care) when no specific input needs to be selected in that control step. On the other hand, for control signals like the load enable of a register, the default value is “0” if the register loads when the signal is “1”. Thus the range of values a bit can take in the binary microprogram is either {0,1,X} or {0,1}, depending on the nature of the control signal. Though eventually all bits are converted to “0” or “1”, it is beneficial to keep the Xs as they may help optimize the microprogram, as outlined in section 6.10. Table 6.6 shows the contents of the ROM. For each of the seven control steps, the binary bits as well as the corresponding actions are enumerated. The ROM is made of 8 words and the last, unused word contains the default values. Note that the control never reaches this last word. The table can now be looked upon as 16 independent columns. It is easy to see that some of the columns are “compatible” with other columns, i.e. the two columns can be merged. Because of don't cares, compatible columns need not be identical. This is also exploited to reduce the size of the control ROM.

Now we make some comments regarding the flexibility of the microprogrammed control. To change the control without any hardware changes or rewiring, one only needs to reprogram the control ROM. This can be facilitated by a software tool that accepts the symbolic microprogram and generates the binary code for the control ROM. But one has to realize that only changes permitted by the underlying “structure” can be carried out in this manner. This “structure” includes the complete control-data interface, i.e. the range of control signals and status signals and their definitions. For example, if an additional condition needs to be tested in a modified algorithm, this cannot be done by reprogramming the control ROM unless the condition can be brought onto one of the existing status signals.

Adr | S_R1 | S_R2 | L_R1 | L_R2 | L_R3 | S_OP1 | S_OP2 | P_eoc | C_eoc | seq_ins | c_sel | br_adr | Symbolic microprogram
0 | X | X | 0 | 0 | 0 | X | X | 0 | 0 | 1X | 00 | 000 | cjmp (start’) M0
1 | 0 | 0 | 1 | 1 | 0 | X | X | 0 | 1 | 00 | XX | XXX | R1 <= “X”; R2 <= “Y”; eoc <= “0”; cont
2 | X | X | 0 | 0 | 0 | X | X | 0 | 0 | 1X | 01 | 110 | cjmp (eq) M6
3 | X | X | 0 | 0 | 0 | X | X | 0 | 0 | 1X | 10 | 101 | cjmp (gt) M5
4 | X | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 01 | XX | 010 | R2 <= R2 - R1; jmp (M2)
5 | 1 | X | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 01 | XX | 010 | R1 <= R1 - R2; jmp (M2)
6 | X | X | 0 | 0 | 1 | X | X | 1 | 0 | 01 | XX | 000 | R3 <= R1; eoc <= “1”; jmp (M0)
7 | X | X | 0 | 0 | 0 | X | X | 0 | 0 | 00 | XX | XXX | nop

Table 6.6 Binary Microprogram for GCD Computer
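To see the microprogram at work, the control flow and register transfers of the GCD example can be traced with a small behavioral model. This is a sketch under stated assumptions: it works at the symbolic level rather than on the ROM bits, it takes gt to mean R1 > R2, it holds start asserted, and the function name and operation labels (`load`, `r2_sub`, `r1_sub`, `done`) are hypothetical.

```python
def run_gcd_microprogram(x: int, y: int, max_cycles: int = 100_000):
    """Execute the symbolic GCD microprogram; return (R3, clock cycles used)."""
    if x <= 0 or y <= 0:
        raise ValueError("the subtractive GCD algorithm needs positive inputs")

    # One entry per microinstruction M0..M6:
    # (data operation, seq_ins, c_sel, br_adr)
    rom = [
        (None,     "cjmp", "start'", 0),     # M0: wait for start
        ("load",   "cont", None,     None),  # M1: R1 <= X; R2 <= Y; eoc <= 0
        (None,     "cjmp", "eq",     6),     # M2: equal -> M6
        (None,     "cjmp", "gt",     5),     # M3: R1 > R2 -> M5
        ("r2_sub", "jmp",  None,     2),     # M4: R2 <= R2 - R1
        ("r1_sub", "jmp",  None,     2),     # M5: R1 <= R1 - R2
        ("done",   "jmp",  None,     0),     # M6: R3 <= R1; eoc <= 1
    ]

    r1 = r2 = 0
    upc = 0
    start = True  # assumption: start is already asserted
    for cycle in range(1, max_cycles + 1):
        data_op, seq_ins, c_sel, br_adr = rom[upc]
        # Status signals reflect register contents before this clock edge.
        status = {"start'": not start, "eq": r1 == r2, "gt": r1 > r2}

        if data_op == "load":
            r1, r2 = x, y
        elif data_op == "r2_sub":
            r2 = r2 - r1
        elif data_op == "r1_sub":
            r1 = r1 - r2
        elif data_op == "done":
            return r1, cycle  # R3 receives R1 and eoc is set

        if seq_ins == "cont":
            upc += 1
        elif seq_ins == "jmp":
            upc = br_adr
        else:  # cjmp
            upc = br_adr if status[c_sel] else upc + 1
    raise RuntimeError("microprogram did not finish within max_cycles")


if __name__ == "__main__":
    print(run_gcd_microprogram(48, 18))
```

The model stops at M6 instead of jumping back to M0, since one computation is enough to trace the sequence of microinstructions.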

6.7. Generic Microsequencer

In section 6.6, we showed the design of a microsequencer which is adequate to support the control flow requirements of the GCD algorithm. In this section we describe a more versatile generic microsequencer which can be customized for a specific application. Alternatively one can choose a standard microsequencer. A microsequencer extremely popular in the early 80’s was the AMD 2910, which has greatly influenced the design of subsequent microsequencers. Customizing a microsequencer involves extracting application parameters and relating them to the specifications of the various microsequencer components. At the end of this section, we list some of the correspondences between the application parameters and the microsequencer component specifications.

Figure 6.19 shows a generic microsequencer. The “next address multiplexer” has four address inputs, including the inputs from “µpc” and “br_adr” that we have already described. The two new inputs are from the “stack” and the zero (“0”) input. The zero input is just for jumping to location zero and can be used as a soft reset. The stack can store return microinstruction addresses if the microprogram contains microroutines that can be invoked from multiple locations. A sequencer instruction like “call” causes the address of the next microinstruction to be stored in the stack, and when the sequencer instruction “return” is invoked, the top of the stack is popped to return to the microinstruction following the one from which “call” was executed. Readers can see the analogy between this stack and the use of a stack in the execution of programs, but one major difference should be noted. This stack can store only microinstruction return addresses, whereas the stack in program execution is used for storing the state, which includes data registers and much other information apart from the program counter value. Further, as this stack is implemented in hardware with a fixed size, the number of return addresses that can be stored is fixed.


Figure 6.19 A Generic Microsequencer

[Figure 6.19 labels: µPC (REG with +1 incrementer); next address MUX with inputs from µPC, br_adr, the stack and “0”; Decode and Select Logic driven by seq_ins (k bits) and cond; Loop Counter(s) (pXm) with Zero Detect Logic, loaded from lp-limits (m bits); Stack for return addresses (sXn); tri-state output buffer with enable OE; output next_adr (n bits).]

The other major components that are included are one or more loop counters and the associated zero detect logic. Many microprograms require executing a set of microinstructions a fixed number of times. The number of iterations could be a constant or could be decided before entering the loop body. In either case it should be possible to implement such a loop within the microsequencer, without involving a status signal from the data part for checking and exiting the loop. The loop counter within the sequencer is loaded before entering the loop and counted down on each loop iteration, with the zero detect logic generating the “status” signal internally for exiting the loop. If the microprogram requires nested loops, these can be implemented with multiple loop counters, organized either as a stack or independently. For our discussion we restrict ourselves to a single loop counter. The width “m” of the loop counter(s) has nothing to do with “n”, the number of bits in the microinstruction address: “m” depends on the maximum number of iterations of the loops to be implemented, whereas “n” depends on the number of control steps in the microprogram. Still, in some microsequencers (like the AMD 2910), the two inputs corresponding to br_adr and lp_limits are combined to save on the number of pins. The same field in the microprogram that specifies jump addresses can then be used for defining loop limits. This is based on the assumption that loop limits need to be specified so infrequently that sharing the field is unlikely to result in any significant performance loss in terms of additional microinstructions.


The other component that has been introduced is a tri-state buffer at the output which enables either multiple microsequencers or other address sources to generate addresses for the same control ROM. The select logic has been upgraded and it also decodes the microsequencer instruction to generate control for some of the other components like stack (push/pop) and loop counters (load/decrement). Table 6.7 describes a generic set of instructions and associated actions. More complex sequencer instructions are possible but are left out for simplicity. The role of the decode and select logic is not only to generate the appropriate select for the next address multiplexer but also to generate the control signals indicated in the other actions column. Again the clock is not shown but µpc as well as other components like loop counter and possibly stack are clocked.

Sequencer Instruction | Next Address | Other Actions
cont | µpc + 1 |
jmp | br_adr |
cjmp | if (cond) then br_adr else µpc + 1 |
jzero | “0” |
jsub | br_adr | push (µpc + 1) onto top of stack
ret | stack | pop top of stack
ld_cntr | µpc + 1 | load lp-limits into the loop counter
rpnz | if (zero_det’) then br_adr else µpc + 1 | decrement loop counter

Table 6.7 Microsequencer instructions and their associated actions for the Generic Microsequencer

Parameter | Parameter definition | Application characteristics on which it is dependent
n | Width of microprogram address | Number of control steps (or states) in the microprogram. If this number is N then n = log2(N), rounded up to the next integer
k | Width of microsequencer instructions | Number of microsequencer instructions. If this number is K then k = log2(K), rounded up to the next integer
s | Depth of stack | Maximum nesting level of microroutines defines the depth of stack required for return addresses
m | Width of loop counter | Maximum number of iterations. If this number is M then m = log2(M), rounded up to the next integer
p | Number of loop counters | Maximum nesting level of loops defines the number of loop counters required for keeping loop counts

Table 6.8 Generic Parameters and their dependence on Application Characteristics

The other important feature that makes the microsequencer shown in figure 6.19 generic is that widths of various signals can be tuned depending on the application requirements. Table 6.8 describes how various signal sizes can be determined from application requirements.
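As a rough illustration of this tuning, the parameters of Table 6.8 can be computed from the application characteristics. The helper below is only a sketch: the names `bits_for`, `SequencerParameters` and `size_sequencer` are hypothetical, and the logarithms are rounded up because widths must be whole numbers of bits.

```python
from dataclasses import dataclass


def bits_for(count: int) -> int:
    """Bits needed to distinguish `count` values: log2(count), rounded up."""
    return max(1, (count - 1).bit_length())


@dataclass
class SequencerParameters:
    n: int  # width of microprogram address
    k: int  # width of microsequencer instruction field
    s: int  # depth of return address stack
    m: int  # width of loop counter
    p: int  # number of loop counters


def size_sequencer(control_steps: int, seq_instructions: int,
                   routine_nesting: int, max_iterations: int,
                   loop_nesting: int) -> SequencerParameters:
    """Derive the generic microsequencer parameters from application data."""
    return SequencerParameters(
        n=bits_for(control_steps),
        k=bits_for(seq_instructions),
        s=routine_nesting,
        m=bits_for(max_iterations),
        p=loop_nesting,
    )


if __name__ == "__main__":
    # The GCD control: 7 control steps and 3 sequencer instructions,
    # with no microroutines and no counted loops.
    print(size_sequencer(7, 3, 0, 1, 0))
```

For the GCD control this gives n = 3 and k = 2, matching the 3-bit br_adr and 2-bit seq_ins fields used in section 6.6.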


To complete the understanding of the microsequencer design, we include a timing diagram which shows the clock relationship between the various signals. All the sequential components in the control as well as the data part are considered to be rising-edge triggered on a single clock, and thus the clock period must exceed the longest combinational path spanning both data and control parts. Figure 6.20 shows the clock, the sequencer instruction, the next address output of the sequencer and the microinstruction output of the control ROM. The shaded area in seq_inst corresponds to the microinstruction register delay, and the larger shaded area in next_adr shows the additional delay of the microsequencer. The microinstruction should appear at the input of the microinstruction register after the control ROM delay, and that should be at least the setup time of the microinstruction register before the rising edge of the clock. No delay is shown for the microinstruction (otherwise it would look similar to seq_inst), as it only shows notionally which microinstruction is active in a particular clock cycle. The control sequence shown in the diagram refers to the sequencer instructions shown in table 6.9.

Figure 6.20 Microsequencer timing diagram

[Figure 6.20 waveforms: clock; seq_inst: cont, jmp, jsub, ret, cont; next_adr values: a, a+1, b, c, b+1; microinstruction values: MI[a], MI[a+1], MI[b], MI[c], and one unlabeled slot shown as ***.]

Microinstruction address | Sequencer instruction | Branch address
a | cont | -
a + 1 | jmp | b
b | jsub | c
c | ret | -

Table 6.9 Microsequencer control for timing diagram in figure 6.20
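The control sequence of Table 6.9 can be traced with a small behavioral model of the generic microsequencer. The sketch below follows the instruction set of Table 6.7; the class and function names are hypothetical, timing is ignored, and the choice to decrement the loop counter only while it is non-zero is an assumption that the table leaves open.

```python
class GenericMicrosequencer:
    """Behavioral model: one call to step() corresponds to one clock cycle."""

    def __init__(self, start_adr: int = 0, stack_depth: int = 4):
        self.adr = start_adr            # address of the current microinstruction
        self.stack = []                 # return addresses only
        self.stack_depth = stack_depth  # fixed hardware depth
        self.counter = 0                # single loop counter

    def step(self, seq_ins, br_adr=None, cond=False, lp_limit=0):
        nxt = self.adr + 1              # default: µpc + 1
        if seq_ins == "cont":
            pass
        elif seq_ins == "jmp":
            nxt = br_adr
        elif seq_ins == "cjmp":
            if cond:
                nxt = br_adr
        elif seq_ins == "jzero":
            nxt = 0
        elif seq_ins == "jsub":
            if len(self.stack) == self.stack_depth:
                raise OverflowError("return address stack is full")
            self.stack.append(self.adr + 1)
            nxt = br_adr
        elif seq_ins == "ret":
            nxt = self.stack.pop()
        elif seq_ins == "ld_cntr":
            self.counter = lp_limit
        elif seq_ins == "rpnz":
            if self.counter != 0:       # zero_det' is true
                nxt = br_adr
                self.counter -= 1       # assumption: no decrement at zero
        else:
            raise ValueError(f"unknown sequencer instruction: {seq_ins}")
        self.adr = nxt
        return nxt


def trace(program: dict, start_adr: int, steps: int) -> list:
    """Return the list of addresses visited, starting at start_adr."""
    seq = GenericMicrosequencer(start_adr)
    visited = [start_adr]
    for _ in range(steps):
        seq_ins, br_adr = program[seq.adr]
        visited.append(seq.step(seq_ins, br_adr))
    return visited


if __name__ == "__main__":
    a, b, c = 10, 20, 30  # arbitrary example addresses
    table_6_9 = {a: ("cont", None), a + 1: ("jmp", b),
                 b: ("jsub", c), c: ("ret", None)}
    print(trace(table_6_9, a, 4))
```

With these example addresses the visited sequence is a, a+1, b, c, b+1, which is the order of next_adr values in the timing diagram.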


The table shows four microinstructions with the corresponding sequencer instructions and the contents of the branch address field.

Before winding up the discussion on microsequencers, let us consider the issue of generating multi-way branches in a microprogrammed control. In a state machine representing control flow, it is common for a multi-bit condition to be tested and a multi-way branch to be taken. A special case of such a branch requirement occurs in processor design. Once the instruction is fetched and stored in the instruction register, decoding the opcode and branching to the set of microinstructions corresponding to each opcode amounts to a multi-way branch whose size is the number of instructions. Sequential decoding by testing one bit of the opcode at a time would take many cycles and is too slow to be even considered. There are two major options for implementing multi-way branches.
• Use multiple fields in the microinstruction to specify branch addresses, and choose one of the branch addresses based on the multi-bit condition code.
• Using external logic, synthesize an “appropriate address” from the condition code and use this as the branch address to jump to the appropriate location.

The problem with the first approach is that it is not scalable. Each additional field implies additional bits in the microinstruction format. As the number of microinstructions with a multi-way branch is unlikely to be large, the multiple branch address fields are likely to be very sparsely utilized. This approach may be used only for three-way or maybe four-way branches in a control dominated design where such branches appear frequently. The second approach is efficient as well as scalable. It is ideally suited to multi-way branching of the instruction decoding kind. We illustrate it by showing how a sequencer like the AMD 2910 implements a multi-way branch in an instruction opcode decoding scenario. Details are shown in figure 6.20 (the microsequencer extended for instruction decoding). From the opcode field of the instruction register, a mapping ROM translates each opcode to its corresponding microinstruction address. Where such a multi-way branch is to be executed, the microsequencer gets an instruction called “jmap”. It is identical to the “jmp” instruction except that it also drives an external signal named “map” high. The br_adr field of the microinstruction is multiplexed with the MAP ROM output, and the “map” signal from the sequencer is used as the select of this multiplexer. Thus when the “map” signal is active, the br_adr for the sequencer comes from the MAP ROM, effecting a multi-way branch. In this case the microinstruction address depends on the opcode, and thus a large multi-way branch can be implemented in one step. One need not always use a MAP ROM for implementing a multi-way branch. If the “appropriate” microinstruction address can be formed by just prefixing and suffixing a set of fixed bits to the opcode, then the MAP ROM itself can be avoided. In this case the positioning of the microinstructions corresponding to each of the opcodes becomes inflexible.
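Both variants of the second approach can be illustrated with a short sketch. The opcodes, ROM contents and function names below are hypothetical values chosen for the example, and the suffix bits are assumed to be zeros; none of them is taken from a real sequencer.

```python
# Hypothetical MAP ROM: 2-bit opcode -> starting microinstruction address.
MAP_ROM = {0b00: 0x08, 0b01: 0x10, 0b10: 0x18, 0b11: 0x20}


def sequencer_branch_input(br_adr: int, opcode: int, map_signal: bool) -> int:
    """Multiplexer in front of the sequencer's branch address input.

    When the sequencer asserts `map` (during jmap), the MAP ROM output is
    selected; otherwise the br_adr field of the microinstruction is used.
    """
    return MAP_ROM[opcode] if map_signal else br_adr


def address_from_opcode(opcode: int, opcode_bits: int,
                        prefix: int, suffix_bits: int) -> int:
    """MAP-ROM-free variant: fixed prefix, then the opcode, then zero suffix bits.

    Each opcode then owns a block of 2**suffix_bits microinstructions at a
    fixed position, which is why placement becomes inflexible.
    """
    return ((prefix << opcode_bits) | opcode) << suffix_bits


if __name__ == "__main__":
    print(hex(sequencer_branch_input(0x03, 0b10, map_signal=True)))
    print(bin(address_from_opcode(0b10, opcode_bits=2, prefix=0b1, suffix_bits=3)))
```

In the second function the microroutine for each opcode must start at the computed address, whereas the MAP ROM lets the designer place it anywhere in the control ROM.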


Figure 6.20 Microsequencer extended for Instruction decoding using a multi-way branch

[Figure labels: µ-sequencer, µ-program ROM (8X16), µ-register, Data Part, Instruction register (opcode field), Map ROM.]

6.8. Microprogram Optimization: Definition

The basic objective of microprogram optimization is to increase speed as well as reduce cost. To achieve this we consider two types of optimization: vertical and horizontal.

Vertical Optimization

The execution time is the product of the number of clock cycles and the clock period, and thus reducing the number of clock cycles increases speed. This can be achieved by reducing the number of microinstructions executed by the microprogram and is referred to as “vertical optimization”. The term vertical comes from the fact that it reduces the “height” (the number of words) of the control ROM. This also reduces the size of the control ROM and, in effect, the cost (the area of the circuit). Micro-operations activated by individual control signals represent the parallelism in the data part. Increasing the data part resources would potentially increase the number of micro-operations that can be executed in one control step and thus reduce the number of microinstructions. This increased parallelism also has to be supported by the microinstruction format; that is, the encoding of control signals should be such that these micro-operations can invoke the corresponding data units simultaneously. Example 6.1 illustrates these choices explicitly.


Horizontal Optimization

The area or the cost can also be reduced by reducing the width of the control ROM. This is referred to as horizontal optimization. It can be accomplished by encoding the control signals in fields to reduce the width of the microinstruction format. With a control ROM of size mXn, every one-bit reduction in width (parameter n) implies a cost reduction equivalent to m bits. This may or may not affect the potential parallelism, as discussed in the following example. In this optimization, the main attempt is to encode disjoint control signals in one field so that the impact on speed is minimized.

6.9. Microinstruction Encoding Example

Example 6.1 Let us consider the data part shown in figure 6.21(a). It shows three source registers and four destination registers connected to a bus “S”. The data part is activated by seven control signals, which correspond to three source enables and four destination loads. Figure 6.21(b) shows the corresponding control part, which generates the required seven control signals. This example deals with various ways of deciding the width “k” of the microinstruction format. We also present the considerations in choosing between the various alternatives.

Figure 6.21(a) Data Part for example 6.1

[Figure 6.21(a) labels: source registers A, B, C with enables en_A, en_B, en_C driving Bus S; destination registers D, E, F, G with loads ld_D, ld_E, ld_F, ld_G.]

Figure 6.21(b) Control part for the data part in figure 6.21(a)

[Figure 6.21(b) labels: microinstruction register (µINST REG) of width k feeding a DECODER that generates en_A, en_B, en_C, ld_D, ld_E, ld_F, ld_G.]


Now we present four different microinstruction formats, or encodings, of the seven control signals for the same combination of control and data part. These in some sense represent the range of possibilities, though many more combinations of encodings are possible.

Encoding 1: Figure 6.22 shows a simple encoding where seven individual bits are used for representing the seven control signals. These seven bits can be connected directly to the seven control signals, which eliminates the need for a decoder. It supports the maximal parallelism that is present in the data part. Such a format is referred to as a horizontal format, with the corresponding microprogramming style called horizontal microprogramming.

Figure 6.22 Microinstruction Format for Encoding 1 (horizontal format)

[Figure 6.22 fields: en_A, en_B, en_C, ld_D, ld_E, ld_F, ld_G (7 bits).]

Encoding 2: Figure 6.23 shows the other extreme in terms of encoding. The data part supports 12 different individual micro-operations, corresponding to a transfer between each of the three sources and each of the four destinations. Another combination that needs to be encoded is “noop” or no transfer. These thirteen data transfer combinations require four bits to encode. The decoder would be complex, as it needs to decode these four bits to generate the seven control signals. Such an encoding almost resembles an “assembly instruction” and is referred to as the vertical format, with the corresponding microprogramming style referred to as vertical microprogramming. The parallelism in the data part is compromised. The data part can support a transfer between one source and multiple destinations, but this is not supported by the microinstruction format. This would impact the performance, as the number of microinstructions to be executed would increase if such multiple transfers were possible in the application code. The symbolic microinstruction mov x,y with x = {A,B,C}, y = {D,E,F,G} represents the 12 transfers, and an additional noop is required.

Figure 6.23 Microinstruction Format for Encoding 2 (vertical format)

[Figure 6.23 field: mov x,y (4 bits).]

Encoding 3: We next consider an encoding which minimizes the width without compromising on the concurrency that is available in the data part. Bus S is a shared resource, and even though separate enable bits are provided for the “drivers”, only one of them can be enabled in one bus transfer cycle. In a single clock system, all transfers are synchronized to this clock, and thus we can provide a field which encodes all the enables together in two bits. A format with this type of encoding is shown in figure 6.24.

Figure 6.24 Microinstruction Format for Encoding 3

[Figure 6.24 fields: src_S, ld_D, ld_E, ld_F, ld_G (6 bits). src_S encoding: 00: None, 01: en_A, 10: en_B, 11: en_C.]

The format takes 6 bits, with 2 bits for the field src_S and one bit for each of the four destination loads. The encoding of src_S is also shown in the figure. It reduces the number of bits without compromising on the concurrency available in the data part and is referred to as minimal encoding.

Encoding 4: Encoding 3 also supports a transfer from one source to multiple destinations simultaneously. In fact the destination set can be any subset of the set {D, E, F, G}. An application microcode may or may not require this complete flexibility. Consider an application which requires only a few of the multiple transfers: normally a transfer occurs from one source register to only one destination register, but there are specific microinstructions where data from one source register is simultaneously transferred to destination registers D and E, or to registers E and F. Given the same data part, which supports simultaneous transfers to all registers, the required flexibility is defined by the application code. Using this application requirement, one can minimize the format further. A 5-bit format is shown in figure 6.25. It does not completely preserve the concurrency of the data part but preserves it to the extent required by the application code.

Figure 6.25 Microinstruction Format for Encoding 4

[Figure 6.25 fields: src_S, dst_S (5 bits). src_S encoding: 00: None, 01: en_A, 10: en_B, 11: en_C. dst_S encoding: 000: None, 001: ld_D, 010: ld_E, 011: ld_F, 100: ld_G, 101: ld_D, ld_E, 110: ld_E, ld_F, 111: unused.]

Now we follow up by summarizing the space and time implications of the four encodings.

Comparison of the four encodings: When we compare different encodings, it is easy to comment on the area as it is clearly additive; that is, more bits imply a larger control ROM and more encoding implies a larger area for decoders. On the other hand, delay is more difficult to comment on. Decoders increase the delay of the path of the control signals originating from the microinstruction and applied to the data part components, but whether they increase the actual delay of the circuit depends on whether this path is critical. If it is not the critical path and the clock period is not determined by it, then the decoder delays have no impact on the overall clock period. Further, any encoding that restricts the available concurrency results in an increased number of microinstructions and, in effect, an increased number of clock cycles. This would impact overall performance, but it can be quantified only by analyzing the application microcode, i.e. how many extra cycles would be required. Table 6.10 summarizes the four encodings discussed in this section. In some sense they represent the complete range, though an individual encoding may not be purely based on any one of them.

Encoding | Characteristics | Area: Control ROM | Area: Decoder | Delay: control signal paths | Delay: number of clocks
Encoding 1 (Horizontal) | All control signals have separate bits, no encoding | Maximal | Nil | Minimal | Minimal
Encoding 2 (Vertical) | One micro-operation in one microinstruction | Minimal | Maximal | Maximal | Maximal
Encoding 3 (Minimal) | Encoding done while preserving the concurrency of the data part | High | Low | Medium | Minimal
Encoding 4 (Application minimal) | Encoding done while preserving the concurrency of the data part required by the application microcode | Low | High | Medium | Minimal (for a specific application)

Table 6.10 Comparison of four encodings
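The widths of the four encodings of Example 6.1 can be recomputed, and the decoder for Encoding 4 sketched, in a few lines. The code below is an illustrative model only: the names `bits_for`, `format_widths` and `decode_encoding4` are assumptions, and the decoder simply looks up the src_S and dst_S codes given for figures 6.24 and 6.25.

```python
SOURCES = ("A", "B", "C")
DESTINATIONS = ("D", "E", "F", "G")

# Field encodings as listed for figures 6.24 and 6.25.
SRC_S = {0b00: None, 0b01: "A", 0b10: "B", 0b11: "C"}
DST_S = {
    0b000: (), 0b001: ("D",), 0b010: ("E",), 0b011: ("F",), 0b100: ("G",),
    0b101: ("D", "E"), 0b110: ("E", "F"),  # code 111 is unused
}


def bits_for(count: int) -> int:
    """Bits needed to distinguish `count` codes: log2(count), rounded up."""
    return max(1, (count - 1).bit_length())


def format_widths() -> dict:
    """Microinstruction width, in bits, for each of the four encodings."""
    src_field = bits_for(len(SOURCES) + 1)  # three enables plus "None"
    return {
        # one bit per control signal
        "horizontal": len(SOURCES) + len(DESTINATIONS),
        # 12 single transfers plus noop in one field
        "vertical": bits_for(len(SOURCES) * len(DESTINATIONS) + 1),
        # encoded source field plus one load bit per destination
        "minimal": src_field + len(DESTINATIONS),
        # encoded source field plus encoded destination sets
        "application_minimal": src_field + bits_for(len(DST_S)),
    }


def decode_encoding4(src_s: int, dst_s: int) -> set:
    """Decoder for Encoding 4: return the set of active control signals."""
    signals = set()
    if SRC_S[src_s] is not None:
        signals.add("en_" + SRC_S[src_s])
    signals.update("ld_" + dest for dest in DST_S[dst_s])  # KeyError on 111
    return signals


if __name__ == "__main__":
    print(format_widths())
    print(sorted(decode_encoding4(0b01, 0b101)))
```

With a control ROM of m words, each bit saved in these widths saves m bits of ROM, at the price of the lookup logic that `decode_encoding4` stands in for.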

6.10. Microprogram Optimization Algorithm*

Microprogram optimization was a well researched area in the 70’s and early 80’s. Most optimization techniques try to encode the control signals in the smallest number of bits so that the width of the microinstruction can be reduced. The basic principle is to find control signals or micro-operations which are disjoint and encode them together in one field. The “disjointness” could be based on an analysis of the data part or on an analysis of the symbolic microprogram. This makes the optimization either micro-architecture dependent or application dependent. Such formal techniques need to be used when the data part is complex and the number of control signals is quite large. It is not unusual to have 20 control signals in a data part with considerable concurrency.

A popular technique to locate clusters of disjoint control signals is clique partitioning. A graph G(V,E) is defined where each control signal is a vertex (v) and an edge (eij) between two vertices (vi and vj) implies that the two are disjoint. A clique is a “maximal complete subgraph”, i.e. a set of vertices which are all connected to each other such that no other vertex is connected to all the nodes of the subgraph. Each such clique can be encoded into a field of the microinstruction format to reduce the number of bits. All the vertices of the clique, with their associated edges, are removed from the graph G(V,E) and the process is repeated to identify the next clique and the corresponding microinstruction field. Many algorithms have been proposed for “clique partitioning”, and they are beyond the scope of this book. Readers can consult texts on graph algorithms for this purpose.

The real problem of horizontal optimization has many complexities that need to be addressed by any optimization technique. These include:
• Not all control signals are one bit wide, and combining control signals of different widths has different implications for the encoding.
• The nature of the control signals differs, and signals of different kinds cannot be combined effectively: some control signals have a default value of X, while others can take only the values “0” or “1”.
• During clustering, the reduction in width (and thus cost) is not linear, as the width of a field is determined by the logarithm to base 2 of the cardinality of the cluster. This also has to be captured in the cost function for optimization.

Example
We now illustrate horizontal optimization using an example. The example still simplifies the real-life situation by assuming that all control signals are 1-bit wide and that all of them take values only from the set {0,1}. Both issues can be addressed indirectly:
- Convert every multi-bit signal into multiple 1-bit signals and make them active simultaneously.
- Consider every signal that can take values from {0,1,X} to be active both when it takes the value 0 and when it takes the value 1, and use the fact that it takes X during unused cycles only in the post-processing stage, i.e. when finding column compatibility.

Both methods of handling these issues are sub-optimal, but for an introductory text we confine ourselves to this solution. Let a symbolic microprogram of 5 control steps with 8 control signals {a, b, c, d, e, f, g, h} be as follows.

Microinstruction    Active Control Signals
1                   {a, c, e, f}
2                   {b, c, d, g}
3                   {a, f, g, h}
4                   {c, d, e}
5                   {a, f, g}

Table 6.11 Example Symbolic Microprogram

Figure 6.26 shows a compatibility graph among these control signals. An edge exists between two nodes (which are control signals) if and only if they are never invoked in the same cycle. For example, a and d never appear together in Table 6.11 and are joined by an edge, whereas a and c are both active in microinstruction 1 and are not. From this graph one can extract cliques (maximal complete subgraphs) and encode each of them into one field. From the graph of Figure 6.26, the following compatible sets can be extracted to create a partition.

Partition P = ({a, d}, {b, e, h}, {c}, {f}, {g})
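The construction of the compatibility graph and the extraction of a partition can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, and the greedy heuristic shown is just one simple way of doing clique partitioning, not a method prescribed by the text. It grows one clique at a time in alphabetical order and need not return the partition P above, nor an optimal one.

```python
from itertools import combinations

# Symbolic microprogram of Table 6.11: microinstruction -> active control signals.
MICROPROGRAM = {
    1: {"a", "c", "e", "f"},
    2: {"b", "c", "d", "g"},
    3: {"a", "f", "g", "h"},
    4: {"c", "d", "e"},
    5: {"a", "f", "g"},
}
SIGNALS = sorted(set().union(*MICROPROGRAM.values()))


def compatibility_edges(program):
    """Return the pairs of signals that are never active in the same step."""
    signals = sorted(set().union(*program.values()))
    conflicts = set()
    for active in program.values():
        conflicts.update(frozenset(p) for p in combinations(sorted(active), 2))
    return {frozenset(p) for p in combinations(signals, 2)} - conflicts


def is_clique(group, edges):
    """True if every pair of signals in the group is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(sorted(group), 2))


def greedy_clique_partition(signals, edges):
    """Hypothetical greedy heuristic: grow a clique, remove it, repeat."""
    remaining = list(signals)
    partition = []
    while remaining:
        clique = [remaining[0]]
        for s in remaining[1:]:
            # Add s only if it is compatible with every member chosen so far.
            if all(frozenset((s, m)) in edges for m in clique):
                clique.append(s)
        partition.append(set(clique))
        remaining = [s for s in remaining if s not in clique]
    return partition


edges = compatibility_edges(MICROPROGRAM)
P = [{"a", "d"}, {"b", "e", "h"}, {"c"}, {"f"}, {"g"}]  # partition from the text
greedy = greedy_clique_partition(SIGNALS, edges)

print("P is a valid partition:", all(is_clique(g, edges) for g in P))
print("greedy partition:", [sorted(g) for g in greedy])
```

With alphabetical ordering the heuristic pairs the signals as {a, b}, {c, h}, {d, f}, {e, g}, which is also a valid partition but a different one from P. This illustrates why the choice of cliques has to be driven by a cost function rather than by the first clique found.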

Fields and encodings can be defined for each set to reduce the width of the microinstruction. At this stage, because the number of bits required for a field is log2 of (the cardinality of the set + 1), rounded up to the next integer, where the 1 accounts for the noop, combining two control signals may not be useful: the set {a, d} needs 2 bits whether or not it is encoded. This again depends on the nature of the signals, i.e. whether the noop needs to be encoded or not, but that is left as an exercise for the reader.

Once binary codes have been generated from the definition of the fields, column compatibility can be applied. In generating the binary code, the nature of the signals should be taken into account. The column compatibility problem can again be solved by drawing a graph in which each node is a column and two nodes have an edge between them if and only if they are compatible. In defining compatibility, control signals that can take a don't care (or X) value when not active can play a significant role. All mutually compatible columns can be merged to form a single column. Please note that this implies fanout of this microinstruction bit to multiple control signals.
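The width rule quoted above (log2 of the cardinality plus one for the noop, rounded up) can be checked on the partition P with a few lines of Python. This is a sketch only: the function name `field_width` and the `needs_noop` flag are hypothetical, and a field holding a single signal is assumed to stay as one unencoded bit.

```python
def field_width(num_signals, needs_noop=True):
    """Bits needed for one encoded field of mutually disjoint signals."""
    codes = num_signals + (1 if needs_noop else 0)  # one extra code for the noop
    # (codes - 1).bit_length() equals ceil(log2(codes)) for codes >= 1.
    return max(1, (codes - 1).bit_length())


# Partition P from the example; the unencoded format needs one bit per signal.
P = [{"a", "d"}, {"b", "e", "h"}, {"c"}, {"f"}, {"g"}]
widths = [field_width(len(group)) for group in P]
unencoded = sum(len(group) for group in P)

print("field widths:", widths)
print("encoded width:", sum(widths), "bits; unencoded width:", unencoded, "bits")
```

Under these assumptions P gives fields of 2, 2, 1, 1 and 1 bits, i.e. 7 bits instead of 8. The whole saving comes from the three-signal field {b, e, h}, while the two-signal field {a, d} saves nothing.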

Figure 6.26 Compatibility Graph for Example Symbolic Microprogram (nodes: a, b, c, d, e, f, g, h)

6.11. Complex Microprogramming Structures*
There are more complex microprogramming structures that have been used in system design. We introduce two of them here.

Multi-level Microprogramming
Some microprogramming schemes employ what is referred to as nano-programming. This is simply one level of indirection to obtain the actual control signals, which are stored as nanoinstructions. The microinstruction in these schemes is used as an address to access the nanomemory, which contains the nanoprogram as a set of nanoinstructions. In some sense this is also a mechanism to reduce the control memory size. If the number of control signals is large and they appear repeatedly in a relatively small set of combinations, it may be preferable to define these combinations as nanoinstructions, each accessed by a unique address that is stored in the microprogram memory. Figure 6.27 shows a two-level control part with microinstructions pointing to the nanoinstructions, which in effect are the control signals. It is not even necessary for the microinstruction to be purely an address into the nanomemory: part of it can be the address and the remaining bits/fields can be used for those control signals that do not "cluster" with the signals in the nanoinstructions (in terms of few combinations). Nanoinstructions can also be encoded, but as there are already two levels of control memory, the nanomemory is normally kept near horizontal.
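The memory saving from this indirection can be estimated with a short calculation. The sketch below is illustrative only: the function name and parameter names are hypothetical, and the numbers plugged in are those of the Figure 6.27 example discussed next (600 microinstructions of 152 bits, of which 120 bits occur in only 150 distinct combinations).

```python
def two_level_size(words, width, nano_bits, nano_combinations):
    """Return (address_bits, micro_width, total_bits) for a two-level control store."""
    # Bits needed to address every distinct nanoinstruction.
    address_bits = max(1, (nano_combinations - 1).bit_length())
    # Each microinstruction keeps its other bits and gains the nanomemory address.
    micro_width = width - nano_bits + address_bits
    total_bits = words * micro_width + nano_combinations * nano_bits
    return address_bits, micro_width, total_bits


single_level = 600 * 152
address_bits, micro_width, two_level = two_level_size(600, 152, 120, 150)

print("single-level control store:", single_level, "bits")
print("nanomemory address field:", address_bits, "bits")
print("two-level control store:", two_level, "bits")
```

The calculation only captures storage. It says nothing about the extra access delay of the second memory, which has to be weighed separately.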

Figure 6.27 Example Two Level Control Part with Nanoprogram Memory

Figure 6.27 shows an example of a two-level control memory which saves control ROM bits and area. 120 control bits that occur in only 150 distinct combinations are extracted from the microprogram memory and stored in a nanoprogram memory of width 120 and "height" 150. The nanoprogram memory is addressed by introducing an 8-bit field in the microprogram memory. In this example the original microprogram memory had 600 words of 152 bits, amounting to a total of 91200 bits. In the new two-level structure each microinstruction keeps the remaining 32 control bits plus the 8-bit nanomemory address, i.e. 40 bits, so the total memory size is 42000 bits (600*40 + 150*120). This comes at the cost of an additional delay in accessing the nanomemory, which may or may not be in the critical path that determines the clock period. It would also be possible to use pipelining to overlap the access times of the two memories, but this would introduce an additional cycle for decision making and conditional jumps.

Multi-level Encoding

Figure 6.28 Example of Multi-level Encoding

Till now, all the encodings that we have considered are single-level encodings, in which a set of control signals is encoded in a field of the microinstruction format. To reduce the width of the microinstruction, one can employ multi-level encoding as well. One example is the use of multiple microinstruction formats, similar to the instruction formats used with processors. One or more bits, which can be referred to as mode bits, define the format, and the remaining bits encode a different set of control signals depending on the mode bits. Decoding now becomes more complex, as the meaning of individual bits is controlled by the mode bits.

Figure 6.28 shows an example of two-level formatting with the following two formats:

Mode bit    Remaining fields
0           b  c  d  a
1           f  g  e

Mode = 0: {a, b, c, d}; Mode = 1: {e, f, g}. a, b, c and g are 1 bit wide; d, e and f are 2 bits wide.

The mode bit determines whether the microinstruction encodes the control signals {a, b, c, d} or {e, f, g}. This encoding reduces the number of bits from 10 to 6: the seven signals need 10 bits in total, whereas each format needs 5 bits plus the mode bit. This format can be used only when the control signals in the two sets never occur simultaneously. Again, it is possible to have formats which are not completely disjoint, but that would add to the decoding complexity and delay.
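The extra decoding step introduced by the mode bit can be sketched as follows. This is a hedged illustration rather than the decoder of Figure 6.28. It assumes that the mode bit is the most significant bit, that the fields follow from left to right in the order in which they are labelled in the figure, and that signals absent from the selected format are driven to 0. The names `FORMATS`, `WORD_BITS` and `decode` are hypothetical.

```python
# Field layout per mode, most significant field first: (signal name, width in bits).
FORMATS = {
    0: [("b", 1), ("c", 1), ("d", 2), ("a", 1)],
    1: [("f", 2), ("g", 1), ("e", 2)],
}
WORD_BITS = 6  # 1 mode bit + 5 bits of fields


def decode(word):
    """Expand a 6-bit microinstruction into values for all seven control signals."""
    mode = (word >> (WORD_BITS - 1)) & 1
    # Assumption: signals not present in the selected format default to 0.
    signals = {name: 0 for fields in FORMATS.values() for name, _ in fields}
    shift = WORD_BITS - 1
    for name, width in FORMATS[mode]:
        shift -= width
        signals[name] = (word >> shift) & ((1 << width) - 1)
    return signals


# Mode 0 word: mode=0, b=1, c=0, d=3, a=1.
print(decode(0b010111))
# Mode 1 word: mode=1, f=2, g=1, e=1.
print(decode(0b110101))
```

In hardware the same selection would be done by gating each field decoder with the mode bit, which is where the additional decoding complexity and delay mentioned above come from.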
