Hardwired & Microprogrammed Processor Design

  • Control Unit: Hardwired vs. Microprogrammed Approach

    Dr Shankar Balachandran, Indian Institute of Technology
    October 2006

  • Two Major Blocks in a CPU

    Datapath: adders, multipliers, dividers, shifters, registers; anything that changes or stores data.
    Control Unit: controls the data. How is the data stored? Where is it stored? When should it be available?

  • Control Unit

    Correct sequencing of control signals, much like the human brain controlling the various parts of the body.
    Sequence and timing are the key; any aberration results in wrong operation.

  • A Simplified Control Unit

    (Figure: the control unit drives the Fetch, Decode, Execution and Write Back units through the Fetch, Decode, Execute and Write Back signals.)

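  • A Possible Implementation

    (Figure: CLK drives a counter labelled "Mod-3 Counter", which counts 0, 1, 2, 3 and wraps around, so it has four states; its 2-bit output feeds a 2-to-4 decoder whose four outputs are the Fetch, Decode, Execute and Write Back signals.)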

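  • Timing Diagram

    (Figure: CLK together with the Fetch, Decode, Execute and Write Back signals, each of which is high for one clock period in turn.)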

  • Let's Sample the Signals

    Sampling (Fetch, Decode, Execute, Write Back) once per clock cycle gives 1000, 0100, 0010, 0001.
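The counter-plus-decoder scheme can be mimicked in a few lines of Python. This is a minimal sketch rather than the slide's circuit: we assume the counter steps through the four states 0, 1, 2, 3 and wraps, and that decoder output i drives the i-th phase signal in the order Fetch, Decode, Execute, Write Back. The function names `decoder_2_to_4` and `run_counter` are our own.

```python
# Minimal sketch: a clocked 2-bit counter feeding a 2-to-4 decoder.
# Assumption: decoder output i drives phase i, in the order
# (Fetch, Decode, Execute, Write Back) used in the timing diagram.

PHASES = ("Fetch", "Decode", "Execute", "Write Back")


def decoder_2_to_4(count: int) -> tuple:
    """One-hot decode: output i is 1 exactly when the 2-bit input equals i."""
    if not 0 <= count <= 3:
        raise ValueError("a 2-to-4 decoder takes a 2-bit input")
    return tuple(1 if count == i else 0 for i in range(4))


def run_counter(cycles: int) -> list:
    """Clock the counter `cycles` times and sample the decoder outputs each cycle."""
    count = 0
    samples = []
    for _ in range(cycles):
        outputs = decoder_2_to_4(count)
        samples.append("".join(str(bit) for bit in outputs))
        count = (count + 1) % 4  # 0, 1, 2, 3, then back to 0
    return samples


print(run_counter(4))  # ['1000', '0100', '0010', '0001']
```

Each sampled string is one row of the 1000, 0100, 0010, 0001 pattern; in hardware the same result comes from purely combinational decoding of the counter state.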

  • Another Way to Generate Signals

    1 0 0 0
    0 1 0 0
    0 0 1 0
    0 0 0 1
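Storing those four rows in a small memory and reading one row per clock is the germ of the microprogrammed approach. The sketch below is illustrative only: `CONTROL_ROM` and `run_rom` are hypothetical names, and the ROM address is simply incremented and wrapped, standing in for the counter.

```python
# Sketch: the same four control patterns, now stored as words of a tiny ROM.
# No decoding logic: each clock cycle simply reads the next word.

CONTROL_ROM = (
    (1, 0, 0, 0),  # Fetch
    (0, 1, 0, 0),  # Decode
    (0, 0, 1, 0),  # Execute
    (0, 0, 0, 1),  # Write Back
)


def run_rom(cycles: int) -> list:
    """Return the control word read from the ROM on each of `cycles` clock cycles."""
    address = 0
    words = []
    for _ in range(cycles):
        words.append(CONTROL_ROM[address])           # read, don't compute
        address = (address + 1) % len(CONTROL_ROM)  # step to the next word
    return words


for word in run_rom(4):
    print(word)
```

Changing the control sequence now means changing the ROM contents rather than rewiring gates.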

  • Hardwired vs. Microprogrammed

    Hardwired: use gates to generate the signals; squeeze out the juice for performance; different logic styles are possible.
    Microprogrammed: store the control signals in sequence; just read them from memory every clock cycle.

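  • A Model Computer (Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)

    (Figure: accumulator, ALU, register B, PC, MAR, MDR, RAM and IR connected by a common bus, with 4-, 8- and 12-bit paths marked; the control unit drives the R, W, LM, IP, LP, EP, LD, ED, LA, EA, S, A, EU, LB, LI and EI signals.)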

  • More Details

    L = Load; E = Copy to bus; A, S = Add and Subtract; the sign bit goes to the control unit; IP = Increment PC.

  • (Table columns: Mnemonic, Opcode, Action, Register Transfers, Active Controls)

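  • Hardwired Unit

    (Figure: the opcode from IR drives a decoder with outputs LDA, STA, ADD, SUB, MBA, JMP and JN; a CLK-driven ring counter supplies the timing states T0 to T5; the control matrix combines the decoder outputs, the timing states and the NF flag to produce the control signals; a Halt signal is also shown.)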

  • Table with Sequencing

    Control equations (+ = OR, * = AND; Tn = ring-counter state n; NF = negative flag):
    IP = T2
    R = T1 + T4*LDA
    LI = T2
    LP = T3*JMP + T3*JN*NF
    W = T5*STA
    A = T3*ADD
    EP = T0
    LD = T4*STA
    S = T3*SUB
    LM = T0 + T3*LDA + T3*STA
    ED = T2 + T5*LDA
    ..
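These equations are plain combinational logic, so they translate directly into Boolean expressions. The sketch covers only the eleven signals listed on the slide (the rest are elided there as well), so, for example, the ALU-enable and accumulator-load terms that ADD would also need are missing. It assumes a six-state ring counter T0 to T5, a one-hot opcode decoder with the outputs named on the Hardwired Unit slide, and NF as the negative flag. `control_signals` is a hypothetical helper, not part of the original design.

```python
# Sketch of the hardwired control matrix: each control signal is an
# AND-OR function of the ring-counter state T0..T5, the decoded opcode
# and the negative flag NF. Only the equations shown on the slide are
# included; the remaining signals are elided there too.

OPCODES = ("LDA", "STA", "ADD", "SUB", "MBA", "JMP", "JN")


def control_signals(t: int, opcode: str, nf: bool) -> set:
    """Return the names of the control signals active in timing state T<t>."""
    if not 0 <= t <= 5:
        raise ValueError("assumed six-state ring counter: t must be 0..5")
    if opcode not in OPCODES:
        raise ValueError(f"unknown opcode {opcode!r}")
    T = [t == i for i in range(6)]                   # ring counter: exactly one Ti is true
    op = {name: opcode == name for name in OPCODES}  # one-hot opcode decoder

    equations = {
        "IP": T[2],
        "R": T[1] or (T[4] and op["LDA"]),
        "LI": T[2],
        "LP": (T[3] and op["JMP"]) or (T[3] and op["JN"] and nf),
        "W": T[5] and op["STA"],
        "A": T[3] and op["ADD"],
        "EP": T[0],
        "LD": T[4] and op["STA"],
        "S": T[3] and op["SUB"],
        "LM": T[0] or (T[3] and op["LDA"]) or (T[3] and op["STA"]),
        "ED": T[2] or (T[5] and op["LDA"]),
    }
    return {name for name, active in equations.items() if active}


for t in range(6):
    print(f"T{t}", sorted(control_signals(t, "LDA", nf=False)))
```

Note how the fetch terms (T0 to T2) are independent of the opcode, while every later term is gated by a decoder output.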

  • Control Matrix

    Implement using discrete gates; usually done using PLAs.
    Large control matrices are implemented hierarchically, for speed.
    It is a well-known process, and design flows for it are widespread.

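  • An Alternate Implementation

    (Figure: the 4-bit opcode from IR feeds a starting-address generator; the microprogram counter uPC addresses a 32 x 24 control ROM (the control store), whose output is held in the microinstruction register; that register supplies the control signals, the CD, MAP and HLT bits and a jump address. Selection logic driven by MAP and by NF AND CD chooses the next uPC from the starting address, the jump address or uPC + 1.)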

  • Control Store

    (Figure: each instruction op-code maps to a uInstruction address; each control word holds the control signals, the CD, MAP and HLT bits, and the address of the next control word.)
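One way to lay out such a control word is sketched below. Everything about the layout is an assumption: we take the 16 control signals named in the examples, one bit each, followed by the CD, MAP and HLT bits and a 5-bit next-address field (enough for 32 words). That adds up to 24 bits, consistent with the 32 x 24 control ROM, but the slides do not give the exact bit order. `pack` and `unpack` are hypothetical helpers.

```python
# Sketch of a horizontal control word: one bit per control signal, then the
# CD, MAP and HLT bits, then a 5-bit "address of next control word" field.
# The field order is assumed; the slides say bit order does not matter.

SIGNALS = ("LB", "EU", "S", "A", "EA", "LA", "EI", "LI",
           "ED", "LD", "W", "R", "LM", "EP", "LP", "IP")
FIELDS = SIGNALS + ("CD", "MAP", "HLT")  # 19 one-bit fields, most significant first
ADDR_BITS = 5                            # 2**5 = 32 words in the control ROM
WORD_BITS = len(FIELDS) + ADDR_BITS      # 24 bits per word


def pack(active: set, next_addr: int = 0) -> int:
    """Build a control word from the set of active fields and a next address."""
    unknown = set(active) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if not 0 <= next_addr < 2 ** ADDR_BITS:
        raise ValueError("next address does not fit in the address field")
    word = 0
    for name in FIELDS:
        word = (word << 1) | (1 if name in active else 0)
    return (word << ADDR_BITS) | next_addr


def unpack(word: int) -> tuple:
    """Split a control word back into (set of active fields, next address)."""
    next_addr = word & (2 ** ADDR_BITS - 1)
    bits = word >> ADDR_BITS
    active = {name for i, name in enumerate(reversed(FIELDS)) if (bits >> i) & 1}
    return active, next_addr


word = pack({"CD"}, next_addr=0x0F)
print(f"{word:0{WORD_BITS}b}", unpack(word))
```

Most bits of a typical word are 0, which is exactly the density problem raised later under horizontal microprogramming.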

  • Example 1: MBA followed by ADD

    (Figure: control words at microaddresses labelled 0B and 09, one column per control signal: LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP.)

  • Sequence for MBA, ADD

    MBA (MOV B,A): 1. MAR ← PC  2. MDR ← M(MAR)  3. IR ← MDR  4. B ← A
    ADD: 1. MAR ← PC  2. MDR ← M(MAR)  3. IR ← MDR  4. A ← ALU(Add)

    (Figure: the corresponding control-word bit patterns for MOV B,A and ADD.)
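To check that these control signals really implement the register transfers listed for MBA and ADD, here is a small simulation of the model computer's bus. It is a sketch under several assumptions: 12-bit words whose top 4 bits are the opcode and low 8 bits the address; hypothetical opcode values 0x1 for MBA and 0x2 for ADD; E signals drive the shared bus (at most one per step), L signals load from it, R reads memory into MDR, IP increments PC, and A or S selects the ALU operation that EU drives onto the bus. The helper names are our own.

```python
# Sketch: register-transfer simulation of the model computer running MBA, ADD.
# Assumptions (not from the slides): 12-bit words, top 4 bits = opcode,
# hypothetical opcodes 0x1 = MBA and 0x2 = ADD.

MASK = 0xFFF                        # 12-bit registers
OPCODES = {0x1: "MBA", 0x2: "ADD"}  # hypothetical encodings

FETCH = [{"EP", "LM"}, {"R"}, {"ED", "LI", "IP"}]  # MAR<-PC, MDR<-M(MAR), IR<-MDR
EXECUTE = {
    "MBA": [{"EA", "LB"}],          # B <- A
    "ADD": [{"A", "EU", "LA"}],     # A <- ALU(Add)
}
LOADS = (("LM", "MAR"), ("LI", "IR"), ("LA", "A"),
         ("LB", "B"), ("LD", "MDR"), ("LP", "PC"))


def step(cpu: dict, signals: set) -> None:
    """Apply one microinstruction (a set of active control signals) to the CPU."""
    # ALU output: S selects subtract, otherwise add.
    alu = cpu["A"] - cpu["B"] if "S" in signals else cpu["A"] + cpu["B"]
    sources = {"EP": cpu["PC"], "ED": cpu["MDR"], "EA": cpu["A"],
               "EI": cpu["IR"] & 0xFF, "EU": alu & MASK}
    drivers = [name for name in sources if name in signals]
    if len(drivers) > 1:
        raise ValueError(f"bus conflict between {drivers}")
    bus = sources[drivers[0]] if drivers else None

    if "R" in signals:
        cpu["MDR"] = cpu["RAM"][cpu["MAR"]]
    if "W" in signals:
        cpu["RAM"][cpu["MAR"]] = cpu["MDR"]
    for signal, register in LOADS:
        if signal in signals:
            if bus is None:
                raise ValueError(f"{signal} active but nothing drives the bus")
            cpu[register] = bus
    if "IP" in signals:
        cpu["PC"] = (cpu["PC"] + 1) & MASK


def run(program: list, a: int = 0, b: int = 0) -> dict:
    """Fetch and execute each instruction of `program` once, in order."""
    cpu = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "A": a, "B": b,
           "RAM": list(program)}
    for _ in program:
        for signals in FETCH:
            step(cpu, signals)
        for signals in EXECUTE[OPCODES[cpu["IR"] >> 8]]:
            step(cpu, signals)
    return cpu


cpu = run([0x100, 0x200], a=5)        # MBA, then ADD
print(cpu["A"], cpu["B"], cpu["PC"])  # 10 5 2
```

Starting with A = 5, MBA copies A into B and ADD then leaves A + B = 10 in the accumulator, after two fetches have advanced PC to 2.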

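  • Example 2: JN with Flag Set

    If the negative flag (NF) is set, the CD microinstruction at 0D jumps to a new location by skipping to the microinstruction at 0F.
    (Figure: control-word columns LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP.)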

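  • Example 3: JN with Flag Not Set

    (Figure: microaddress 0D with the CD bit set; with NF clear, no jump is taken. Control-word columns LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP.)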
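The next-address choice in the two JN examples can be modelled as a small microsequencer. This is one possible reading of the block diagram, stated as an assumption: MAP loads uPC from the starting-address generator, a CD word jumps to its address field when NF is set and otherwise falls through to uPC + 1, and any other word continues at its next-address field. Only microaddresses 0D and 0F come from the slides; the fetch words at 00 to 02 and the contents of the fall-through word at 0E are hypothetical.

```python
# Sketch of a microsequencer for the JN instruction.
# Assumed rules: MAP -> starting address for the opcode;
#                CD  -> next_addr if NF is set, else uPC + 1;
#                otherwise -> next_addr.
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroWord:
    signals: frozenset
    next_addr: int = 0x00


CONTROL_STORE = {
    0x00: MicroWord(frozenset({"EP", "LM"}), 0x01),         # MAR <- PC
    0x01: MicroWord(frozenset({"R"}), 0x02),                # MDR <- M(MAR)
    0x02: MicroWord(frozenset({"ED", "LI", "IP", "MAP"})),  # IR <- MDR, then map opcode
    0x0D: MicroWord(frozenset({"CD"}), 0x0F),               # JN: test NF
    0x0E: MicroWord(frozenset(), 0x00),                     # NF clear: back to fetch
    0x0F: MicroWord(frozenset({"EI", "LP"}), 0x00),         # NF set: PC <- address field
}
START_ADDRESS = {"JN": 0x0D}  # output of the starting-address generator


def next_upc(upc: int, word: MicroWord, opcode: str, nf: bool) -> int:
    """Select the next microinstruction address."""
    if "MAP" in word.signals:
        return START_ADDRESS[opcode]
    if "CD" in word.signals:
        return word.next_addr if nf else upc + 1
    return word.next_addr


def trace(opcode: str, nf: bool) -> list:
    """Microaddresses visited for one instruction, from fetch until uPC returns to 00."""
    upc, visited = 0x00, []
    while True:
        visited.append(upc)
        upc = next_upc(upc, CONTROL_STORE[upc], opcode, nf)
        if upc == 0x00:
            return visited


print([f"{a:02X}" for a in trace("JN", nf=True)])   # ['00', '01', '02', '0D', '0F']
print([f"{a:02X}" for a in trace("JN", nf=False)])  # ['00', '01', '02', '0D', '0E']
```

The two traces differ only after 0D, which is the conditional check that the following slides describe as the source of clumsiness.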

  • Let's Review the Microprogramming Model

    Store the microprogram in the control store.
    Fetch the instruction.
    Get the set of control signals from the control word.
    Move the microinstruction address.
    Lather, rinse, repeat.

  • What is Microcode?

    Michael Slater, "Microprocessor Based Design" (p. 42): "Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."

  • Thought Experiment

    Why is the design a little clumsy? What can we do about it?

  • Reason for Clumsiness

    JN: the conditional flag check.
    Without any condition check, the whole process is very smooth.
    Solution: avoid all conditional checks.

  • Real Life

    A little American football story.
    Theory vs. practice: "In theory, there is no difference between theory and practice. In practice, theory and practice are two different things altogether."
    Live with condition checks, but keep designs as clean as possible.

  • A General Approach

    (Figure: the IR, external inputs and condition codes feed a starting-and-branch address generator, which loads uPC; uPC addresses the control store, which outputs the control word.)

  • Format of Microinstructions

    Pick yours; your choice is as good as your neighbor's.
    What we did: one bit position per control signal. Order of the bits? It doesn't matter.
    This can result in long microinstructions: the issue is not the number of microinstructions but their width.

  • A Note About Density

    Observe that only a few bits are set to 1, which is poor use of the bit space.
    This scheme is called a horizontal microprogram.
    Alternative: encode the bits, giving a vertical microprogram.

  • Vertical Microprogram

    Encode the bits by grouping similar elements together.
    General idea: group similar resources together. There can be only one source or destination register at a time.
    Some operations are mutually exclusive, e.g., memory read vs. write.

  • Design Issues

    Encoding reduces the bit space but requires decoders.
    Cost of decoders vs. bit space: decoder cost is usually very low.
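The saving from encoding can be estimated by grouping mutually exclusive signals into fields. The grouping below is a hypothetical one for the model computer's 16 signals: one bus source, one bus destination, memory read vs. write, add vs. subtract, and increment-PC, each field also having a "none" code. Each field then needs ceil(log2(number of options)) bits, and a small decoder per field recovers the individual signals.

```python
# Sketch: horizontal (one bit per signal) vs. vertical (encoded fields).
# The grouping is an assumption: signals within a group never need to be
# active in the same microinstruction.

GROUPS = {
    "source": (None, "EP", "ED", "EA", "EU", "EI"),        # at most one bus driver
    "dest":   (None, "LM", "LI", "LA", "LB", "LD", "LP"),  # at most one bus load
    "memory": (None, "R", "W"),                            # read vs. write
    "alu":    (None, "A", "S"),                            # add vs. subtract
    "pc":     (None, "IP"),
}


def field_width(options: tuple) -> int:
    """Bits needed to give every option (including 'none') its own code."""
    return max(1, (len(options) - 1).bit_length())


HORIZONTAL_BITS = sum(len(opts) - 1 for opts in GROUPS.values())    # 16
VERTICAL_BITS = sum(field_width(opts) for opts in GROUPS.values())  # 11


def encode(signals: set) -> int:
    """Pack a set of active signals into the encoded (vertical) format."""
    word, remaining = 0, set(signals)
    for group, options in GROUPS.items():
        chosen = [i for i, s in enumerate(options) if s in remaining]
        if len(chosen) > 1:
            raise ValueError(f"{group}: signals {[options[i] for i in chosen]} conflict")
        index = chosen[0] if chosen else 0
        remaining.discard(options[index])
        word = (word << field_width(options)) | index
    if remaining:
        raise ValueError(f"unknown signals: {sorted(remaining)}")
    return word


def decode(word: int) -> set:
    """Per-field decoders (assumes a word produced by encode)."""
    signals = set()
    for options in reversed(list(GROUPS.values())):
        width = field_width(options)
        index = word & ((1 << width) - 1)
        word >>= width
        if options[index] is not None:
            signals.add(options[index])
    return signals


print(HORIZONTAL_BITS, VERTICAL_BITS)  # 16 11
```

Under this grouping each word shrinks from 16 to 11 bits, at the cost of five small field decoders.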

  • Another Idea

    Group concurrently active signals.
    Every meaningful combination gets a code.
    A complex decoder is needed to interpret every code.
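Taken to its extreme, this gives every distinct control word its own code. A sketch, assuming the MBA-then-ADD microprogram from Example 1 is the only code in use: repeated words share a code, the code width is ceil(log2(number of distinct words)), and the "complex decoder" becomes a lookup table (in hardware, a ROM or PLA) from code to the full set of signals. `build_code_table` is a hypothetical helper.

```python
# Sketch: assign one code to every distinct combination of control signals.

FETCH = [frozenset({"EP", "LM"}), frozenset({"R"}), frozenset({"ED", "LI", "IP"})]
MICROPROGRAM = (FETCH + [frozenset({"EA", "LB"})]          # MBA: B <- A
                + FETCH + [frozenset({"A", "EU", "LA"})])  # ADD: A <- ALU(Add)


def build_code_table(words: list) -> tuple:
    """Return (code width in bits, word -> code, code -> word)."""
    distinct = list(dict.fromkeys(words))  # drop repeats, keep first-seen order
    width = max(1, (len(distinct) - 1).bit_length())
    encoder = {word: code for code, word in enumerate(distinct)}
    decoder = dict(enumerate(distinct))    # the "complex decoder" as a table
    return width, encoder, decoder


width, encoder, decoder = build_code_table(MICROPROGRAM)
codes = [encoder[word] for word in MICROPROGRAM]
print(len(MICROPROGRAM), "microinstructions,", len(decoder), "distinct words,",
      width, "bits per code")
print(codes)  # [0, 1, 2, 3, 0, 1, 2, 4]
```

The stored words become very narrow, but every new combination of signals means a new entry in the decoder table.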

  • Vertical vs. Horizontal

    Horizontal: faster; more area; more common currently, since transistors are cheap.
    Vertical: slower; more microinstructions.

  • Microsequencing

    Other ways to save on hardware: every instruction had its own microprogram sequence.
    Also, instructions have several addressing modes, and only the first few microinstructions differ.
    Can we share microcode?

  • A Powerful Technique in Sharing: Bit-ORing

    Example: two instructions share some microcode, but eventually they must branch.
    The default branch (for one instruction) is at X0; the other branch is stored at X1.
    Change the least significant bit (or bits) to get the new address.
    Compare that with having two conditional branches, or storing two fields, one for each branch: both are very unclean.
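Bit-ORing needs nothing more than an OR gate on the low address bit or bits. In the sketch below the shared microcode ends with a single stored branch target whose low bit(s) are zero (X0); ORing in an instruction-select or condition bit gives X1. The function name and the addresses used are hypothetical.

```python
# Sketch of bit-ORing: the stored branch target has its low `width` bits
# cleared; the selecting bit(s) are ORed in to pick X0 or X1 (or more targets).

def bit_or_next(base: int, select: int, width: int = 1) -> int:
    """Return base | select, where select fills the low `width` bits of base."""
    mask = (1 << width) - 1
    if base & mask:
        raise ValueError("low address bits of the base must be zero")
    if not 0 <= select <= mask:
        raise ValueError(f"select must fit in {width} bit(s)")
    return base | select


# Two instructions share microcode ending with a branch to 0x12 (hypothetical).
print(hex(bit_or_next(0x12, 0)))  # prints 0x12 (X0: default instruction)
print(hex(bit_or_next(0x12, 1)))  # prints 0x13 (X1: the other instruction)
# Two select bits give a four-way branch from a single stored field.
print(hex(bit_or_next(0x10, 3, width=2)))  # prints 0x13
```

Only one address field is stored, and the extra hardware is an OR gate per select bit, which is why this compares favorably with storing one field per branch.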

  • Thought Experiment

    What if we provided an explicit branch instead of storing a next-address field in our microprogram?
    A typical instruction set will need a lot of branches, and a lot of time will be wasted on branching.

  • A Pat on Our Back

    We provided an explicit field for the address. The branch location is now data, and it is already saved.
    Caution: microinstructions can get very wide.
    Solution: there is no free lunch.

  • Can We Pipeline Microfetch?

    A neat idea: why wait until the current micro-op is over? The branch field gives the next operation, so get the next op early.
    Caveat: external inputs and status flags may change the order. What about interrupts? They are going to follow you everywhere.
    There should be a mechanism that can invalidate a microcode prefetch, similar to a pipeline flush for instructions. This is commonly used.

  • Historical Perspectives

    Hardwired logic: popular before the 1960s, when it was the only way people did it; popular again now, for its speed benefits.
    Microprogram: popular in the 1970s. Memory was slower than the CPU and there was no on-chip cache, so the best way was to store the microcode.
    Now: it depends on whom you ask. Shades of gray: the extremes of the spectrum are harder to find nowadays.

  • Tools for Design

    Hardwired: any state machine optimizer (assigning states, minimizing transitions, races, hazards, ...).
    Microcoding: small microprograms can be written in binary; large ones use a microassembler, which is also a very useful debug tool. The microassembler can be used simultaneously with actual hardware development.

  • Hardwired vs. Microcoding

    Hardwired units are faster and smaller.
    Emulation is easy with microcoding.
    Hardwired design is complex if large.
    Bugs in a hardwired design cannot be fixed in the field.
    Hardwired control is not suited for loops; looping with microcode can be made as fast.

  • Hardwired vs. Microcode vs. RISC

    RISC: simpler instruction set, hardwired implementation.
    RISC instructions are like microcode, but they come from the I-cache instead of the control store.
    Difference: the contents are not fixed.
    Advantage: only what you want is loaded into the I-cache, which keeps it smaller than a control store.

  • Microprogram vs. Software

    Imagine floating-point division.
    Solution 1: write it in software. A long process, error prone, with many repeated fetches from memory for the given sequence of operations.
    Solution 2: microcode. A long process too, but done by designers rather than programmers; relatively error free, with a more thorough design; it requires many cycles, but the microinstructions are fetched and used locally.

  • Emulation

    A very common use of microcoding.
    IBM System/360: 32-bit architecture with 16 general-purpose registers. Secret: most implementations were 8-bit, to keep cost low, with heavy microcoding; programmers were oblivious.
    In 1992, International Meta Systems (IMS) announced the 3250, designed to emulate the x86, 68K and 6502 architectures using customizable microcode, among other techniques. The company went bust and the chip was never released.

  • Another Interesting Note: Writable Control Store

    What if you, a programmer, could write your own control store?
    Not a mad scientist's thought: it was implemented in the VAX 8800, the PDP-11/60 and the IBM System/370.

  • Current Trends: Microcode Update

    Linux utility microcode_ctl, a companion to the IA32 microcode driver. It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors.
    The update is volatile: it is lost on reboot.
    Microcode updates are also typically rolled into BIOS updates, so they are ready even before an OS is loaded.

  • Intel Said...

    "The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST."

  • Current Trends: Hyperthreading in the P4

    A second logical CPU, with the complete state of the system kept for both logical CPUs.
    Microcoding in the P4: two pointers control the flow independently; both logical processors share the ROM entries, and access is alternated between the CPUs.

  • Thank You

    The ADD and SUB microroutines differ from those on Eckert's website.