Post on 10-Jan-2016
Chapter 1: Microcomputers and Microprocessors
Microprocessor Evolution and Performance
Contents
- Introduction to microcomputer systems
- Microprocessor evolution
- The Intel processor family
- Microprocessor performance
Introduction to Microcomputers
A microcomputer can be interpreted as a machine with:
- I/O devices for input/output
- a microprocessor for processing
- memory units for storage
- buses connecting the above components
In 1970, a microcomputer was normally interpreted as a computer considerably smaller than a minicomputer, possibly using ROM for program storage.
Basic Hardware Units
- Input, e.g. keyboard, mouse
- Microprocessor, e.g. 8085, 8086, MC68000 microprocessors
- Memory, e.g. RAM, hard disk
- Output, e.g. monitor, printer
Buses
- Buses: the connections between the microprocessor, memory, and input/output units
- Major buses:
  - Address bus: carries the address of the memory location (or I/O port) holding the instruction or data
  - Data bus: carries the contents of that location
  - Control bus: carries the synchronization and handshaking signals between components
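Every address-bus width quoted in these notes (16-bit: 64K, 20-bit: 1 MB, 24-bit: 16 MB, 32-bit: 4 GB) follows from one rule: n address lines select 2^n distinct byte locations. A minimal Python sketch of that arithmetic; the function names are illustrative only.

```python
def addressable_bytes(address_lines: int) -> int:
    """Distinct byte locations an n-bit address bus can select: 2**n."""
    return 2 ** address_lines


def human(size: int) -> str:
    """Format a byte count with binary units (1K = 1024).

    Floor division is exact here because bus sizes are powers of two.
    """
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if size < 1024:
            return f"{size} {unit}"
        size //= 1024
    return f"{size} PB"


for cpu, lines in [("8080", 16), ("8086/8088", 20), ("80286", 24), ("80386DX", 32)]:
    print(f"{cpu:10} {lines}-bit address bus -> {human(addressable_bytes(lines))}")
```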
General Architecture
[Figure: input unit, microprocessing unit, and output unit, with a memory unit made up of primary memory and secondary memory]
Processor History
Vacuum Tubes to ICs
First Generation Computers
- Vacuum tube technology
- Filled a large, air-conditioned room
- Tube lifetime: about 3,000 hours
- A useless machine?
- 1951: first UNIVAC I (UNIVersal Automatic Computer) delivered
- 1952: CBS uses UNIVAC to predict the presidential election
- 1952: IBM 701 Electronic Data Processing Machine
Second Generation Computers
The Transistor Is Born (Solid-State Era)
- 1948: invention of the bipolar transistor
- 1956: Nobel Prize in Physics awarded to Drs. William Shockley, John Bardeen, and Walter H. Brattain (Bell Labs)
- 1954: Bell Labs builds an all-transistorized computer (TRADIC)
  - about 800 transistors
  - much less heat
  - more reliable and less costly
Second Generation Computers: Mainframe Computers
- 1958: IBM's first transistorized computers, the 7070/7090
- 1959: IBM 1401 (business-oriented model)
- Built on circuit boards mounted in rack panels, or frames
- Main frame (mainframe): the frame holding the CPU portion of the computer
- Popular with business and industry
Third Generation Computers
- Invention of the IC: 1959
- Dr. Robert Noyce (Fairchild) and Jack Kilby (TI)
- Kilby: fabricated resistors, capacitors, and transistors on a germanium wafer and connected the parts with fine gold wires
- Noyce: isolated individual components with reverse-biased diodes and deposited an adherent metal film over the circuit, thus connecting the components
- First IC: a 2-transistor multivibrator
- By the mid-1960s: memory chips with 1,000 components were common
Third Generation Computers
- 1964: IBM 360 series (32-bit)
- The first to use IC technology
- A family of 6 compatible computers
- 40 different I/O and auxiliary storage devices
- Memory capacity: 16K words to over 1 MB
- 16 32-bit registers
- 24-bit address bus
- 128-bit data bus
Minicomputer
- 1960s: space race between the US and USSR
- IC industry boom
- A tremendous demand from scientists and engineers for an inexpensive computer they could operate themselves
- 1965: DEC PDP-8 (by Edson de Castro's group)
  - low-cost ($25,000) minicomputer
  - 12-bit
- 16-bit PDP-11
- Superminis
Microprocessors: CPU on a Chip
- 1968: Intel (Integrated Electronics) founded by Robert Noyce and Gordon Moore (from Fairchild)
- Original goal: the semiconductor memory market
- 1969: custom ICs for Busicom's calculator
- Ted Hoff and Stan Mazor proposed a 4-bit CPU on a single chip, plus ROM and RAM chips
Microprocessors: CPU on a Chip
- 1971: 4000 family, designed by Federico Faggin
  - 4001: 2-Kbit ROM with a 4-bit I/O port
  - 4002: 320-bit RAM with a 4-bit output port
  - 4003: 10-bit serial-in, parallel-out shift register
  - 4004: 4-bit processor
- Processor on a chip: the microprocessor era begins
Microprocessors: CPU on a Chip
- 1972: 8008, 8-bit
- 1974: 8080, an improved version
Microprocessors: CPU on a Chip
8-bit CPUs (16-bit address, 64K)
- MC6800: Motorola
- 6502: MOS Technology (spin-off from Motorola)
  - Apple II, Apple DOS
- Z-80: Zilog (spin-off from Intel)
  - Z-80 cards for the Apple II, CP/M
Microprocessors: CPU on a Chip
16-bit CPUs (late 1970s)
- Intel 8086, 80186, 80286
  - PC, PC-DOS, MS-DOS, SCO Unix
  - 20-bit address bus (1 MB) on the 8086
- Motorola MC68000
  - 16-bit instructions
  - hardware multiply and divide
  - 24-bit address bus (16 MB)
  - 68000-family workstations: Sun-3
Microprocessors: CPU on a Chip
32-bit CPUs
- Intel 80386, 80486
- Motorola MC68020, 68030
"64-bit" CPUs
- Pentium, Pentium Pro: 64-bit external data bus but 32-bit internal registers, so they are not considered 64-bit CPUs in terms of internal register word length
Microcomputers: Computers Based on Microprocessors
- 1975: MITS Altair 8800 (kit)
  - $399, Intel 8080, programmed by entering 1s and 0s through front-panel switches
- Other computers boomed:
  - 8080: MITS; 6800: SWTPC 6800; Z-80: TRS-80; 6502: Apple I (8K, programmed in BASIC)
- Steve Jobs and Steve Wozniak became millionaires from the PC business
Personal Computers: the Open Architecture Era
- 1981: IBM PC
  - a system board (motherboard)
  - Intel 8088 processor
  - 16K memory
  - 5 expansion slots
- Third-party vendors supplied various I/O adapter cards
- Open architecture: a computer with interchangeable components
Microcontrollers: Microcomputers on a Chip
- Microcontroller: a computer on a chip
  - a microprocessor, plus
  - on-chip memory, plus
  - input/output ports
- 1995: microcontrollers outsold microprocessors 10:1
- Embedded in all kinds of equipment: thermostats, machine tools, communications, automotive, etc.
- Evolution: ever greater I/O capabilities
- Intel: MCS-51, MCS-96, etc.
High-Performance Processors: Supercomputers
- Applications: aircraft design, global climate modeling, oil-bearing formations, molecular design of new drugs, financial behavior
- CDC 6600, 7600: Seymour Cray
- Cray-1 (1976): the first true supercomputer
  - ECL, 128 kW power consumption
  - 130 MFLOPS (Pentium 100: 150 MFLOPS)
  - $5.1 million
High-Performance Processors: Parallel Processors
- Tens of gigaflops
- Multiprocessors wired by a common bus
  - each is given a portion of the problem to solve
- Hypercube: early 1980s
  - Cosmic Cube, iPSC (with i860 RISC chips)
- 2D rectangular mesh architecture: multiple processors at each node
  - Intel: teraflops computer with 4,500 nodes, each powered by 2 Pentium Pro 200s
RISC vs. CISC
RISC: Reduced Instruction Set Computer (1980s)
- A small number of fixed-length instructions
- Simple addressing modes
- A large number of registers
- Instructions executed in one clock cycle
- Intel i860 ("Cray on a chip")
  - 82 instructions, each 32 bits long
  - four addressing modes
  - 32 general-purpose registers
RISC vs. CISC
CISC: Complex Instruction Set Computer
- A large number of variable-length instructions
- Multiple addressing modes
- A small number of registers
- Multiple clock cycles to execute
- Intel 8086
  - over 3,000 instruction forms, 1 to 6 bytes long
  - 9 addressing modes
  - 8 general-purpose registers
  - execution takes from 2 to 80+ cycles
RISC vs. CISC
RISC:
- Much simpler control unit (simpler instructions, each executed in 1 clock)
- Faster execution with less total on-chip logic
- Control logic takes about 10% of the chip area (vs. about 50% for CISC), leaving more area for the register file, data and instruction caches, FPU, and coprocessors
- PowerPC: 32-bit, by IBM, Apple, and Motorola
- SPARC: for Sun Microsystems workstations
Application-Specific Processors: DSP Chips
- Mostly for processing (digitized) analog signals
- ADC-DSP-DAC architecture
- Avoids processing analog signals with discrete circuits built from capacitors and inductors
- DSPs perform complex mathematical functions
  - digital filters, spectrum analysis
Application-Specific Processors: DSP Chip Architecture
- Separate data and program memory areas: Harvard architecture
- Hardware multipliers and adders, optimized to execute in a single cycle
- Arithmetic pipelining: several instructions in progress at once
- Hardware loop control
- Multiple I/O ports for communication with other processors
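The multiply-accumulate (MAC) loop below is the kind of work those hardware multipliers, adders, and loop controllers are built to speed up. It is a plain Python sketch of a direct-form FIR (finite impulse response) digital filter, chosen here as an illustrative example; it models the arithmetic only, not any particular DSP chip.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""
    taps = len(coeffs)
    history = [0.0] * taps               # x[n], x[n-1], ..., x[n-taps+1]
    output = []
    for x in samples:
        history = [x] + history[:-1]     # shift the newest sample in
        acc = 0.0
        for h, xk in zip(coeffs, history):
            acc += h * xk                # one multiply-accumulate per tap
        output.append(acc)
    return output


# A 4-tap moving average smoothing a step input.
print(fir_filter([0, 0, 4, 4, 4, 4], [0.25] * 4))
```

Each output sample costs one MAC per tap, which is why single-cycle multiply-add hardware and zero-overhead loops matter so much for DSP workloads.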
Summary of Processor History
- 1940s: vacuum tubes, large and power-hungry
- 1950s: transistors (invented 1948)
- 1959: first IC (the second industrial revolution)
- 1960s: ICs become the standard way to build CPUs
- 1971: Intel 4004 microprocessor (2,300 transistors), the start of the microprocessor age
- 1974-1976: 8080/8085
Summary of Processor History
- 1980s: RISC (Reduced Instruction Set Computer)
- CISC (Complex Instruction Set Computer) vs. RISC
- CISC families: Intel 80x86, Pentium; Motorola 68000 series
- Most other processor families are RISC.
Evolution of Intel Processors
4004 (1971) to Pentium Pro (1995)
Intel: Integrated Electronics
- 1968: founded by Robert Noyce and Gordon Moore
- IA: Intel Architecture (e.g., IA-16, IA-32, IA-64), which grew out of the 8008 (1972) and became the de facto standard
- Evolution of:
  - internal register sizes
  - external bus widths
  - Real, Protected, and Virtual 8086 modes
4-bit Processors: 4004
- The first microprocessor, available in 1971
- 4-bit microprocessor: 4-bit registers and 4-bit data bus
- Transistors: 2,250
- Minimum feature size: 10 microns
- Address bus: 10 bits (1K)
- 0.06 MIPS (at 0.108 MHz)
- No internal cache
8-bit Processors
- 8008 (1972), 8080 (1974), 8085 (1976)
- 8-bit microprocessors
8086: IA Standard
- Available in 1978
- 16-bit data bus
- 20-bit address bus (16-bit on the 8080)
- Memory organization: 1 MB limit, viewed as 16 segments of 64 KB
- CPU reorganized into a BIU (bus interface unit) and an EU (execution unit)
  - allows fetch and execution to proceed simultaneously
- Internal registers expanded to 16 bits
  - low and high bytes can be accessed separately
8086
- Hardware multiply and divide instructions
- External math coprocessor (8087)
- Instruction set compatible with the 8080/8085 at the assembly-source level
- The 8086 defined the 80x86 architecture
8086
- Not very successful at first
  - 16-bit data bus: requires two separate 8-bit memory banks
  - memory chips were expensive
8088: PC Standard
- Available in 1979, almost identical to the 8086
- 8-bit external data bus, for hardware compatibility with 8080-based designs
- 16-bit internal registers and internal data bus (same as the 8086)
- 20-bit address bus (16-bit on the 8080)
- Redesigned BIU
- Memory organization: 1 MB limit, viewed as 16 segments of 64 KB
- Two memory accesses for 16-bit data (less efficient), but lower cost
- Used in the IBM PC (1981): 16K-64K memory, 4.77 MHz
80186, 80188: High-Integration CPUs
- PC system: 8088 CPU + various supporting chips
  - clock generator
  - 8251: serial I/O (RS-232)
  - 8253: timer/counter
  - 8255: PPI (programmable peripheral interface)
  - 8257: DMA controller
  - 8259: interrupt controller
- 80186/80188: 8086/8088 + supporting functions on the same chip
- Compatible instruction set (+ 9 new instructions)
80286
- Available in 1982; used in the IBM AT computer (1984)
- 16-bit data bus
- Clock speed 25% faster than the 8088, throughput 5 times greater
- 24-bit address bus (16 MB), vs. 20-bit (1 MB) on the 8086
80286: Real vs. Protected Modes
- Larger address space: 24-bit address bus
- Real Mode vs. Protected Mode
- Real Mode:
  - power-on default mode
  - functions like an 8086: uses the 20 least significant address lines (1 MB)
  - software compatible with the 8086
- 16 new instructions (for Protected Mode management)
- Faster 286: redesigned processor plus higher clock rate (6-8 MHz)
80286: Real vs. Protected Modes
Protected Mode:
- Multiprogram environment
- Each program has a predetermined amount of memory
- Addressed via segment selectors (physical addresses invisible): 16 MB addressable
- Multiple programs loaded at once (each within its own segments), protected from reads/writes by the others
80286: Real vs. Protected Modes
Protected Mode:
- Cannot switch back to Real Mode (short of a processor reset), to prevent illegal access by switching back and forth between modes
- So, in practice, only a faster 8086?
  - MS-DOS requires that all programs run in Real Mode
Clock Speed
- Electrical signals cannot change instantaneously (a transition period is required)
- The system clock provides the timing signal for synchronization
- Clock speed alone cannot be used to compare the performance of microprocessors with different instruction sets or internal designs
  - e.g., a 66 MHz Pentium is about twice as fast as a 66 MHz 80486
80386DX (aka 80386)
- Available in 1985, a major redesign of the 8086/286
- Compatibility commitment through 2000
- 32-bit data and address buses (4 GB memory)
- Real Address Mode: 1 MB visible, like the 286 Real Mode
- Protected Virtual Address Mode:
  - on-board MMU
  - segments from 1 byte to 4 GB
  - segment base, limit, and attributes defined by a descriptor
  - paging (page swapping): 4K pages, up to 64 TB of virtual memory space
  - used by Windows, OS/2, Unix/Linux
80386DX (aka 80386)
- Virtual 8086 mode (a special Protected Mode feature): multiple 8086 virtual machines multitask, each behaving like Real Mode
  - Windows (multiple MS-DOS sessions)
- Clock rate: max. 40 MHz, 2 clocks per read/write bus cycle
- External memory cache to avoid wait states
  - fast SRAM
  - 93% hit rate with a 64K cache
- Compatible instruction set (14 new instructions)
80386SX
- 80386SX (for the transition to 32-bit)
  - 16-bit data bus / 32-bit registers
  - 24-bit address bus
80486DX
- 1989: a polished 386 with 6 new OS-level instructions
- Virtually identical to the 386 in terms of compatibility
- RISC design concepts: fewer clock cycles per operation, a single clock cycle for the most frequently used instructions
- Max. 50 MHz
- 5-stage execution pipeline: portions of 5 instructions execute at once
80486DX
Highly integrated:
- On-board 8K memory cache
- FPU (equivalent to an external 80387 coprocessor)
- Twice as fast as the 386 at any given clock rate: a 20 MHz 486 ≈ a 40 MHz 386
80486SX
- NOT a 16-bit version for transition purposes
- No floating-point coprocessor
- On-chip 8K cache retained
- For low-end applications
- Max. 33 MHz only
80486DX2/DX4: OverDrive Chips
- Processor speeds increased too fast
  - redesigning the microcomputer for compatibility became harder
- Solution: decouple the internal clock from the external clock, so each can be improved independently
- The 80486DX2/DX4 internal clock runs at twice/three times the external clock (the DX4 is 3x, NOT 4x, despite its name), so the processor runs faster internally
80486DX2/DX4: OverDrive Chips
- System board design is independent of processor upgrades (less expensive components can be used)
- The processor operates at its maximum data rate internally
- Only slow accesses to external data run at the system board rate
- The internal cache offsets the speed gap
- 486DX2-66: 66 MHz internal, 33 MHz external
- 486DX4-100: 100 MHz internal, 33 MHz external (3x)
- OverDrive sockets: for upgrading a 486DX/SX to a 486DX2/DX4 (with OverDrive socket pin-outs)
Pentium: Superscalar Processor
- Available in 1993
- 32-bit architecture
- Superscalar architecture
  - scaling: shrinking the etchable feature size to increase IC complexity (e.g., DRAM), from 10 microns (4004) to 0.13 microns (2001)
  - superscalar: going beyond simply scaling down
  - two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface
  - executes two different instructions simultaneously
Pentium: Superscalar Processor
- On-board cache: separate 8K data and code caches to avoid access conflicts
- FPU
  - 8-stage instruction pipeline
  - optimized floating-point functions
  - 5x-10x the FLOPS of the 486
- 2x the performance of a 486 at any clock rate
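Pentium: Superscalar Processor
Compatibility with the 386/486:
- Internal 32-bit registers and address bus
- Data bus expanded to 64 bits for a higher data transfer rate
- Compare the 8088 and 386SX transitions, which went the other way (an external data bus narrower than the internal registers)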
Pentium: Superscalar Processor
- Non-clone competition from AMD and Cyrix
- Development of a brand identity by Intel (a name rather than a number, since a number could not be trademarked)
Pentium Pro: Two Chips in One
- Available in 1995
- Superscalar of degree 3: can execute 3 instructions simultaneously
- Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp)
- Two separate silicon dies in the same package
  - processor: 0.35 µm, 5.5 million transistors
  - 256 KB (or 512 KB) Level 2 cache in the package: 15.5 million transistors in a smaller area
Pentium Pro: Two Chips in One
On-board Level 2 cache:
- Simplifies system board design
- Requires less space
- Gives faster communication with the processor
Internal (Level 1) cache: 8K
Performance: Pentium Pro 133 ≈ 2x Pentium 66 ≈ 4x 486DX2-66
Pentium Pro: Dynamic Execution
- Dynamic execution: reduces idle processor time by predicting instruction behavior
- Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches
- Data flow analysis: examines upcoming instructions to determine whether they are ready for processing or depend on other instructions, and determines optimal execution sequences
- Speculative execution: executes instructions in a different order from the one in which they were entered; speculative results are stored until their final states can be determined
Processor Future
What's More from Moore's Law?
Moore's Law
In 1965, Gordon Moore predicted that:
- the number of transistors per integrated circuit would double every year (in 1975 he revised this to every two years; "every 18 months" is a popular later variant)
- this trend would continue through 1975
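Doubling arithmetic shows how sensitive such forecasts are to the assumed doubling period. A minimal sketch: the starting point (the 4004's roughly 2,300 transistors in 1971) comes from these notes, and the doubling period is a parameter rather than a claim about which figure is right. For comparison, these notes give the Pentium Pro processor die (1995) 5.5 million transistors.

```python
def projected_transistors(start_count: float, start_year: int, year: int,
                          doubling_years: float) -> float:
    """Transistor count if it doubles every `doubling_years` years."""
    return start_count * 2 ** ((year - start_year) / doubling_years)


# Project from the Intel 4004 (1971) to 1995 under three doubling periods.
for period in (1.0, 1.5, 2.0):
    n = projected_transistors(2300, 1971, 1995, period)
    print(f"doubling every {period} years -> about {n:,.0f} transistors in 1995")
```

Over 24 years the projections range from about 9.4 million (two-year doubling) to about 38.6 billion (yearly doubling); the two-year figure is the one closest to the Pentium Pro.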
[Figure: Moore's Law]
Other Microprocessors
- Motorola family: from the 6809 through the 68040 (the 68000 series powered the Apple Macintosh)
- PowerPC: joint venture between Apple, IBM, and Motorola
- RISC processors: DEC Alpha, MIPS, Sun SPARC, etc.
CISC vs. RISC
- CISC (Complex Instruction Set Computer)
  - CISC processors have a large, versatile instruction set that supports many complex addressing modes
  - moves complexity from software to hardware
- RISC (Reduced Instruction Set Computer)
  - RISC processors have a small instruction set
  - moves complexity from hardware to software
Microprocessor Performance
Two main factors:
- Response time: the time between the start and completion of a task, also called execution time
- Throughput: the total amount of work done in a given time
MIPS
- Million Instructions Per Second:
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time (s)} \times 10^{6}}$$
- MIPS is inversely proportional to execution time: faster machines have a higher MIPS rating
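The formula can be checked with a hypothetical workload; the instruction count and time below are made up for illustration.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time in seconds x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical run: 50 million instructions executed in 0.5 seconds.
print(mips(50_000_000, 0.5))  # 100.0
```

Halving the execution time of the same instruction stream doubles the rating, which is the inverse relationship stated above.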
Some Problems of MIPS
- Cannot compare computers with different instruction sets, since the instruction counts will certainly differ
- MIPS varies between programs on the same computer
iCOMP
- An index provided by Intel for comparing the performance of its 32-bit microprocessors
- Based on a variety of performance components representing integer mathematics, graphics, etc.
- Combines the results of a set of software application benchmarks
Chapter 2: Computer Codes, Programming, and Operating Systems
- Number Systems
- Computer Codes
- Programming
- Operating Systems
Number Systems
- Decimal: base 10
- Binary: base 2
- Octal: base 8
- Hexadecimal: base 16
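Converting between these bases is built into most languages. A short Python sketch using the standard `format` and `int(text, base)` functions; the value 2016 is arbitrary and `to_bases` is an illustrative name.

```python
def to_bases(value: int) -> dict:
    """Spell a non-negative integer in binary, octal, decimal, and hexadecimal."""
    return {2: format(value, "b"), 8: format(value, "o"),
            10: format(value, "d"), 16: format(value, "X")}


print(to_bases(2016))
# Parsing goes the other way: int(text, base).
print(int("11111100000", 2), int("3740", 8), int("7E0", 16))
```

Each octal digit covers exactly 3 bits and each hex digit exactly 4, which is why 11111100000 regroups directly into 3740 (octal) and 7E0 (hex).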
MCS-51 Program Development
[Figure: development flow Editor → Assembler (X8051) → Linker (Link) → Symbol Converter (CVTSYM) → ICE → Target; files involved: Program.ASM, .OBJ, .HEX, .SYM, .SDT]
Chapter 3: 80x86 Processor Architecture
- 8086/88
- Segmented Memory
- 80386
- 80486
- Pentium
- Pentium Pro
The 8086 and 8088
- Processor Model
- Programming Model
8086 Processor Model: BIU + EU
- BIU: memory and I/O address generation
- EU:
  - receives code and data from the BIU
  - not connected to the system buses
  - executes instructions
  - saves results in registers, or passes them to the BIU for memory and I/O
8086 Processor Model
[Figure: the BIU (address generation and bus control, instruction queue) feeding the EU]
Fetch and Execution Cycle
BIU + EU allow the fetch and execution cycles to overlap:
0. System boot: the instruction queue is empty
1. IP => BIU => address bus, and IP++
2. Mem[IP-1] => InstrQ[tail++]
3a. InstrQ[head++] => EU => execution
3b. Mem[IP++] => InstrQ[tail++] (possibly several instructions)
Repeat 3a and 3b (overlapped)
Waiting Conditions: Memory Access
- BIU + EU execute (almost) continuously without waiting
- Waiting condition: accessing memory locations that are not in the queue
  - the BIU suspends instruction fetch
  - issues the external memory address
  - resumes instruction fetch and execution
Waiting Conditions: Jump
- The next instruction is a jump:
  - instructions in the queue are discarded
  - the EU waits for the instruction at the jump target to be fetched by the BIU
  - execution resumes
Waiting Conditions: Long Instructions
- A long instruction is being executed
  - the instruction queue is full
  - the BIU waits
  - instruction fetch resumes after the EU pulls one or two bytes from the queue
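The fetch/execute overlap and the jump and long-instruction waiting conditions can be mimicked with a bounded byte queue. This is a toy model under simplifying assumptions (one byte per BIU step, no timing, no operand accesses); the class and method names are hypothetical.

```python
from collections import deque


class PrefetchQueue:
    """Toy BIU instruction queue: 6 bytes on the 8086, 4 on the 8088."""

    def __init__(self, capacity: int = 6):
        self.capacity = capacity
        self.bytes = deque()
        self.next_fetch = 0          # next code address the BIU will prefetch

    def biu_prefetch(self, memory: bytes) -> bool:
        """Fetch one byte if there is room; False means the queue is full and the BIU waits."""
        if len(self.bytes) >= self.capacity:
            return False
        self.bytes.append(memory[self.next_fetch])
        self.next_fetch += 1
        return True

    def eu_take(self):
        """EU pulls the oldest byte; None means the EU must wait for the BIU."""
        return self.bytes.popleft() if self.bytes else None

    def jump(self, target: int) -> None:
        """A jump makes the prefetched bytes useless: discard them and refetch from target."""
        self.bytes.clear()
        self.next_fetch = target


code = bytes(range(16))
q = PrefetchQueue(capacity=6)
while q.biu_prefetch(code):      # BIU fills the queue while the EU is busy
    pass
print(len(q.bytes))              # 6: queue full, BIU idles
print(q.eu_take())               # 0
q.jump(10)
print(q.eu_take())               # None: EU waits after the jump
q.biu_prefetch(code)
print(q.eu_take())               # 10
```

Constructing the model with `capacity=4` gives the 8088's shorter queue.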
BIU: 8088 vs. 8086
- The BIU is the major difference
- 8088:
  - data bus: 8-bit (vs. 16-bit on the 8086)
  - instruction queue: 4 bytes (vs. 6 bytes on the 8086)
  - only about 30% slower than the 8086, provided the queue is kept full
8086 Programming Model
8086 Programming Model
Data group:
- AX (AH + AL): Accumulator
- BX (BH + BL): Base
- CX (CH + CL): Counter
- DX (DH + DL): Data
8086 Programming Model
Segment group:
- CS: Code Segment
- DS: Data Segment
- ES: Extra Segment
- SS: Stack Segment
Segment registers: hold the base address of particular segments
8086 Programming Model
Pointer/index group (with associated segment):
- IP: Instruction Pointer (CS)
- SI: Source Index (DS)
- DI: Destination Index (ES in string operations)
- SP: Stack Pointer (SS)
Index registers: hold an index (offset) or a pointer relative to a base address
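8086 Flag Word (low byte)
Bits 7-0: SF ZF X AF X PF X CF (X = unused)
- CF: Carry flag: carry/borrow out of the most significant bit of the result
- AF: Auxiliary carry: carry/borrow out of bit 3 (low nibble of AL)
- SF: Sign flag (0: positive, 1: negative)
- ZF: Zero flag (1: result is zero)
- PF: (Even) parity flag: 1 if the low-order 8 bits of the result contain an even number of 1s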
8086 Flag Word (high byte)
Bits 15-8: X X X X OF DF IF TF
- TF: Trap flag: single-step (trap after each instruction); cleared by the single-step interrupt
- IF: Interrupt-enable flag: enables maskable interrupts
- DF: Direction flag: auto-decrement (1) or auto-increment (0) of index registers in string operations
- OF: Overflow flag: the signed result cannot be expressed within the number of bits in the destination operand
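How these flags fall out of one operation can be shown for an 8-bit ADD. A minimal sketch assuming unsigned operands in the range 0-255; `add8_flags` is an illustrative name and only the six arithmetic flags are modeled.

```python
def add8_flags(a: int, b: int) -> tuple:
    """8-bit ADD as on the 8086: returns (result, flags) for CF, PF, AF, ZF, SF, OF."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                        # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),     # even number of 1s in the low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),         # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 7) & 1,                        # copy of the sign bit
        # Signed overflow: both operands share a sign that the result does not.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))   # 127 + 1: result 0x80 sets SF, OF, and AF
print(add8_flags(0xFF, 0x01))   # 255 + 1: result 0 sets CF, ZF, PF, and AF
```

The two cases separate unsigned carry (CF) from signed overflow (OF): 0x7F + 0x01 overflows as a signed byte without carrying, while 0xFF + 0x01 carries without signed overflow (-1 + 1 = 0).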
Segmented Memory
Linear vs. segmented:
- Linear addressing:
  - the entire memory is regarded as a whole
  - the entire memory space is available all the time
- Segmented:
  - memory is divided into segments
  - a process is limited to accessing its designated segments at any given time
8086 Memory Organization
Even and odd memory banks:
- 16-bit data bus: one two-byte access or two one-byte accesses
- Allows the processor to work on bytes or on words (16-bit)
- I/O operations are normally conducted in bytes
- Can handle odd-length instructions
  - single-byte instructions
  - multiple-byte (and very long) instructions
8086 Memory Organization
Memory space:
- 20-bit address bus
- 1 MB directly addressable as a linear space
Memory banks:
- 16-bit data (512K words) can be read from the even- and odd-addressed banks simultaneously
- requires two memory banks in parallel
- BHE control line (together with A0): selects the even bank, the odd bank, or both
Memory Organization: Alignment
Endianness:
- One way to model a multi-byte CPU register: AX = AH + AL
- Two ways to store operands in memory
- Big-endian CPUs (IBM 370, M68000, SPARC):
  - high-order byte first (HOBF)
  - maps the highest-order byte of the internal register to the lowest (first) memory byte address
  - operand address = address of the MSB
  - MOV R1, N: N is the first byte in memory and the MSB of the register
Memory Organization: Alignment
- Little-endian CPUs (DEC, Intel):
  - low-order byte first (LOBF)
  - maps the lowest-order byte of the register to the first memory byte
  - operand address = address of the LSB (first memory byte)
  - MOV AX, N: N is the first byte in memory and the LSB of the register (AL = byte at N, AH = byte at N+1)
- Configurable:
  - can switch between big- and little-endian, or
  - provides instructions that convert 16-/32-bit data between the two byte orderings (80486)
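The two byte orders are easy to see by serializing one register value both ways. A minimal Python sketch using the standard `int.to_bytes`/`int.from_bytes`; the 486 conversion instruction referred to above is BSWAP, and `bswap32` here only imitates its effect.

```python
value = 0x1234                          # e.g. AX with AH = 0x12, AL = 0x34

little = value.to_bytes(2, "little")    # Intel: LSB (AL) at the lower address N
big = value.to_bytes(2, "big")          # IBM 370 / 68000 / SPARC: MSB at address N
print(little.hex(" "), "|", big.hex(" "))   # 34 12 | 12 34


def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value, the effect of the 486 BSWAP instruction."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")


print(hex(bswap32(0x12345678)))         # 0x78563412
```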
8086 Memory Organization
- Aligned operand:
  - word operand aligned at an even-byte (word) boundary
  - allows a single access to read/write one operand
  - uses an internal shift/swap mechanism, if necessary
- Misaligned words:
  - the word operand does not start at an even address
  - 2 bus cycles are needed to read/write the word (8086)
  - the processor issues two addresses, one for each of the two even-aligned words that contain the operand
  - slower, but transparent to the programmer
8086 Memory Organization
8088:
- always 2 cycles for word operations, aligned or not
  - because of the 8-bit external data bus
- a single memory bank is sufficient
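The cycle counts above can be summarized in a few lines. A sketch assuming the rules stated in these notes (8086: a word at an odd address costs two bus cycles; 8088: one cycle per byte) plus the commonly documented 8086 bank-select encoding, in which the active-low BHE# line enables the odd bank and A0 = 0 enables the even bank. The function names are hypothetical and wait states are ignored.

```python
def bus_cycles(address: int, size: int, cpu: str = "8086") -> int:
    """Bus cycles for a byte (size=1) or word (size=2) memory access."""
    if cpu == "8088":
        return size                         # 8-bit bus: one cycle per byte, aligned or not
    if size == 1:
        return 1
    return 1 if address % 2 == 0 else 2     # 8086 word: even address -> 1, odd -> 2


def bank_select(address: int, size: int) -> tuple:
    """(BHE#, A0) for the first bus cycle of an 8086 access; 0 = bank enabled."""
    if size == 2 and address % 2 == 0:
        return (0, 0)                       # aligned word: both banks at once
    if address % 2 == 0:
        return (1, 0)                       # even byte: low (even) bank only
    return (0, 1)                           # odd byte, or first half of a misaligned word


print(bus_cycles(0x1000, 2), bus_cycles(0x1001, 2), bus_cycles(0x1000, 2, "8088"))  # 1 2 2
```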
8086 Memory Map
Memory map: how the memory space is allocated
- ROM area: boot, BIOS
- RAM: OS/user applications and data
- Unused
- Reserved: for future hardware/software uses
- Dedicated: for specific system interrupt and reset functions, etc.
Segment Registers
- 1 MB viewed as 16 memory segments of 64K each
- 16-bit offset within each segment
- CS, DS, ES, SS
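Logical and Physical Addresses
- Physical address: 20-bit
- Logical address: 16-bit segment and 16-bit offset (segment:offset)
- Segments start on 16-byte boundaries
- Address translation: physical address = segment × 16 + offset, e.g., CS:IP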
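The real-mode translation is a shift and an add. A minimal sketch: the 20-bit mask models the 8086's wrap-around at 1 MB, and the segment and offset values are arbitrary examples.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: physical = segment * 16 + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350: different segment:offset, same byte
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0: wraps around at 1 MB on the 8086
```

Because segments start every 16 bytes but span 64K, many segment:offset pairs name the same physical byte.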
80286
The first x86 processor with Protected Mode (reviewed in the 80286 slides of Chapter 1)
80386 Model
- Refines the 286 Protected Mode
- Expands to 32-bit registers
- New Virtual 8086 Mode
80386 Review
80386: Real vs. Protected Modes
- Larger address space: 32-bit address bus (4 GB)
- Real Mode vs. Protected Mode (refined from the 286)
- Real Mode:
  - power-on default mode
  - functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) segmented memory retained (64K segments)
  - software compatible with the 286
- New Real Mode features:
  - access to the 32-bit register set
  - two new segment registers: FS, GS
80386: Real vs. Protected Modes
Protected Mode:
- A new addressing mechanism compared with Real Mode
- Supports protection levels
- Segment size: 1 byte to 4 GB (instead of a fixed 64K)
- Segment register: holds a selector pointing into a descriptor table, not a base address
80386: Real vs. Protected Modes
Protected Mode:
- Descriptor table (8 bytes per entry):
  - 32-bit base address of the segment
  - segment size (limit)
  - access rights
- Memory address = base address (from the table) + offset (from the instruction)
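The base + offset rule, together with the limit check that gives Protected Mode its protection, can be sketched as follows. The descriptor is shown with its fields already decoded (the real 8-byte entry scatters the base and limit bits and includes a granularity bit), and `ProtectionFault` is a hypothetical stand-in for the processor's general-protection fault.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stands in for the general-protection fault the processor would raise."""


@dataclass
class SegmentDescriptor:
    """Already-decoded descriptor fields (simplified)."""
    base: int        # 32-bit linear base address of the segment
    limit: int       # highest valid offset in the segment (size - 1)
    writable: bool   # simplified access rights


def linear_address(desc: SegmentDescriptor, offset: int, write: bool = False) -> int:
    """Protected-mode address = base (from the descriptor) + offset (from the instruction)."""
    if offset > desc.limit:
        raise ProtectionFault("offset beyond segment limit")
    if write and not desc.writable:
        raise ProtectionFault("write to a read-only segment")
    return (desc.base + offset) & 0xFFFFFFFF


data = SegmentDescriptor(base=0x00100000, limit=0xFFFF, writable=False)
print(hex(linear_address(data, 0x1234)))   # 0x101234
```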
80386: Real vs. Protected Modes
Protected Mode:
- Paging mechanism:
  - maps the 32-bit linear address (base + offset) to a physical address through a page frame address
  - 4K page frames in system memory
  - 64 TB of virtual memory
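On the 80386 the 32-bit linear address is split for a two-level lookup: the top 10 bits index the page directory, the next 10 bits index a page table, and the low 12 bits are the offset within a 4K page frame. A minimal sketch of that split; the `page_tables` dictionary is a hypothetical stand-in for the in-memory directory and tables, and present/permission bits and the TLB are not modeled.

```python
def split_linear(linear: int) -> tuple:
    """Split a 32-bit linear address: bits 31-22 directory, 21-12 table, 11-0 offset."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset


def translate(linear: int, page_tables: dict) -> int:
    """Find the 4K page frame for `linear` and append the page offset.

    page_tables maps (directory index, table index) -> page frame base address.
    A missing entry raises KeyError, playing the role of a page fault.
    """
    directory, table, offset = split_linear(linear)
    frame = page_tables[(directory, table)]
    return frame | offset


print(split_linear(0x00403025))                            # (1, 3, 37)
print(hex(translate(0x00403025, {(1, 3): 0x0ABCD000})))    # 0xabcd025
```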
80386: Real vs. Protected Modes
Protected Mode:
- Protection mechanism:
  - tasks, data, and instructions are assigned a privilege level (PL), from 0 (most privileged) to 3 (least privileged)
  - a task running at a less privileged level cannot access tasks or data segments at a more privileged level
  - running programs are protected from one another
80386: Real vs. Protected Modes
Two ways to run 8086 programs:
- Real Mode
- Virtual 8086 Mode
Virtual 8086 Mode:
- runs multiple 8086 programs plus other 386 (Protected Mode) programs independently
- each 8086 program sees 1 MB (mapped via paging to anywhere in the 4 GB space)
- V86 and Protected Mode tasks run simultaneously
80386 Processor Model
[Figure: 80386 processor model]
80386 Processor Model: BIU + CPU + MMU
BIU:
- controls the 32-bit address and data buses
- keeps the instruction queue full (16 bytes)
- address pipelining:
  - the address of the next memory location is output halfway through the current bus cycle
  - gives more address decode time, so slower memory chips are OK
  - makes it easier to keep up with the fast (2-clock) bus cycle of the 386
80386 Processor Model: BIU
Dynamic data bus sizing:
- switches between 16- and 32-bit data buses on the fly
- accommodates external 16-bit memory cards or I/O devices
- adjusts bus timing to use only the least significant 16 bits
80386 Processor Model: BIU
External memory:
- 4 memory banks (4 × 8 = 32 bits)
- BE0-BE3 select the banks
- access a byte, word, or doubleword
- aligned operands: 1 bus cycle
- misaligned operands that cross a 4-byte (doubleword) boundary: 2 bus cycles
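The following sketch shows which byte lanes (BE0-BE3) each bus cycle uses. It assumes an operand needs a second cycle only when it crosses a 4-byte boundary, so a word at an address of the form 4k+2 still fits in one cycle. Dynamic bus sizing to a 16-bit device is not modeled, and the function name is hypothetical.

```python
def byte_enables(address: int, size: int) -> list:
    """Byte lanes (0-3, i.e. BE0-BE3) active in each bus cycle of a 32-bit access.

    Returns one list of lane numbers per bus cycle; a new cycle starts whenever
    the operand crosses into the next doubleword.
    """
    cycles = []
    lanes = []
    for addr in range(address, address + size):
        if lanes and addr % 4 == 0:        # crossed into the next doubleword
            cycles.append(lanes)
            lanes = []
        lanes.append(addr % 4)
    cycles.append(lanes)
    return cycles


print(byte_enables(0x100, 4))   # [[0, 1, 2, 3]]: aligned doubleword, 1 cycle
print(byte_enables(0x101, 4))   # [[1, 2, 3], [0]]: misaligned doubleword, 2 cycles
print(byte_enables(0x102, 2))   # [[2, 3]]: word within one doubleword, 1 cycle
```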
80386 Processor Model: CPU
CPU = IU (instruction unit) + EU (execution unit); fetching and execution overlap
- IU:
  - retrieves instructions from the queue
  - decodes them
  - stores them in the decoded-instruction queue
- EU:
  - ALU + registers (32-bit)
  - executes the decoded instructions
80386 Processor Model: MMU
Segmentation unit:
- Real Mode: generates the 20-bit physical address
- Protected Mode: stores base/size/rights in descriptor registers
  - caches descriptor table entries from RAM for faster operation
Paging unit:
- determines the physical addresses associated with active segments (divided into 4K pages)
- provides virtual memory support, allowing larger programs
80386 Programming Model
- General-purpose registers: data and address groups
- Status and control flags: VM, RF, NT, IOPL
- Segment group
80386 Programming Model
Special-purpose registers
80386 Programming Model
Memory management:
- segment descriptors: hold base, size, and access rights
- 3 types of tables: global (GDT), local (LDT), interrupt (IDT)
- addressing:
  - selector: index (into a table) + RPL
  - base + offset (from the instruction)
- Paging: the TLB (translation lookaside buffer) caches recent page translations
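80386 Programming Model
Protection (PL):
- task: CPL (current privilege level)
- instruction (segment selector): RPL (requested privilege level)
- data segment: DPL (descriptor privilege level)
Gates: special descriptors that allow less privileged tasks to access more privileged code in a controlled way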
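The slides name the three privilege fields but not the rule that combines them. For an ordinary data-segment access, the 80386 rule as generally documented is that the numerically larger (less privileged) of CPL and RPL must not exceed the segment's DPL. A minimal sketch of that check only; gates, conforming code segments, and stack-segment rules are not modeled, and the function name is illustrative.

```python
def can_access_data(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment privilege check; levels run from 0 (most) to 3 (least privileged).

    The effective privilege is the weaker of CPL and RPL; access is allowed only
    if it is at least as privileged as the segment's DPL.
    """
    return max(cpl, rpl) <= dpl


print(can_access_data(cpl=3, rpl=3, dpl=0))  # False: user code, kernel data
print(can_access_data(cpl=0, rpl=0, dpl=3))  # True: kernel may read user data
print(can_access_data(cpl=0, rpl=3, dpl=0))  # False: RPL 3 weakens a request made for a user
```

The RPL case is the reason for the field: kernel code acting on a pointer supplied by user code can carry the caller's weaker privilege along with the selector.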
80486 Review
486 Processor Features
386 features:
- Real/Protected Modes
- memory management
- PLs
- register and bus sizes
New features:
- 6 OS instructions
- 8K/16K on-board cache (external on the 386)
486 Processor Features
A better 386:
- 5-stage instruction pipeline: IF/ID/EX => PF/D1/D2/EX/WB
  - PF: prefetch instructions into the queue (2 × 16 bytes)
  - D1: determine the opcode
  - D2: determine the memory addresses of operands
  - EX: execute the indicated operation
  - WB: update registers
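Ideal pipeline arithmetic shows where the speedup comes from: with k stages and no stalls, n instructions finish in k + n - 1 cycles instead of k × n. A sketch under that idealized assumption (one cycle per stage, no stalls or cache misses), which real 486 code only approaches; the function names are illustrative.

```python
def pipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Cycles for n instructions in an ideal k-stage pipeline (PF, D1, D2, EX, WB)."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1   # fill the pipe once, then one result per cycle


def unpipelined_cycles(n_instructions: int, stages: int = 5) -> int:
    """Same work when each instruction finishes all stages before the next starts."""
    return stages * n_instructions


n = 100
print(pipelined_cycles(n), unpipelined_cycles(n))   # 104 500
```

After the first instruction fills the pipe, one instruction completes per clock, which matches the 1-CLK instruction cycle time quoted for the 486.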
486 Processor Features
Reduced instruction cycle times:
- 5-stage instruction pipeline (e.g., Fig. 3.18)
- instruction cycle times:
  - 8086: 4 CLK
  - 80386: 2 CLK
  - 80486: 1 CLK (close to RISC)
- about 2x faster than the 386
486 Processor Model: 386 + FPU + Cache
- 386 units retained: BIU, CPU, MMU
- New: FPU (80387) + cache (8K/16K)
- FPU:
  - 387 on board
  - the 0.8 µm process allows a larger transistor count (275K => over 1 million)
  - simplified system board design
  - faster FP operations
486 Processor Model: Cache
Cache (8K; 16K on the DX4)
- Function: bridges the gap between processor speed and memory bandwidth
  - 8088: 4.77 MHz
  - 80486: 50 MHz
  - Pentium: 100 MHz
  - Pentium Pro: 133 MHz
- Main memory (DRAM) is relatively slow
- Fast static RAM (SRAM) is used as cache
486 Processor Model: Cache
Organization:
- 8K, 4-way set associative: 4 direct-mapped caches wired in parallel
  - each memory block maps to a set of 4 lines
- Unified: data and code in the same cache
- Write-through: writes update both the cache and main memory
486 Processor Model: Cache
Locality (why do caches help?):
- spatial locality: e.g., arrays of data
- temporal locality: e.g., loops in code
Operations on hit/miss:
- 128-bit cache lines: 32 bits × N (N = 4) to capture locality
- 128 bits = 16 bytes
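With the parameters on these slides (8 KB, 4-way set associative, 16-byte lines), a 32-bit address splits into a 4-bit byte offset, a 7-bit set index (128 sets), and a 21-bit tag. A minimal sketch of that arithmetic; the constant and function names are illustrative, and a 32-bit physical address is assumed from the 386/486 address bus.

```python
CACHE_BYTES = 8 * 1024   # 486DX on-chip cache
LINE_BYTES = 16          # 128-bit line
WAYS = 4                 # 4-way set associative

SETS = CACHE_BYTES // (LINE_BYTES * WAYS)     # 128 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()   # 4
INDEX_BITS = (SETS - 1).bit_length()          # 7


def split_address(addr: int) -> tuple:
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, OFFSET_BITS, INDEX_BITS)                     # 128 4 7
print([hex(f) for f in split_address(0x12345678)])       # ['0x2468a', '0x67', '0x8']
```

Spatial locality shows up directly: addresses 0x1000 through 0x100F share one tag and set index, so a single 16-byte line fill serves all of them.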
486 Processor Model: Cache
Mapping: many memory blocks share each cache line (many-to-one)
- Data RAM: stores memory data
- Tag RAM: stores memory address information
- 3 methods of mapping:
  - fully associative: a memory block can go in any cache line
  - direct mapped: a memory block goes in one specific line (can cause thrashing)
  - set associative: a memory block goes in one set of cache lines
486 Processor Model: Cache
Replacement policy (LRU):
- valid bits: are all 4 lines in use?
  - NO => use any unused line
  - YES => find one to replace
- LRU bits: which line is least recently used
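The valid-bit / LRU-bit decision above can be sketched for a single 4-way set. This sketch keeps an exact recency order for clarity. As generally documented, the 486 hardware uses a cheaper pseudo-LRU approximation with 3 bits per set, which this code does not reproduce. The class name is illustrative.

```python
class CacheSet:
    """One 4-way cache set with exact LRU replacement (simplified model)."""

    def __init__(self, ways: int = 4):
        self.ways = ways
        self.lines = []                  # valid tags, least recently used first

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, fill an unused line or evict the LRU line."""
        if tag in self.lines:
            self.lines.remove(tag)
            self.lines.append(tag)       # now the most recently used
            return True
        if len(self.lines) == self.ways: # all lines valid: replace the LRU one
            self.lines.pop(0)
        self.lines.append(tag)           # otherwise an unused line is taken
        return False


s = CacheSet()
print([s.access(t) for t in [1, 2, 3, 4, 1, 5, 2]])
print(s.lines)                           # [4, 1, 5, 2]
```

Tag 5 evicts tag 2 because the hit on tag 1 made it more recent; the final access to tag 2 then misses and evicts tag 3.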
Pentium Review (see the Pentium slides in Chapter 1)
Pentium Pro Review (see the Pentium Pro slides in Chapter 1)