8086/8088 Hardware System. Typical Microprocessor Memory System CPU Memory Control Address Data.
8086 8088 Microprocessor
06/03/2005 ET4508_p2 (KR) 1
Intel 8086/8088 Microprocessors
Intel 8086 and 8088 microprocessors are the basis of all IBM PC compatible computers (8086 introduced in 1978, first IBM PC released in 1981)
All Intel, AMD and other x86 microprocessors are based on, and compatible with, the original 8086/8
At power-up and reset, Pentiums, Athlons etc. all look like 8086 processors
Intel 8086/8088 Microprocessors
Intel 8086 is a 16b microprocessor: 16b data registers, 16b ALU
Width of external data bus: 8086: 16b, 8088: 8b
Width of external address bus: 16b+4b=20b
Some techniques to optimise the CPU performance when it’s executing programs
Segment:Offset memory model
Little-Endian data format
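As a small illustration of the little-endian format listed above, the sketch below uses Python's standard struct module to show how a 16-bit word would be laid out in memory. The function name and the sample value 1234h are illustrative assumptions, not taken from the slides.

```python
import struct

def to_little_endian_word(value):
    """Return the two bytes of a 16-bit word in little-endian order."""
    return struct.pack("<H", value)

stored = to_little_endian_word(0x1234)
# The low byte (34h) sits at the lower address, the high byte (12h) follows.
print(stored.hex())
```

Reading the bytes back with struct.unpack("<H", stored) recovers the original word.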
8086/8088 (1)
Original IBM PC used the 8088 microprocessor
8088 is similar to the 8086, but it has an external 8b data bus and only a 4B-deep queue
For cost reduction reasons
We can consider 8086 and 8088 together
PC clones often used the 8086 for better performance
8-bit bus reduces performance, but meant cheaper computers
8086/8088 (2)
Remember the Fetch-Decode-Execute cycle?
Fetching from EXTERNAL MEMORY is SLOW
The 8086/8 used an instruction queue to speed up performance
While the processor is decoding and executing an instruction, its bus interface can be reading new instructions, since at that time the bus is not actually in use
8086/8088 Functional Units
[Figure: the 8086/8088 MPU comprises an Execution Unit (EU) and a Bus Interface Unit (BIU); the BIU fetches opcodes, reads operands and writes data]
8086/8088 (3)
8086/8088 consists of two internal units
The execution unit (EU) - executes the instructions
The bus interface unit (BIU) - fetches instructions, reads operands and writes results
The 8086 has a 6B prefetch queue
The 8088 has a 4B prefetch queue
8086/8088 Internal Organisation
[Figure: block diagram of the 8088. EU: general registers AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, DI, SI; temporary registers; ALU; flags; EU control. BIU: segment registers CS, DS, SS, ES; IP; internal communications registers; address summation feeding the 20-bit address bus; data bus; bus control; 4-byte instruction queue; 8088 bus]
BIU Elements
Instruction Queue: the next instructions or data can be fetched from memory while the processor is executing the current instruction
The memory interface is slower than the processor execution time so this speeds up overall performance
Segment Registers: CS, DS, SS and ES are 16b registers
Used with the 16b offset registers to generate the 20b address
Allow the 8086/8088 to address 1MB of memory
Changed under program control to point to different segments as a program executes
Instruction Pointer (IP) contains the Offset Address of the next instruction, the distance in bytes from the address given by the current CS register
8086/8088 20-bit Addresses
[Figure: the 16-bit segment base address (e.g. CS) with 0000 appended is added to the 16-bit offset address (e.g. IP) to give the 20-bit physical address]
Exercise: 20-bit Addressing
1. CS contains 0A820h, IP contains 0CE24h. What is the resulting physical address?
2. CS contains 0B500h, IP contains 0024h. What is the resulting physical address?
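The exercise can be checked with a few lines of Python. This is a minimal sketch of the segment:offset calculation described earlier (segment shifted left by 4 bits, then added to the offset); the function name is an illustrative assumption, and the 20-bit mask models the fact that the 8086/8088 has only 20 address lines.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep 20 bits only

for cs, ip in [(0xA820, 0xCE24), (0xB500, 0x0024)]:
    print(f"{cs:04X}:{ip:04X} -> {physical_address(cs, ip):05X}h")
```

Both CS:IP pairs in the exercise resolve to the same physical address, which shows that many different segment:offset combinations can refer to one memory location.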
8086/8 In Circuit (1)
8086/8 microprocessors need support circuits in a microcomputer system
8086/8 multiplex the address and data buses on the same pins
This saves pins but at a price:
Demultiplexing logic is needed to build up separate address and data buses to interface with RAMs and ROMs
[Figure: 8086 pinout, 40-pin package]
Pins 1-20: GND; AD14; AD13; AD12; AD11; AD10; AD9; AD8; AD7; AD6; AD5; AD4; AD3; AD2; AD1; AD0; NMI; INTR; CLK; GND
Pins 40-32: Vcc; AD15; A16,S3; A17,S4; A18,S5; A19,S6; /BHE,S7; MN,/MX; /RD
Pins 31-24, maximum mode: /RQ,/GT0; /RQ,/GT1; /LOCK; /S2; /S1; /S0; QS0; QS1
Pins 31-24, minimum mode: HOLD; HLDA; /WR; M/IO; DT/R; /DEN; ALE; /INTA
Pins 23-21: /TEST; READY; RESET
[Figure: 8088 pinout, 40-pin package]
Pins 1-20: GND; A14; A13; A12; A11; A10; A9; A8; AD7; AD6; AD5; AD4; AD3; AD2; AD1; AD0; NMI; INTR; CLK; GND
Pins 40-35: Vcc; A15; A16,S3; A17,S4; A18,S5; A19,S6
Pin 34: high (maximum mode); /SS0 (minimum mode)
Pins 33-32: MN,/MX; /RD
Pins 31-24, maximum mode: /RQ,/GT0; /RQ,/GT1; /LOCK; /S2; /S1; /S0; QS0; QS1
Pins 31-24, minimum mode: HOLD; HLDA; /WR; IO/M; DT/R; /DEN; ALE; /INTA
Pins 23-21: /TEST; READY; RESET
8086/8 In Circuit (2)
In Maximum Mode the 8086/8 needs at least the following: 8288 Bus Controller, 8284A Clock Generator, 74HC373s and 74HC245s
With the aid of these devices the 8086 begins to look like the ideal microprocessor we looked at earlier
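i8086 Circuit - Maximum Mode
[Figure: the 8284A Clock Generator supplies CLK, READY and RESET to the 8086 CPU. The CPU's S0#, S1#, S2# status lines drive the 8288 Bus Controller, which generates MRDC#, MWTC#, AMWC#, IORC#, IOWC#, AIOWC#, INTA#, ALE, DEN and DT/R#. Three 74LS373 latches (LE driven by ALE, with OE#) capture A19:A0 and BHE# from AD15:AD0, A19:A16 and BHE#. Two 74LS245 transceivers (EN#, DIR) buffer D15:D0. INTR is an input to the CPU.]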
8086/8 Maximum Mode
In maximum mode, the 8288 uses a set of status signals (S0, S1, S2) to rebuild the normal bus control signals of the microprocessor
MRDC#, MWTC#, IORC#, IOWC# etc.
Equivalent to MEMR# etc.
Look at some special signals briefly
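The way the 8288 rebuilds the command signals from the status lines can be sketched as a lookup table. The encoding below follows the standard 8086 maximum-mode status table and agrees with the status values shown on the read-cycle (001 or 101) and write-cycle (010 or 110) timing slides; the dictionary and function names are illustrative assumptions.

```python
# Keys are the (S2#, S1#, S0#) pin levels; values are the bus cycle type and
# the 8288 command outputs asserted for that cycle.
STATUS_DECODE = {
    (0, 0, 0): ("interrupt acknowledge", ["INTA#"]),
    (0, 0, 1): ("I/O read", ["IORC#"]),
    (0, 1, 0): ("I/O write", ["IOWC#", "AIOWC#"]),
    (0, 1, 1): ("halt", []),
    (1, 0, 0): ("instruction fetch", ["MRDC#"]),
    (1, 0, 1): ("memory read", ["MRDC#"]),
    (1, 1, 0): ("memory write", ["MWTC#", "AMWC#"]),
    (1, 1, 1): ("passive", []),
}

def decode_status(s2, s1, s0):
    """Return (cycle type, asserted 8288 commands) for the given status pins."""
    return STATUS_DECODE[(s2, s1, s0)]

print(decode_status(1, 0, 1))
```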
RESET Signal
The RESET signal puts the 8086/8 into a defined state (RESET is active high at the CPU pin; the 8284A generates it from its active-low RES# input)
Clears the flags register, DS, SS, ES etc.
Sets the effective program address to 0FFFF0h (CS=0FFFFh, IP=0000h)
8086/8 programs always start at 0FFFF0h after Reset has been asserted and removed
Continues into latest generation CPUs
BHE# Signal (8086 Only)
The 8086 processor can address memory a byte at a time
Its data bus is 16b wide
It uses the BHE# signal and A0 (sometimes called BLE#) to address bytes using its 16b bus
Use of BHE#/A0(BLE#)
[Figure: 8086 memory is organised as two byte-wide banks, both addressed by A19..A1. The ODD bank (addresses 00001, 00003, 00005 ... FFFFF) connects to D15:D8 and is enabled by BHE#. The EVEN bank (addresses 00000, 00002, 00004 ... FFFFE) connects to D7:D0 and is enabled by A0/BLE#. The 8088 uses byte-wide addressing with a single bank (00000 ... FFFFF).]
Use of BHE#/BLE#
BHE#  A0/BLE#  Selection
0     0        Whole word (16 bits)
0     1        High byte to/from odd address
1     0        Low byte to/from even address
1     1        No selection
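The selection table above can be written as a small decode function. This is a sketch only; the function names and return strings are illustrative assumptions, and both inputs are treated as active-low enables exactly as in the table.

```python
def bank_enables(bhe_n, a0):
    """Return (odd_bank_enabled, even_bank_enabled) for BHE# and A0/BLE#."""
    return (bhe_n == 0, a0 == 0)

def describe_selection(bhe_n, a0):
    """Name the transfer selected by a BHE#/A0 combination."""
    odd, even = bank_enables(bhe_n, a0)
    if odd and even:
        return "whole word"
    if odd:
        return "high byte to/from odd address"
    if even:
        return "low byte to/from even address"
    return "no selection"

for bhe_n in (0, 1):
    for a0 in (0, 1):
        print(bhe_n, a0, describe_selection(bhe_n, a0))
```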
ALE and Address/data Bus Multiplexing
8086/8 Multiplexes the Address and Data signals onto the same set of pins
Need off-chip logic to separate the signals
Transparent latches designed just for address demultiplexing
ALE and 74HC373 Transparent Latch
[Figure: timing diagram showing Clock, the Address/Data bus (address time followed by data time), ALE, and the output of the 74HC373 (the microcomputer address bus). A 74HC373 or equivalent has In0:In7 connected to the Address/Data bus, LE driven by ALE, and Q0:Q7 driving the system address bus. The tri-state control signal, OE#, is shown connected to GND for simplicity.]
Use of ALE (Address Latch Enable)
ALE is used with an external latch (74HC373) to demultiplex the address and data lines
74HC373 is transparent when its LE input (connected to ALE) is high
When ALE goes low, the ‘373 holds the last data until ALE goes high again
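The latch behaviour described above can be modelled in a few lines. This is a behavioural sketch, not a timing-accurate model of the 74HC373; the class name and the sample bus values are illustrative assumptions, and OE# is assumed tied low so the output is always driven.

```python
class TransparentLatch:
    """Behavioural model of one transparent latch (LE high = transparent)."""

    def __init__(self):
        self.q = 0

    def update(self, le, d):
        # While LE is high the output follows the input; when LE is low
        # the output holds the last value seen while LE was high.
        if le:
            self.q = d
        return self.q

latch = TransparentLatch()
print(hex(latch.update(le=1, d=0x34)))  # address time: ALE high, address passes through
print(hex(latch.update(le=0, d=0xFF)))  # data time: ALE low, address is held
```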
8288 Bus Controller and Bus Transceivers
[Figure: the 8288 Bus Controller's DEN and DT/R# outputs drive the EN# and DIR inputs of two 74HC245 transceivers, which buffer CPU [D15:D8] and CPU [D7:D0] onto Buffered [D15:D8] and Buffered [D7:D0], going to memory and I/O systems]
8288 Bus Controller also generates Direction and Enable signals for bi-directional transceivers
Supports buffering the system data bus
8086 Read Cycle
[Figure: timing diagram over T1 T2 T3 T4. Signals: CLK; /S0, /S1, /S2 (status 001 or 101); A16..A19, /BHE (address, then status S3..S6); ALE; AD0..AD15 (address valid, float, data, float); A0..A19 (valid address); DT/R; DEN; /MRDC or /IORC]
8086 Write Cycle
[Figure: timing diagram over T1 T2 T3 T4. Signals: CLK; /S0, /S1, /S2 (status 010 or 110); A16..A19, /BHE (address, then status S3..S6); ALE; AD0..AD15 (address valid, then data); A0..A19 (valid address); DT/R; DEN; /MWTC or /IOWC]
8086 Read Cycle (1 Wait State)
[Figure: timing diagram over T1 T2 T3 Tw T4. Signals: CLK; /S0, /S1, /S2 (status 001 or 101); A16..A19, /BHE (address, then status S3..S6); ALE; AD0..AD15 (address valid, float, data, float); A0..A19 (valid address); DT/R; DEN; /MRDC or /IORC; 8284 RDY; READY]
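The cost of a wait state follows directly from the timing diagrams: a basic bus cycle takes four clock periods (T1 to T4) and each Tw adds one more. The sketch below computes the bus cycle time; the function name and the 5MHz clock used in the example are illustrative assumptions.

```python
def bus_cycle_ns(clock_mhz, wait_states=0):
    """Duration of one bus cycle in nanoseconds: T1..T4 plus any Tw states."""
    clocks = 4 + wait_states
    return clocks * 1000.0 / clock_mhz

print(bus_cycle_ns(5))                  # no wait states
print(bus_cycle_ns(5, wait_states=1))   # one wait state
```

With these assumptions a single wait state lengthens every bus cycle by a quarter, which is why slow memory has such a visible effect on performance.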
8086/8088 Summary
First Generation (introduced June 1978)
One of the first 16b processors on the market
16b internal registers
16/8b external data bus
20b address bus (1MB addressable)
Used in 1st generation IBM PCs (1981)
80186/80188
Evolution of 8086/8088 to 80186/80188
Increased instruction set
On-chip system components (Clock generator, DMA, Interrupt, Timers…)
Unsuccessful in PCs
Popular in embedded systems…
2nd Generation Processor 286
P2 (286) = 2nd Generation Processor
Introduced in 1982
CPU behind IBM AT
Throughput of original IBM AT (6MHz) was about 500% of IBM PC (4.77MHz)
Level of integration: 134k transistors (vs 29k in 8086)
Still a 16b processor…
Available in higher clock frequencies: up to 25MHz
2nd Generation Processors 286
Fully backwards compatible to 8086
80286 runs 8086 software without modification
Improved instruction execution
Average instruction takes 4.5 cycles vs. 12 cycles (8086)
Improved instruction set
Real mode and Protected Mode
Multitasking support. What happens in one area of memory doesn’t affect other programs. Protected mode supported by Windows 3.0.
16MB addressable physical memory
On-chip MMU (1GB virtual memory)
Non-multiplexed address bus and data bus
Improving Computer Performance
We’ve seen how 16b computer technology based on the 8086 and 80286 processors developed
These computers are not powerful enough for today’s applications
How do you improve the performance of your computer?
Let’s start with the CPU
CPU Performance (1)
MOST OBVIOUS: Processor Clock Frequency
Increased frequency – increased execution rate
State of the Art: >4GHz (03/2005)
Memory and I/O access times can be a performance bottleneck – unless you take some special measures
CPU Performance (2)
ALU register width
A processor is an N-bit processor, where N represents the precision of the ALU – N can be 4, 8, 16, 32, or 64
The wider the registers – the more processing per clock
Data bus width
The wider the data bus the faster we can transfer data
Since the memory and I/O device access times are finite, the more bits transferred per cycle the better
CPU Performance (3)
Address bus width
Increased address width doesn’t provide a ‘speed’ increase as such
CPU can directly address more memory
PCs use big programs, which would not fit in a smaller address space
Overcoming small address space takes time
Impacts on overall system performance
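The link between address bus width and directly addressable memory is a power of two. The sketch below tabulates it for the processors covered in these slides; the function name is an illustrative assumption, and the 24-bit width for the 80286 is inferred from its 16MB physical address space.

```python
def addressable_bytes(address_bits):
    """Number of bytes a CPU can address directly with the given bus width."""
    return 2 ** address_bits

for name, bits in [("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    megabytes = addressable_bytes(bits) // (1024 * 1024)
    print(f"{name}: {bits} address bits -> {megabytes} MB")
```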
3rd Generation Processor 386
P3 (386) = 3rd Generation Processor
Introduced: 10/1985
Full 32b processor (32b registers. 32b internal and external data bus. 32b address bus)
275k transistors. CMOS. 132-pin PGA package. (Supply current Icc=400mA. Roughly the same as 8086!)
Clock speeds: 16-33MHz
P3 processors were far ahead of their time: It took 10 years before 32b operating systems became mainstream!
First 386 PCs early 1987 (COMPAQ)
3rd Generation Processor 386
Modes of operation: Real. Protected. Virtual Real.
Protected mode of 386 is fully compatible with 286
Protected mode = native mode of operation. Chips are designed for advanced operating systems such as Windows NT
New virtual real mode
Processor can run with hardware memory protection while simulating the 8086’s real-mode operation. Multiple copies of e.g. DOS can run simultaneously, each in a protected area of memory. If a program in one memory area crashes, the rest of the system is protected.
Intel 32-bit Architecture: IA-32
[Figure: block diagram with the Bus Unit (BU) connected to the Address and Data buses, Prefetch Queue, Instruction Unit (IU), Execution Unit (EU) containing Registers, Control Unit (CU) and ALU, and Addressing Unit (AU)]
The 80386 includes a Bus Interface Unit for reading and providing data and instructions, with a Prefetch Queue, an IU for controlling the EU with its registers, as well as an AU for generating memory and I/O addresses
80386 Features
32b general and offset registers
16B prefetch queue
Memory management unit with segmentation unit and paging unit
32b address and data bus
4GB physical address space
64TB virtual address space
i387 numerical coprocessor
Implementation of real, protected and virtual 8086 modes
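The 64TB virtual address space figure can be reproduced with a short calculation. The sketch assumes the usual protected-mode accounting, which is not spelled out on the slide: 14 bits of each 16-bit segment selector choose a segment (16384 segments), and each segment can be as large as the offset width allows. The function name is an illustrative assumption.

```python
def virtual_space_bytes(selector_bits, offset_bits):
    """Virtual address space = number of segments x maximum segment size."""
    return (2 ** selector_bits) * (2 ** offset_bits)

TB = 1024 ** 4
GB = 1024 ** 3
print(virtual_space_bytes(14, 32) // TB, "TB on the 80386 (32-bit offsets)")
print(virtual_space_bytes(14, 16) // GB, "GB on the 80286 (16-bit offsets)")
```

The same formula with 16-bit offsets gives the 1GB virtual memory figure quoted for the 80286.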
80386 Operating Modes
Protected Mode for multitasking support
Real Mode (native 8086 mode)
Processor powers up in Real Mode
System Management Mode
Power management or system security
Processor switches to separate address space, while saving the entire context of the currently running program or task
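80386 Register Set
Instruction Pointer: EIP (bits 31..0), with IP as the low 16 bits
EFLAG Register: EFLAG (bits 31..0), with FLAG as the low 16 bits
General-Purpose Registers (32-bit, with 16-bit low halves): EAX (AH, AL), EBX (BH, BL), ECX (CH, CL), EDX (DH, DL), ESI (SI), EDI (DI), EBP (BP), ESP (SP)
Segment Registers (bits 15..0): CS, SS, DS, ES, FS, GS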
80386 Prefetch Queue
[Figure: the Bus Interface Unit fills a 16-byte deep instruction queue over the 32-bit data bus; the Execution Unit reads from the queue]
Fetching from on-chip Queue is fast
Reading from off-chip Memory is slow
80386 Prefetch Queue
80386 Prefetch queue is 16B deep
1. The instruction fetch can read from the prefetch queue faster than from memory
2. The prefetcher can do some work while the execution unit is doing other tasks in parallel
Coprocessor: i387
The hardware implementation of floating point processing in the i387 means floating point operations run at much higher speed.
The i386 can execute all mathematical expressions using software emulation of the i387.
80386: Classic CISC Processor
CISC = Complex Instruction Set Computer
Complex instructions ...but code-size efficient
Micro-encoding of the machine instructions
Extensive addressing capabilities for memory operations
Few, but very useful CPU registers
80386 Execution Sequence
[Figure: CISC processor block diagram: Bus Interface, Prefetch Queue, Decoding Unit, Control Unit with Microcode ROM and Microcode Queue, Execution Unit with Registers and ALU, Coprocessor]
In a microprogrammed CISC the processor fetches the instructions via the bus interface into a prefetch queue, which transfers them to a decoding unit. The decoding unit breaks the machine instruction into many elementary micro-instructions and applies them to a microcode queue. The micro-instructions are transferred from the microcode queue to the control and execution unit which drives the ALU and the registers
80386 Complex Instructions
CISC drawback: Most instructions are so complicated, they have to be broken into a sequence of micro-steps
These steps are called Micro-Code
Stored in a ROM in the processor core
Micro-code ROM: Access-time and size...
They require extra ROM and decode logic
RISC: Less is More
RISC = Reduced Instruction Set Computer
20/80 Rule: 20% of the instructions take up 80% of the time
Sometimes executing a sequence of simple instructions runs quicker than a single complex machine instruction that has the same effect
RISC Ideas (1)
Reduce the instruction set to simplify the decoding
Smaller Instruction Set -> Simpler Logic -> Smaller Logic -> Faster Execution
Eliminate microcode – hardwire all instruction execution
Pipeline instruction decoding and executing – do more operations in parallel
RISC Ideas (2)
Load/Store Architecture – only the load and store instructions can access memory
All other instructions work with the processor internal registers
This is necessary for single-cycle execution – the execution unit can’t wait for data to be read/written
RISC Ideas (3)
Increase number of internal registers due to Load/Store Architecture
Also registers are more general purpose and less associated with specific functions
Compiler designed along with the RISC processor design. Compiler has to be aware of the processor architecture to produce code that can be executed efficiently
Instruction Pipelining - Operations Can Be Carried Out in Parallel
Read the instruction from memory or the prefetch queue (instruction fetch phase)
Decode the instruction (decode phase)
Where necessary, fetch the operands (operand fetch phase)
Execute the instruction (execute phase)
Write back the result (write-back phase)
Pipelined Execution
Cycle      Instruction Fetch  Decode           Operand Fetch    Execution        Write-back       Result
Cycle n    Instruction k      Instruction k-1  Instruction k-2  Instruction k-3  Instruction k-4  Result k-4
Cycle n+1  Instruction k+1    Instruction k    Instruction k-1  Instruction k-2  Instruction k-3  Result k-3
Cycle n+2  Instruction k+2    Instruction k+1  Instruction k    Instruction k-1  Instruction k-2  Result k-2
Cycle n+3  Instruction k+3    Instruction k+2  Instruction k+1  Instruction k    Instruction k-1  Result k-1
Cycle n+4  Instruction k+4    Instruction k+3  Instruction k+2  Instruction k+1  Instruction k    Result k
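The pipelined execution pattern can be reproduced with a short simulation. This is a sketch of an ideal five-stage pipeline with no stalls, assuming instruction i enters the fetch stage in cycle i; the stage list and function name are illustrative.

```python
STAGES = ["Instruction Fetch", "Decode", "Operand Fetch", "Execution", "Write-back"]

def pipeline_snapshot(cycle):
    """Map each stage to the index of the instruction it holds in this cycle."""
    # An instruction advances one stage per cycle, so the stage at depth d
    # holds the instruction that was fetched d cycles earlier.
    return {stage: cycle - depth for depth, stage in enumerate(STAGES)}

for cycle in range(4, 9):
    row = pipeline_snapshot(cycle)
    print(f"cycle {cycle}:", [row[stage] for stage in STAGES])
```

Once the pipeline is full, one instruction completes in every cycle even though each individual instruction still takes five cycles to pass through.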
Superscalar Architecture
The processor may have more than one pipeline (Pentium…)
Where possible each pipeline works independently
Not always possible
May achieve average completed execution of more than one instruction per clock cycle
Pipeline Challenges
More logic per pipeline stage – same resource can’t be used twice
E.g. can’t re-use ALU for computing implied addresses
Synchronisation Problems
Delayed Jump/Branch
Data and Register dependency, e.g.
ADD reg1, reg2, reg7
AND reg6, reg1, reg3
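The register dependency in the example above (the second instruction reads reg1 before the first has written it back) can be detected mechanically. The sketch below assumes a three-operand format in which the first operand is the destination and the rest are sources; that convention and the function names are assumptions made for illustration.

```python
def parse(instruction):
    """Split 'OP dest, src1, src2' into (opcode, destination, sources)."""
    opcode, operands = instruction.split(None, 1)
    registers = [r.strip() for r in operands.split(",")]
    return opcode, registers[0], registers[1:]

def has_raw_hazard(first, second):
    """True if `second` reads a register that `first` writes (read after write)."""
    _, destination, _ = parse(first)
    _, _, sources = parse(second)
    return destination in sources

print(has_raw_hazard("ADD reg1, reg2, reg7", "AND reg6, reg1, reg3"))
```

When such a hazard is found, the pipeline (or the compiler) has to delay the second instruction or forward the result.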
Getting the Benefits of Pipelining
Simplified Instruction decoding
Simpler, faster logic
On-chip cache memories
Local memory on-chip to avoid memory access bottlenecks
Floating Point pipeline for FP coprocessor
Speculative Execution to get around pipeline flushes
Software Implications of RISCs
Optimising Compiler must know how pipeline works (Compiler must be aware of pipeline delays, and insert NOPs if need be)
Lower code density in RISC because each instruction does less work
PowerPC programs take up to 30% more code to do the same tasks as on an x86 CPU
More memory accesses, potential performance impact...
80486: IA-32 with RISC elements
Introduced 04/89
Greatly improved 80386 CPU
Hard-wired implementation of frequently used instructions (as in RISCs). On average 2 clock cycles/instruction.
5 stage instruction pipeline
Internal L1 Cache Memory (8kB) + cache controller
On-chip Floating Point coprocessor (FPU)
Longer Prefetch Queue (32 bytes as opposed to 16 on the 80386)
Higher frequency operation: up to 120MHz
>1.2M transistors, 0.8µm CMOS. 168-pin PGA.
80486 Block Diagram
[Figure: i486 CPU block diagram: Bus Interface (A31-A0, D31-D0, control and status signals), Cache (8K bytes), Prefetcher (32-byte queue), Decoding Unit, Control Unit, Register and ALU, Floating Point Unit, Segmentation Unit, Paging Unit]
80486 Pipeline
Pipeline stages: Instruction Fetch, Decode 1 (memory access), Decode 2, Execution, Write-back
Example: ADD eax,mem32
Cycle n: Instruction Fetch - fetch ADD eax,mem32
Cycle n+1: Decode 1 - decode ADD, fetch mem32
Cycle n+2: Decode 2 - decode ADD (continued)
Cycle n+3: Execution - add eax and mem32
Cycle n+4: Write-back - write result into eax