Microprocessors Spanning the history to explain the technology.

Post on 24-Jan-2016

Microprocessors

Spanning the history to explain the technology

Microprocessors

• While we call them the “brain” of the computer, they don’t compare to our brain.

• We can “wing it” or “make it up as we go” – a processor just stops when confused.

• A processor is really just a very fast calculator, capable of doing the same thing over and over again (code).

The Man in the Box

• We can (often) think of the processor as a “Black Box.” We need to know what he does, but not how he does it.

• Again, the Man in the Box is very good at arithmetic and at following instructions.

External Data Bus

• First thing to set up is the External Data Bus.

• The 8088 had eight wires for the EDB, so it talked to the outside world 8 bits at a time (internally it was a 16-bit processor).

• Either we (through code) or the processor (results output) “turn on” wire(s) [voltage] so we can communicate.

EDB and Binary
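The on/off wires of the External Data Bus can be read as a binary number. Here is a minimal sketch, assuming eight wires listed most significant first, with 1 for voltage and 0 for no voltage. The function names are made up for illustration.

```python
def wires_to_value(wires):
    """Read a list of wire states (most significant wire first) as an unsigned integer."""
    value = 0
    for state in wires:
        # Shift what we have so far one binary place left, then add the new wire.
        value = value * 2 + state
    return value


def value_to_wires(value, width=8):
    """Return the on/off pattern for `value` on a bus with `width` wires."""
    return [(value >> bit) & 1 for bit in range(width - 1, -1, -1)]


pattern = [0, 0, 0, 0, 0, 1, 0, 1]
print(wires_to_value(pattern))   # 5
print(value_to_wires(5))         # [0, 0, 0, 0, 0, 1, 0, 1]
```

With eight wires there are 256 possible patterns, 0 through 255.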

Inside the CPU

• We have registers which are temporary holding places (scratch paper, if you will) for information.

• In the 8088, the general-purpose registers were 16 bits “long” and there were four of them: AX, BX, CX and DX.

Code Book

The Clock Wire

• It “ticks”, or has voltage, to time what goes on with the CPU.

• Much like a metronome on top of a piano.

Clock Speed

• The rated speed of the CPU, measured in Hertz – cycles (ticks) per second.

1 Hz = one cycle per second
1 kHz = one thousand cycles per second
1 MHz = one million cycles per second
1 GHz = one billion cycles per second (also 1000 MHz)

Clock Speed

• The 8088 had a clock speed of 4.77 MHz, or 4.77 million cycles per second.

• It was controlled by a crystal oscillator on the motherboard.

• Today, most of the clock circuitry is built into the processor and chipset.
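To get a feel for what 4.77 MHz means, we can turn the clock speed into the length of one tick. This is a small sketch of that arithmetic (one tick = 1 / frequency). The function name is just for illustration.

```python
def cycle_time_ns(frequency_hz):
    """Length of one clock tick in nanoseconds for a clock running at `frequency_hz`."""
    return 1e9 / frequency_hz


# The 8088 at 4.77 MHz: each tick lasts a bit over 200 nanoseconds.
print(round(cycle_time_ns(4.77e6), 1))   # 209.6

# A 1 GHz clock ticks once every nanosecond.
print(cycle_time_ns(1e9))                # 1.0
```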

Example Clock Crystal

Clock Speed

• The rated speed is no longer the absolute maximum speed the processor can run.

• The rating is established by testing the CPU.

• CPUs also have a lower limit – they just won’t start at too low a speed.

• The System Crystal, or System Clock, sets the pace for the whole motherboard.

Machine Language

Timing

Processor             Timing
8088                  12 cycles per instruction
286 and 386           4.5 cycles per instruction
486                   2 cycles per instruction
Pentium               1 cycle (tick) per instruction
Pentium II, III, 4    Up to 3 instructions per cycle
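Putting clock speed and timing together gives a rough instructions-per-second figure. The sketch below uses the 8088 numbers from these slides (4.77 MHz, 12 cycles per instruction). Treat the cycles-per-instruction values as averages, so the result is only a ballpark.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Rough throughput: ticks available per second divided by ticks per instruction."""
    return clock_hz / cycles_per_instruction


# 8088: 4.77 million ticks per second, about 12 ticks per instruction.
print(int(instructions_per_second(4.77e6, 12)))   # 397500
```

The same clock with a 1-cycle-per-instruction design would get twelve times as much done, which is why later chips sped up even before clock speeds rose.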

How we get code

• Programs are stored on the hard disk drive, which is too slow to send code directly to the CPU.

• We need a method to store code temporarily and get it to the CPU as fast as possible.

• We will call this method: memory

Options

• We could use punched cards.

• We could use magnetic tape

• We do use Random Access Memory (RAM) which gets around the limitations of the other options.

Northbridge

• Works with the CPU to place RAM contents on the External Data Bus

• Also called the Memory Controller Chip

• Sits between the processor and memory

• Uses the Address Bus to communicate with CPU and Memory

Address Bus

Address Bus

• If we had three wires on our Address Bus, we could come up with eight addresses

• The 8088 had twenty wires, giving 2^20 = 1,048,576 addresses, or 1 MB

Wires    Address
0 0 0    0
0 0 1    1
0 1 0    2
0 1 1    3
1 0 0    4
1 0 1    5
1 1 0    6
1 1 1    7
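The three-wire table generalizes: every extra wire doubles the number of addresses. Here is a short sketch that rebuilds the table and checks the 8088's twenty-wire figure. The function names are illustrative only.

```python
def address_count(wires):
    """Number of distinct addresses on an address bus with `wires` wires."""
    return 2 ** wires


def address_table(wires):
    """List of (wire pattern, address) pairs, e.g. ('101', 5) for three wires."""
    return [(format(n, f"0{wires}b"), n) for n in range(address_count(wires))]


for pattern, address in address_table(3):
    print(pattern, address)

print(address_count(20))   # 1048576, the 8088's 1 MB
```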

Front Side Bus

As we leave the 8088

• Every modern PC starts up acting like an 8088 (this is called real mode)

• Almost all of the activity in RAM is in the first megabyte

• The rest of memory is for data storage

Who makes CPUs?

• Intel and Advanced Micro Devices (AMD) for the PC

• Motorola did for the Apple Mac

• For years, Intel and AMD had cross-license agreements – 8088, 286, 386 and 486.

• Starting with the Pentium II, Intel and AMD made different chips – requiring different motherboards.

What does a CPU look like?

• Pin Grid Array (PGA)
– Most popular “package” for CPUs
– Lots (hundreds) of pins
– Usually square in shape
– Fragile – pins can bend or break

• Land Grid Array (LGA)
– Pins are on the motherboard
– Processor has “nubs”

LGA 775 (Intel)

The Socket

• Zero Insertion Force (ZIF) socket

• Lever up, CPU drops in

• Lever down, CPU held in place and good contact on pins

SEC

• For the Pentium II, Intel tried out a Single Edge Contact (SEC) cartridge connection method

• Intel’s were Slot 1; AMD used Slot A

• Early Pentium IIIs also used SEC

Using Memory

• We need a way to store code and data for the processor

• It takes a while to get code/data from RAM

• If we fetch extra on each trip, with only a small increase in time, our processor can (hopefully) run longer before it has to wait again

DRAM and Refresh

• Dynamic Random Access Memory

• Capacitor and transistor

• For a brief time, capacitor acts like a battery – it holds a charge – for about 16 milliseconds

• Then, it must be refreshed (recharged)

• That’s why it is called Dynamic

• DRAM is cheap

Cache

• It’s pronounced “cash,” not “cash-ay”

• Small amount of really fast memory

• Enables us to store code (and data) for the CPU

• Disk cache between RAM and hard disk

• RAM cache between RAM and CPU

• RAM cache allows us to work during refresh cycles

Hit and Miss

• If we find what we are looking for in cache, that’s a cache hit

• If we don’t find what we are looking for in cache, that’s a cache miss and we have to go to RAM or worse, to the hard disk drive.
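Hits and misses are easy to see in a toy model. The sketch below is not how real cache hardware works. It assumes a tiny cache that holds a fixed number of entries and throws out the oldest one when full, which is a simplification chosen for illustration. The class and variable names are made up.

```python
class ToyCache:
    """A tiny cache in front of a slower backing store (a dict standing in for RAM)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity
        self.lines = {}      # address -> value, kept in insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1               # cache hit: no trip to RAM needed
            return self.lines[address]
        self.misses += 1                 # cache miss: go to the slower store
        value = self.backing_store[address]
        if len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))   # assumption: evict the oldest entry
            del self.lines[oldest]
        self.lines[address] = value
        return value


ram = {address: address * 10 for address in range(16)}
cache = ToyCache(ram, capacity=4)
for address in [0, 1, 0, 1, 2, 0]:
    cache.read(address)
print(cache.hits, cache.misses)   # 3 3
```

Code that keeps coming back to the same few addresses gets a high hit rate, which is the whole reason a small cache pays off.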

SRAM

• Static Random Access Memory

• Typically uses six transistors per bit in a flip-flop circuit, like a light switch

• Much more expensive than DRAM, so only small amounts are used

• No refresh required

• Almost as fast as CPU

L1 and L2 Cache

• L1 is small (16-32 KB) and runs almost as fast as the CPU. It is now part of the CPU.

• L2 cache is larger (64 KB to 2 MB) and runs slower than the CPU. It was once external to the CPU and is now part of it as well.

• For our next instruction, we go from CPU to L1 cache to L2 cache to RAM

Pipelines

• More speed – we want to get more done in same amount of time.

• Think in terms of laundry, one basket at a time.

Sort Wash Dry Fold

• If we use multiple baskets of laundry, and “fill” the pipeline, we can get much more done, more quickly

Sort Wash Dry Fold

This is a four-stage pipeline. We could add a second pipeline that included a pre-soak stage for really dirty laundry.
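The laundry picture can be put into numbers. This is a minimal sketch, assuming every stage (Sort, Wash, Dry, Fold) takes exactly one time step, which real laundry and real pipelines do not guarantee.

```python
def one_at_a_time_steps(baskets, stages=4):
    """Finish each basket completely before starting the next one."""
    return baskets * stages


def pipelined_steps(baskets, stages=4):
    """Start a new basket every step once the previous one has moved on."""
    if baskets == 0:
        return 0
    # The first basket takes `stages` steps; each later basket finishes one step after.
    return stages + (baskets - 1)


print(one_at_a_time_steps(5))   # 20
print(pipelined_steps(5))       # 8
```

One basket takes four steps either way. The gain only shows up once the pipeline is kept full.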

Threads

• In our 2+3 example, we had a single thread, or pathway, through the program.

• With careful coding and great attention to details, we can write programs that use multiple threads.

• Let’s make a pot of coffee as an example:

Single thread coffee:

Put filter in → Put coffee in → Put water in → Brew

Now, let’s get some help and double-thread the process:

Thread 1: Put filter in → Put coffee in
Thread 2: Put water in
Then: Brew
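The double-threaded pot of coffee can be sketched with Python's standard `threading` module. One thread handles the filter and coffee, a second handles the water, and brewing waits until both are done. The function names are made up for the example.

```python
import threading


def make_coffee_two_threads():
    """Run the two preparation threads, wait for both, then brew. Returns the step log."""
    log = []
    lock = threading.Lock()

    def do(step):
        with lock:               # protect the shared log from both threads at once
            log.append(step)

    def filter_and_coffee():
        do("Put filter in")
        do("Put coffee in")

    def water():
        do("Put water in")

    workers = [
        threading.Thread(target=filter_and_coffee),
        threading.Thread(target=water),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()            # brewing must wait for BOTH threads to finish
    do("Brew")
    return log


print(make_coffee_two_threads())
```

The order of the first three steps can change from run to run, but "Brew" is always last. That waiting step is the "careful coding" part.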

Clock Speed and Multipliers

• We are able to build processors that run faster than the motherboard

• So, we multiply the System Clock to get CPU internal speed

• Remember: System Clock is FSB

• Example: 100 MHz system times 4.0 multiplier gives 400 MHz internal speed

• Example: P4 with a 133 MHz system clock and a x23 multiplier = 3.06 GHz
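The multiplier arithmetic is simple enough to check in a couple of lines. This sketch uses the two examples from the slide. The function name is just for illustration.

```python
def internal_clock_mhz(system_clock_mhz, multiplier):
    """CPU internal speed = system clock times the multiplier."""
    return system_clock_mhz * multiplier


print(internal_clock_mhz(100, 4.0))   # 400.0
print(internal_clock_mhz(133, 23))    # 3059, sold as 3.06 GHz
```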

CPU Voltage

• Faster processors = more heat

• If we reduce CPU voltage, heat output goes down

• Use voltage regulators

Branch Prediction

IF <some condition> THEN
    <code for “TRUE” result>
ELSE
    <code for “FALSE” result>
END IF

If we can see the IF, we can get code from both “sides”

Speculative execution is running a branch before the test completes. Out-of-order processing is running instructions as soon as their inputs are ready, rather than strictly in program order.
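How does a processor guess which side of the IF to run? One classic textbook scheme is a two-bit saturating counter, sketched below. This is an illustration of the general idea only, not a description of what any particular Intel or AMD chip does, and the names are made up.

```python
class TwoBitPredictor:
    """Counter values 0-1 predict 'not taken'; 2-3 predict 'taken'."""

    def __init__(self):
        self.counter = 2   # assumption: start out weakly predicting 'taken'

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Nudge the counter toward what actually happened, staying within 0..3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guessed correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times and then falls through once.
print(accuracy([True] * 9 + [False]))   # 0.9
```

Because it takes two wrong guesses in a row to flip the prediction, one odd outcome (like the end of a loop) does not throw the predictor off.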


Pause here

Pentium II

• SEC form factor – Slot 1

• End of the cross-licensing with AMD

• AMD used Slot A – same shape but different signals on pins – can you say “crispy CPU?”

• Faster Pentium Pro with MMX: 233-450 MHz

Celeron

• Inexpensive Pentium II, no L2 cache at first.

• Real poor performance, so introduced 300A with L2 cache.

• Could overclock to almost 800 MHz.

• Designed to compete against AMD processors.

AMD Athlon

• Competitor to Pentium II and III

• Nine stage pipeline

• Advanced dynamic branch prediction

• Faster bus ….dual pumping the bus (two data transfers per clock cycle)

• 462-pin Socket A


AMD Duron

• AMD’s answer to Intel’s Celeron – but AMD was already making less expensive CPUs!

• Has now been replaced with the Sempron line

Pentium 4

• 478-pin package

• 20-stage pipeline to allow higher processor speeds

• Quad-pumped FSB – 400 and 533 MHz on 100 and 133 MHz bus – up to 4 data transfers per clock cycle
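“Pumping” the bus just means more than one data transfer per clock tick, so the advertised FSB number is the real bus clock times the transfers per tick. Here is a quick sketch of that arithmetic, with an illustrative function name.

```python
def effective_fsb_mhz(bus_clock_mhz, transfers_per_cycle):
    """Advertised FSB rate: real bus clock times data transfers per clock cycle."""
    return bus_clock_mhz * transfers_per_cycle


print(effective_fsb_mhz(100, 4))   # 400
print(effective_fsb_mhz(133, 4))   # 532, marketed as 533 (the bus clock is really 133.33 MHz)
print(effective_fsb_mhz(100, 2))   # 200, a dual-pumped bus
```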

Athlon XP

• Competed with Pentium 4’s

• Socket A (462 pin)

• Introduced the “+” to confuse lots of people ….

The Plus

“Rated” speed    Actual clock speed (MHz)
1500+            1333
1800+            1533
2000+            1666
2700+            2167

P4

• Socket 478

• LGA 775 (Socket T)

• Core Duo (775 socket)

• 64-bit (4500 series)

• Increasing the number of instructions per clock cycle, rather than clock speed

• One L2 cache for both cores


AMD

• Sempron 754-pin, 32- and 64-bit processor

• Athlon64 in 939-pin socket

• Athlon64 in 940, AM2 socket

• Northbridge now a part of CPU

• Nine-step pipeline

• Two L2 caches – one for each core

Early 64-bit

• Intel built the Itanium strictly to run 64-bit code.

• AMD built a 64-bit processor that would also run 32-bit code (Opteron)

• Both are for the server market, not desktop

Wrong!

Process Size

• Basically, it is how wide the traces (circuits) are on the silicon. This quickly translates into how big a “device” (usually a transistor) is going to be.

• We have gone from 3 micrometers (3000 nanometers) to 65 nanometers (a nanometer is a billionth of a meter) and now 22 nanometers!

Hyperthreading

• Swallow this: Hyperthreading is running two threads in the same pipeline at the same time…. No, I don’t get it either.

Bit Buzz

• Simply put, the labels "16-bit," "32-bit" or "64-bit," when applied to a microprocessor, characterize the processor's data stream.

• In more specific terms, the labels "64-bit," "32-bit," etc. designate the number of bits that each of the processor's general-purpose registers (GPRs) can hold. So when someone uses the term "64-bit processor," what they mean is "a processor with GPRs that store 64-bit numbers."


The take-home point here is that only applications that require and use 64-bit integers will see a performance increase on 64-bit hardware that is due solely to a 64-bit processor's wider registers and increased dynamic range. So there's no magical performance boost inherent in the move from 32 bits to 64 bits, as people are often led to think by journalists who write things like, "64-bit computers can process twice as much data per clock cycle as their 32-bit counterparts." Technically, this is true in a very restricted sense, but it would be better to say the following: "64-bit computers can process numbers that are 4.3 billion times as large as those processed by their 32-bit counterparts." It sounds a lot less sexy because it is, but at least no one is misled into thinking that 64-bitness makes a computer somehow twice as fast.
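The “4.3 billion times as large” figure is easy to check. The sketch below compares the largest unsigned numbers that fit in 32-bit and 64-bit registers. The function name is illustrative.

```python
def max_unsigned(bits):
    """Largest unsigned integer that fits in a register with `bits` bits."""
    return 2 ** bits - 1


print(max_unsigned(32))     # 4294967295
print(max_unsigned(64))     # 18446744073709551615

# How many times more values does a 64-bit register cover than a 32-bit one?
print(2 ** 64 // 2 ** 32)   # 4294967296, about 4.3 billion
```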

Athlon64

• AM2 (940-pin) package

• Moves the MCC into the processor – direct link to RAM

• “Little brother” is the Sempron

Wider not Faster

• Dual cores

• Intel Pentium D (now Intel Core x)

• Intel licensed AMD’s approach to 64/32 bit processing calling it EM64T

• AMD X2


Intel Core

• Drops back to Pentium Pro for 12-stage pipeline

• Single-core (Solo) and dual-core (Duo) versions

• Use model numbers – have to dig for speed specification

Process Numbers

• Athlon CPU at 180 nm (nanometers)

• Early P4’s also at 180 nm

• Later P4’s at 130, 90 and 65 nm; newer chips are at 22 nm now

• Width of traces (wires) which affects overall processor size

For the moment

• We seem to have stalled at processor speeds around 3-3.5 GHz

• My guess is that we have physics problems:
– Heat/power
– Speed of electricity
– Frequency cross-talk

3-D Transistor or Tri-gate

Ivy Bridge chips

Cooling the CPU

• “Boxed” CPUs come with heat sink and fan – with logo on the fan

• OEM CPUs do not come with heat sink and fan – have to buy separately

• Intel and AMD will only honor the 3-year warranty if you send in the logo’d heat sink with the processor

Liquid Cooling example

(Intel’s) Kentsfield, the code name for the quad-core chip, is literally two dies built into a single multi-chip module:

Each die offers 4MB of shared L2 cache, but each cache is dedicated to the two cores on that die. If data needs to be passed back and forth between the two cores, it must be done over the 1066MHz (effective) shared front-side bus (FSB).

Mobile Processors

Replacing CPUs

• Only if it dies

• There have been windows of opportunity: replace a 200 MHz Pentium with a 400-500 MHz AMD K6 processor

• At and above the P II, you cannot swap brands on the same motherboard

• Narrow speed band to work with – you are probably better off by overclocking

• Read the motherboard book

Installing CPUs

• Be very careful – static electricity can fry the CPU without any sight or sound!

• Triple-check for correct orientation into socket – if it does not drop in, it’s wrong.

• Make sure you install heat sink and fan – a few seconds can cook a CPU. And use “thermal grease” if it is not already there.

Upgrading

• Likely, your new CPU takes a different socket. That means a new motherboard.

• A new motherboard can also mean new RAM (next week’s discussion) and a new power supply.

• So, how far from a new computer are you?

Installing a new CPU

• First, the old CPU has to come out
– That can require a lot of effort as the fan is clamped VERY tightly to the motherboard
– Watch closely for ESD – fussing to remove the fan can generate a lot of ESD

• New CPU should drop into place if aligned correctly

• Make sure there is some thermal compound between processor and heat sink (more is NOT better!)

Overclocking

• Michael makes it out to be one of the seven deadly sins …it does void warranty

• Yes, you can cook your CPU, in theory

• Be careful when buying a CPU – make sure it is what the box says.

• Really good deals are often really BAD deals

Overclocking

• The formula is:

FSB speed times multiplier = clock rate

100 X 23 = 2300 MHz

The multiplier is locked, so what you get to change is the FSB speed.

110 X 23 = 2530 MHz
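With a locked multiplier, the only knob is the FSB speed, and the percentage you raise it is the percentage you overclock. Here is a small sketch using the 100 to 110 MHz example. The function name is made up.

```python
def overclock(fsb_mhz, multiplier, new_fsb_mhz):
    """Return (new clock rate in MHz, percent increase) for a locked multiplier."""
    old_clock = fsb_mhz * multiplier
    new_clock = new_fsb_mhz * multiplier
    percent_increase = (new_clock - old_clock) * 100 / old_clock
    return new_clock, percent_increase


print(overclock(100, 23, 110))   # (2530, 10.0)
```

Remember that everything else hanging off the FSB speeds up by the same percentage, not just the CPU.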

Overclocking

• Generally, you can go as much as 10% higher without any other changes.

• Will affect RAM and memory transfers

• Go too fast, and system will not boot.

• Reset CMOS (next week) and start over.

Pipeline(s)

Diagram: a four-stage pipeline (Fetch, Decode, Execute, Write back).

Pipeline(s)

Diagram: the same four stages, marked with the minimum cycle time and the unused processor capability.

Pipeline(s)

Diagram: Decode is split into Decode 1 and Decode 2 (Fetch, Decode 1, Decode 2, Execute, Write back), giving a new, faster cycle time with less waste.

Pipeline Stalls

• Original P4 (Willamette) used 20-stage pipeline

• AMD’s Athlon XP used 9-stage pipeline and “went faster” as a result of more efficient pipeline

• Northwood P4 kept the 20-stage pipeline but increased L2 cache and clock speed, and “caught up” to Athlon XP

Diagram: CPU → L1 Cache → L2 Cache → Main Memory

With a 90% hit rate in L1, the CPU goes to L2 10% of the time. With a 90% hit rate in L2 as well, 1% of requests go to Main Memory.
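The 1% figure is just the two miss rates multiplied together. Here is a quick sketch of that arithmetic, with an illustrative function name.

```python
def fraction_reaching_memory(l1_hit_rate, l2_hit_rate):
    """Fraction of requests that miss both L1 and L2 and have to go to main memory."""
    return (1 - l1_hit_rate) * (1 - l2_hit_rate)


# 90% hit rate at each level: 10% of 10% of requests reach main memory.
print(round(fraction_reaching_memory(0.9, 0.9), 4))   # 0.01
```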

CISC Chips

• Complex Instruction Set Computer

• Robot and light bulb:
– Pick up the bulb
– Insert it into the socket
– Rotate clockwise until tight

RISC chips

• Reduced Instruction Set Computer

• Robot and light bulb:
– Lower hand
– Grasp bulb
– Raise hand
– Insert bulb into socket
– Rotate bulb clockwise one quarter turn
– Is bulb tight? If not, repeat above
– End

Intel

• Intel has introduced three “models” of processors:
– i3 – for low to medium desktop use
– i5 – for high desktop use
– i7 – for the big bucks group

• Intel borrowed BMW’s marketing book for these; similar to 300, 500 and 700 series cars

Cores

Configuration    Performance    Cost
Single Core      100            $1x
Dual Core        120            $1.2x
Quad Core        125            $2x+

Overheating CPUs

• Poor “connection” to heat sink can cause CPU to overheat and shut itself down

• Too much, or too little thermal paste

• Environment too warm

• Try adding case fan(s)