Tema III – Microcontrollers and Microprocessors

Roberto Gutiérrez Mazón

• Introduction
• Processor Architectural Features. Datapath & pipeline.
• Data Representation: Fixed-point vs Floating-point
• Interrupts, Exceptions, Watch-Dog, …
• 32-bit microcontroller. ARM Cortex-M3
  ◦ ARM Cortex-M3 Architecture. Programmer's Model.
• 32/64-bit microprocessor.
  ◦ Intel x86, UltraSPARC Architecture. Programmer's Model.

Processor Architectural Features

What is "Computer Architecture"?

[Figure: the layers of a computer system, from software down to hardware: Applications; Compiler and Operating System; Firmware; Instruction Set Architecture; Instruction Set Processor and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout & fab; Semiconductor Materials.]

Introduction

• Moore's Law
• "Cramming More Components onto Integrated Circuits"
  ◦ Gordon Moore, Electronics, 1965
• The number of transistors on a cost-effective integrated circuit doubles roughly every 18 months. (Moore's 1965 article projected a doubling every year; in 1975 he revised the rate to every two years. The 18-month figure is the commonly quoted version.)

Prehistoric computer architecture: the Z1

• The Z1 was the first freely programmable mechanical computer in the world that used Boolean logic and binary floating-point numbers.
  ◦ Memory: 64 words of 22 bits.
  ◦ Clock frequency: 1 Hz.
  ◦ Registers: two 22-bit floating-point registers.
  ◦ ALU: add (5 s), subtract, multiply (16 s), divide (18 s).
  ◦ Weight: 1000 kg.

The IBM zEC12 (zSeries) microprocessor

• 5.5 GHz in IBM 32 nm PD-SOI CMOS technology
• 2.75 billion transistors in 597 mm²
• 64-bit virtual addressing
  ◦ The original S/360 was 24-bit, and S/370 was a 31-bit extension
• Six-core design
• Three-issue out-of-order superscalar pipeline
• Out-of-order memory accesses
• Redundant datapaths
  ◦ Every instruction is performed in two parallel datapaths and the results are compared
• 64 KB L1 I-cache and 128 KB L1 D-cache, on-chip
• 1 MB private L2 unified instruction and data cache per core, on-chip
• On-chip 48 MB eDRAM L3 cache
• Scales to a 120-core multiprocessor with 384 MB of shared L4 eDRAM

[Figure: timeline of computing hardware: Babbage's Difference Engine (1832); ENIAC (1946); first transistor (Shockley, Bardeen, Brattain) (1947); Intel 4004 IC (1971); Intel 486DX2 IC (1989); MEMS (2000); Cell (2005); Intel Quad (2007); nanotechnology (?); optical processors (?).]

[Figure: a wireless sensor node from the University of Michigan, designed to be wirelessly networked into large-scale sensor arrays (wireless sensor network). Labelled parts: solar cells; 12 µAh Li-ion battery; processor, SRAM and PMU (Cortex-M0 + 16 KB RAM, 65 nm); UWB radio antenna; sensors and timers; 10 kB storage memory (~3 fW/bit). Cortex-M0: 65¢.]

4200 ARM-powered neutrino detectors

• 70 bore holes, 2.5 km deep
• 60 detectors per string, starting 1.5 km down
• 1 km³ of active telescope

Work supported by the National Science Foundation and University of Wisconsin-Madison.



Programming Model

• Microprocessors can be programmed directly in an assembly language.
• Differences from high-level languages:
  ◦ Assembly uses commands that execute data movement, arithmetic, logic and program control operations.
  ◦ It uses registers to hold the data being operated on.
• Programmers need to know not only the assembly language of the microprocessor, but also its internal configuration.

[Figure: levels of abstraction in a computer system]
  Level 5: High-Level Language
  Level 4: Assembly Language
  Level 3: Operating System
  Level 2: Instruction Set Architecture
  Level 1: Microarchitecture
  Level 0: Digital Logic

A Basic Processor

• The basic components:
  ◦ A processor with its associated temporary memory (registers, and cache if available) for code execution.
  ◦ Main memory and secondary memory, where code and data are stored temporarily and permanently, respectively.
  ◦ Input and output modules that provide the interface between the processor and the user.
• These are connected through an interface bus that consists of:
  ◦ Address, data, and control signals, e.g. the AMBA bus for ARM-based processors.

[Figure: block diagram of a processor core with its registers, connected to cache/SRAM memory, main memory, storage memory and the I/O interface through the address bus, data bus, and bus control signals.]

The gap widens between DRAM, disk, and CPU speeds.

[Figure: disk seek time, DRAM access time, SRAM access time and CPU cycle time in ns (logarithmic axis, 1 to 100,000,000) plotted against year (1980 to 2000).]

                          register   cache   memory   disk
  Access time (cycles)    1          1-10    50-100   20,000,000

Memory Hierarchy

• A typical processor is supported by:
  ◦ on-board main memory (e.g. SDRAM, up to GB)
  ◦ on-chip or on-die cache memory (e.g. SRAM, KB to MB)
  ◦ on-die registers
• Some processors (e.g. embedded processors) also provide general-purpose on-chip SRAM,
  ◦ which may be configured as an SRAM/cache combination (e.g. TI's DSPs).
• Typically, a processor also uses secondary non-volatile memory
  ◦ for permanent code and data storage, such as Flash-based memory and hard disks.

[Figure: the memory hierarchy. Towards the top, storage devices are smaller, faster, and more expensive (per byte); towards the bottom they are larger, slower, and cheaper (per byte).]
  L0: registers
  L1: on-chip L1 cache (SRAM)
  L2: off-chip L2 cache (SRAM)
  L3: main memory (DRAM)
  L4: local secondary storage (virtual memory) (local disks)
  L5: remote secondary storage (tapes, distributed file systems, Web servers)

• Multiple machine cycles are required when reading from memory, because memory responds much more slowly than the CPU (e.g. 33 MHz). The wasted clock cycles are called wait states.

Pentium III cache hierarchy

  Level            Size             Associativity   Write policy                 Line size   Latency
  Registers
  L1 Data          16 KB            4-way           Write-through                32 B        1 cycle
  L1 Instruction   16 KB            4-way                                        32 B
  L2 Unified       128 KB to 2 MB   4-way           Write-back, write allocate   32 B
  Main memory      Up to 4 GB
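The idea of wait states can be made concrete with a small calculation. The sketch below is a simplified model, not taken from the slides: it assumes that a memory access occupies a whole number of CPU clock cycles and that every cycle beyond the first is a wait state. The function name `wait_states` and the 70 ns memory access time are hypothetical values chosen for illustration.

```python
import math

def wait_states(cpu_mhz, mem_access_ns):
    """Extra clock cycles the CPU idles while one memory access completes.

    Simplified model (an assumption): the access takes a whole number of
    clock cycles, and only the first of them counts as useful work.
    """
    cycle_ns = 1000.0 / cpu_mhz                      # one clock period in ns
    cycles_needed = math.ceil(mem_access_ns / cycle_ns)
    return max(cycles_needed - 1, 0)

# A 33 MHz CPU (about 30.3 ns per cycle) reading a hypothetical 70 ns memory.
print(wait_states(33, 70))
```

With these numbers the access spans three clock cycles, so two of them are wait states. A memory that answers within one clock period needs none.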

Address Space

• The address space of a processor depends on its address decoding mechanism.
  ◦ Its size depends on the number of address bits used.
• Depending on the processor design, there may be two types of address space:
  ◦ One is used for normal memory accesses.
  ◦ Another is reserved for I/O peripheral registers (control, status, and data).
  ◦ Accessing the alternate address space needs an extra control signal or some other special means.

Address Space

• The address space is the range of addresses that the processor can access, determined by the number of address bits used in the processor architecture.
• Some processor families (e.g. ARM) use a single address space for both memory and I/O devices,
  ◦ i.e. everything is mapped into the same address space.

[Figure: a processor with one address space from 0x00000000 to 0xFFFFFFFF that holds code, data and I/O; both the memory and the I/O registers are reached through it.]
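The relation between the number of address bits and the size of the address space can be checked with a few lines of code. This is a minimal sketch that assumes a byte-addressable space starting at address 0; the function name `address_space` is hypothetical.

```python
def address_space(n_bits):
    """Return (size in bytes, lowest address, highest address).

    Assumes a byte-addressable space that starts at address 0.
    """
    size = 1 << n_bits          # 2 ** n_bits distinct addresses
    return size, 0, size - 1

size, low, high = address_space(32)
print(f"{size} bytes, 0x{low:08X} .. 0x{high:08X}")
```

For 32 address bits this gives 4 GiB, ending at 0xFFFFFFFF as in the figure. With 16 address bits the highest address is 0xFFFF.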

Memory mapped vs I/O mapped

• Some processor families have two address spaces.
• E.g., for the x86 processor, memory and I/O devices can be mapped in two different address spaces:
  ◦ Memory address space and I/O address space

[Figure: a processor with a memory address space from 0x00000000 to 0xFFFFFFFF, holding code and data, and a separate I/O address space from 0x0000 to 0xFFFF, holding the I/O registers.]

Memory System Architectures

• Two types of information are found in typical program code:
  ◦ Instruction codes for execution
  ◦ Data that is used by the instruction codes
• Two classes of memory system design store this information:
  ◦ Von Neumann architecture
  ◦ Harvard architecture

[Figure: Von Neumann: code, data and data tables share one memory (0000h to FFFFh), reached over a single path (bus) for both code and data. Harvard: code and data are kept apart and reached over separate buses for code and data; the address labels shown are 0000h, 7FFFh, 8000h and FFFFh.]

Processor Size

• The processor size is described in terms of bits (e.g. an 8-bit or a 32-bit processor).
  ◦ It corresponds to the size of the data that the processor can manipulate at a time.
  ◦ It is typically reflected in the width of the processor's internal data path and register bank.
• Hence an 8-bit processor can only manipulate byte-size data at a time, while a 32-bit processor can handle 32-bit data at a time,
  ◦ even though the data content may only be a single byte in size.

Registers

• Registers are the most fundamental storage area in the processor.
  ◦ They are located close to the processor core and provide very fast access, operating at the processor clock.
  ◦ They are limited in number (typically fewer than 100).
• Most are general purpose and can store any type of information:
  ◦ data, e.g. a timer value or constants
  ◦ addresses, e.g. of an ASCII table or the stack
• Some are reserved for a specific purpose:
  ◦ program counter (PC; called IP on x86).
  ◦ program status register (SR).

[Figure: instruction flow through the processor. The program counter (PC) selects instructions I-1 to I-4 from program memory; the fetched instruction enters the instruction register (instruction queue) and is decoded; operands op1 and op2 are read from the registers; the ALU executes the operation and sets the flags; the result is written back to the registers or to the output.]

Data Organization in Memory

• A typical memory consists of storage locations that each hold data of a fixed size, most commonly 8 bits (one byte). Each location has a unique address.
• Depending on the data path size of the processor, the memory content is accessible as an 8-bit byte, a 16-bit half word, a 32-bit word, or even a 64-bit double word.
• A 32-bit datum consists of four bytes, which are stored in four successive memory locations. Data and code must be aligned to the boundary that matches their size.
  ◦ E.g. on a 32-bit system, words are aligned to the word boundary, with the lowest two address bits equal to zero.
• But what is the order of the four bytes? That depends on the endianness adopted.
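The alignment rule (lowest two address bits equal to zero for a 32-bit word) translates directly into a bit mask. The following is a minimal sketch; the helper names `is_aligned` and `align_down` and the example addresses are hypothetical, and the size is assumed to be a power of two.

```python
def is_aligned(address, size_bytes):
    """True if address is a multiple of size_bytes (a power of two)."""
    return address & (size_bytes - 1) == 0

def align_down(address, size_bytes):
    """Round address down to the previous size_bytes boundary."""
    return address & ~(size_bytes - 1)

print(is_aligned(0x20000004, 4))        # word-aligned: low two bits are 00
print(is_aligned(0x20000006, 4))        # not word-aligned: low two bits are 10
print(hex(align_down(0x20000007, 4)))   # nearest word boundary below
```

The same mask works for half words (size 2, lowest bit zero) and double words (size 8, lowest three bits zero).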

Data Endianness

• In the little-endian format, the least significant byte (LSB) is stored at the lowest address, and the most significant byte (MSB) at the highest address.
• In the big-endian format, the least significant byte (LSB) is stored at the highest address, and the most significant byte (MSB) at the lowest address.

[Figure: the same multi-byte value laid out in two memory address spaces that start at 0x000000, once in big-endian and once in little-endian order, with the MSB and LSB marked.]
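The two byte orders can be inspected with Python's standard `struct` module, where the format prefix `<` selects little-endian and `>` selects big-endian. This is an illustrative sketch only; the value 0x12345678 is an arbitrary example, and the list index stands in for the offset from the lowest address.

```python
import struct

value = 0x12345678                     # arbitrary 32-bit example value

little = struct.pack("<I", value)      # little-endian: LSB at lowest address
big = struct.pack(">I", value)         # big-endian: MSB at lowest address

def layout(raw):
    """List the bytes in order of increasing address."""
    return [f"{b:02X}" for b in raw]

print("little-endian:", layout(little))
print("big-endian:   ", layout(big))
```

In the little-endian layout the byte 0x78 sits at the lowest address; in the big-endian layout the byte 0x12 does.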

Top Boot and Bottom Boot

• Different processor families use different locations for the reset vector used at boot-up.
• Examples:
  ◦ x86 boots from the top of the memory space.
  ◦ ARM boots from the bottom of its memory space.

[Figure: two memory maps from 00..00h to FF..FFh holding program and data. For x86 the reset vector is at the top of the map; for ARM it is at the bottom.]

CISC – Complex Instruction Set Computer

Philosophy: hardware is always faster than software.
Objective: the instruction set should be as powerful as possible.

With a powerful instruction set, fewer instructions (and less memory) are needed than on RISC to complete the same task. CISC was developed at a time (the early 1960s) when memory technology was not so advanced: memory was small (measured in kilobytes) and expensive. For embedded systems, especially Internet appliances, memory efficiency comes into play again, particularly in terms of chip area and power.

• Many instructions
• Complex instructions
  ◦ Each instruction can execute several low-level operations
• Complex addressing modes
  ◦ A smaller number of registers is needed
• A semantically rich instruction set is accommodated by allowing instructions of variable length

RISC – Reduced Instruction Set Computer

By reducing the number of instructions that a processor supports, and thereby the complexity of the chip, it is possible to make individual instructions execute faster and achieve a net gain in performance, even though more instructions might be required to accomplish a task. RISC trades off instruction set complexity against instruction execution time.

• Large register set: having more registers allows memory accesses to be minimized.
• Load/store architecture: operating on data directly in memory is one of the most expensive operations in terms of clock cycles.
• Fixed-length instruction encoding: this simplifies the instruction fetch and decode logic and makes pipelining easy to implement.
  ◦ All instructions use a register-to-register format, except load/store, which access memory.
  ◦ All instructions execute in a single cycle, except branch instructions, which require two.
  ◦ Almost all instructions have a single size and the same format.

Limitations of CISC

• A highly encoded instruction set needs to be decoded by microcode or hardwired electronic circuitry.
  ◦ More complex hardware design
  ◦ Slower instruction decoding/execution
• Variable-length instructions and differing execution times among instructions affect pipelined operation.

Advantages of CISC

• As each instruction can execute several low-level operations, the code size is reduced, which saves on memory requirements. Fewer main memory accesses are required, and hence execution is faster.
• Backward code compatibility is maintained.
  ◦ New (and more powerful) instructions can be added while retaining the "old" instruction set for code compatibility (i.e. legacy programs can still run).
• Easy to program.
  ◦ Direct support for high-level language constructs.
  ◦ Complex instructions that fit well with high-level language expressions.

Limitations of RISC

• Fewer instructions than CISC:
  ◦ Compared to CISC, RISC needs more instructions to execute one task.
  ◦ Code density is lower.
  ◦ More memory is needed.
• No complex instructions:
  ◦ No hardware support for division or floating-point arithmetic.
  ◦ A more complex compiler and a longer compile time are needed.

But ARM also adds DSP-like instructions to support commonly used signal processing functions.

Advantages of RISC

• Simpler instructions:
  ◦ One clock per instruction gives faster execution than on a CISC processor with the same clock speed.
• Simpler addressing modes:
  ◦ Faster decoding
• Fixed-length instructions:
  ◦ Faster decoding and better pipeline performance
• Simpler hardware:
  ◦ Less silicon area
  ◦ Less power consumption

  CISC                                    RISC
  Any instruction may reference memory    Only load/store references memory
  Many instructions & addressing modes    Few instructions & addressing modes
  Variable instruction formats            Fixed instruction formats
  Single register set                     Multiple register sets
  Multi-clock cycle instructions          Single-clock cycle instructions
  Micro-program interprets instructions   Hardware (FSM) executes instructions
  Complexity is in the micro-program      Complexity is in the compiler
  Little or no pipelining                 Highly pipelined
  Program code size small                 Program code size large

RISC vs CISC

• RISC machines: Sun SPARC, SGI MIPS, HP PA-RISC, ARM
• CISC machines: Intel 80x86, Motorola 680x0
• What really distinguishes RISC from CISC these days lies in the architecture and not in the instruction set.
• CISC occurs whenever there is a disparity in speed between CPU operations and memory accesses due to technology or cost.
• What about combining both ideas?
  ◦ The Intel 80x86 Pentium P6 architecture is externally CISC but internally RISC & CISC!
  ◦ Intel IA-64 executes many instructions in parallel.

                             CISC (Intel 486)   RISC (MIPS R4000)
  Number of instructions     235                94
  Addressing modes           11                 1
  Instruction size (bytes)   1-12               4
  GP registers               8                  32

Instruction Code Format

• Opcode encoding depends on the number of bits used.
  ◦ Example: for ARM, all instructions are 32 bits long, but only 8 bits (bits 20 to 27) are used to encode the instruction. Hence a total of 2^8 = 256 different instructions are possible.
• A typical instruction is encoded with a specific bit pattern that consists of the following:
  ◦ An opcode field specifying the operation to be performed.
  ◦ Operand identification (address) field(s), which depend on the addressing mode;
    ▪ these provide the address of the register or memory location(s) that store the operand(s), or the operand itself.
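Extracting a field from an instruction word is a shift followed by a mask. The sketch below assumes that the 8-bit field occupies bits 27 down to 20 of a 32-bit word; the helper `bit_field` and the example word 0xE0810002 are hypothetical choices for illustration, not values from the slides.

```python
def bit_field(word, high, low):
    """Return bits high..low (inclusive) of word as an unsigned integer."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)

word = 0xE0810002                     # example 32-bit instruction word

print(hex(bit_field(word, 27, 20)))   # the assumed 8-bit encoding field
print(hex(bit_field(word, 31, 28)))   # the top four bits of the word
print(2 ** 8)                         # number of patterns an 8-bit field allows
```

An n-bit field can distinguish 2^n patterns, which is where the figure of 256 for 8 bits comes from.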

Operand Addressing Types

• Immediate addressing.
  ◦ The operand is given in the instruction.
• Register addressing.
  ◦ The operand is stored in a register.
• Direct addressing.
  ◦ The operand is stored in memory, with the address given in the instruction.
• Indirect (indexed) addressing.
  ◦ The operand is stored in memory, with the address given in a register (to which an offset given in the instruction is added).
• Implied addressing.
  ◦ An implicit location such as the stack or the program counter.

Instruction Opcode Types

• General categories of instruction operations:
  ◦ Data transfer, e.g. move, load, and store
  ◦ Data manipulation, e.g. add, subtract, logical operations
  ◦ Program control, e.g. branch, subroutine call

Instruction Execution

• Multiple stages are involved in executing an instruction. Example:
  1) Fetching the instruction code: the instruction is read from memory.
  2) Decoding the instruction code: determining which instruction is to be executed.
  3) Executing the instruction code: performing the operations necessary to complete what the instruction is supposed to do. This may read data from memory, write data to memory or an I/O device, perform operations only within the CPU, or a combination of those.
• Hence multiple processor clock cycles are needed to execute a single instruction.

[Figure: timeline without pipelining. The 1st instruction is fetched, decoded and executed; only then is the 2nd instruction fetched, decoded and executed.]

Instruction Pipeline

• A pipeline allows several different instructions to be in execution at the same time.
• During normal operation:
  ◦ While one instruction is being executed,
  ◦ the next instruction is being decoded,
  ◦ and a third instruction is being fetched from memory.
  ◦ This allows the effective throughput to increase to one instruction per clock cycle.

Pipeline Architecture

A longer pipeline can be used to break down further the operation carried out in each individual stage. The simpler logic in each stage allows the system clock to be increased.

Example: a 5-stage instruction pipeline

  Fetch Instruction → Decode Instruction → Fetch Operand → Execute Instruction → Store Result

[Figure: parallel execution of multiple instructions over time. The 1st to 5th instructions each pass through the five stages, each instruction starting one stage later than the previous one. The instructions are assumed to be completely independent.]

$$\text{Maximum speedup} \le \text{number of stages}$$

$$\text{Speedup} \approx \frac{\text{time for unpipelined operation}}{\text{time for longest stage}}$$
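The bound on pipeline speedup can be illustrated with an idealized cycle count. The sketch below assumes that every stage takes exactly one clock cycle, that the instructions are completely independent, and that there are no stalls; the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipeline; after that, one completes per cycle."""
    return n_stages + n_instructions - 1

def speedup(n_instructions, n_stages):
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(n_instructions, n_stages)

for n in (5, 100, 1_000_000):
    print(n, pipelined_cycles(n, 5), round(speedup(n, 5), 3))
```

For five instructions in a 5-stage pipeline the run takes 9 cycles instead of 25. As the number of instructions grows, the speedup approaches the number of stages but never exceeds it.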

[Figure: ARM Cortex-A15]


Data representation. Fixed-point vs Floating-point

• Numerical values are represented as binary fractions: -1.0 ≤ value < 1.0
• Why a fractional representation?
  ◦ Multiplying a fraction by a fraction always results in a fraction and will not produce an overflow (e.g. 0.99 x 0.9999 is less than 1); the single exception is (-1.0) x (-1.0) = +1.0, which is just outside the range. Successive additions may still cause overflow.
  ◦ A normalized representation is convenient. Signal processing is multiplication-intensive.
  ◦ Coefficients from digital filter designs are typically already in fractional form.

Bit weights of an n-bit fractional number (the radix point follows the sign bit):

  Sign bit   .   bit 1   bit 2   bit 3   ...   bit n-1
  -2^0           2^-1    2^-2    2^-3    ...   2^-(n-1)

• Fixed-point notation:
  ◦ The radix point is always in a fixed location (e.g. 0.74, 0.34, etc.).
  ◦ Fractional fixed-point notation prevents overflow in multiplication (useful with a small dynamic range).
  ◦ Fixed-point notation is less expensive.
• How is fixed-point notation realized in a DSP?
  ◦ Most fixed-point DSPs are 16 bits.
  ◦ The range of numbers that can be represented is -2^15 to 2^15 - 1.
  ◦ The most common fixed-point format is Q15.

Q15 notation:

  Bit 15   Bits 14 to 0
  sign     two's complement number

Dynamic range in Q15

  Number                   Biggest   Smallest
  Fractional number        0.999     -1.000
  Scaled integer for Q15   32767     -32768

Number representations in Q15 (Q15 = Decimal x 2^15; the slide uses 32767 = 2^15 - 1 as the scale factor and rounds the result)

  Decimal   Decimal x 32767   Q15 integer
  0.5       0.5 x 32767       16384
  0.05      0.05 x 32767      1638
  0.0012    0.0012 x 32767    39

Rules for operations

• Avoid operations with numbers larger than 1:
    2.0 x (0.5 x 0.45) = (0.2 x 0.5 x 0.45) x 10
                       = (0.5 x 0.45) + (0.5 x 0.45)
• Scale numbers before the operation:
    0.5 in Q15 = 0.5 x 32767 = 16384 (rounded)

Addition

  Decimal             Q15                    Scale back (Q15 / 32767)
  0.5 + 0.05 = 0.55   16384 + 1638 = 18022   0.55
  0.5 - 0.05 = 0.45   16384 - 1638 = 14746   0.45

Multiplication: 2 x 0.5 x 0.45, computed as (0.5 x 0.45) + (0.5 x 0.45)

  Decimal                Q15                         Back to Q15 (product / 32767)   Scale back (Q15 / 32767)
  0.5 x 0.45 = 0.225     16384 x 14745 = 241582080   7373
  0.225 + 0.225 = 0.45   7373 + 7373 = 14746                                         0.45
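The Q15 arithmetic in the tables above can be reproduced in a few lines. The sketch below is one possible implementation, not the slide author's: it scales by 2^15 = 32768 (the factor named in the table header) rather than by 32767, rescales products with an arithmetic right shift by 15 bits, and saturates results to the 16-bit range. All function names are hypothetical.

```python
Q15_ONE = 1 << 15                       # 32768, the Q15 scale factor (2^15)
Q15_MAX, Q15_MIN = 32767, -32768

def saturate(n):
    """Clamp an integer to the signed 16-bit range used by Q15."""
    return max(Q15_MIN, min(Q15_MAX, n))

def to_q15(x):
    """Convert a fraction in [-1.0, 1.0) to its Q15 integer."""
    return saturate(int(round(x * Q15_ONE)))

def from_q15(n):
    """Convert a Q15 integer back to a fraction."""
    return n / Q15_ONE

def q15_add(a, b):
    return saturate(a + b)

def q15_mul(a, b):
    # The raw product has 30 fractional bits; shift back to 15.
    return saturate((a * b) >> 15)

half, x = to_q15(0.5), to_q15(0.45)
p = q15_mul(half, x)                    # 0.5 x 0.45
twice = q15_add(p, p)                   # 2 x 0.5 x 0.45 without using 2.0
print(half, x, p, twice, from_q15(twice))
```

Doubling is done by adding the product to itself, following the rule of avoiding operands larger than 1. The saturation step also covers the corner case (-1.0) x (-1.0), whose exact result does not fit in Q15.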

• Floating-point notation: single-precision floating-point format

  Bit no.   31 ... 24      23      22 ... 0
  Field     e (exponent)   s       f (fraction)
  Width     8 bits         1 bit   23 bits

  e = exponent, a signed two's complement 8-bit field that determines the location of the binary point
  s = sign of the mantissa (s = 0 positive, s = 1 negative)
  f = fractional part of the mantissa; an implied 1.0 is added to this fraction but is not allocated in the bit field, since this value is always present

Note that this layout (two's complement exponent in the top byte, sign in bit 23) differs from IEEE 754 single precision.

Exponent (e)

  Hex two's complement   00   01   7F    FF   80
  Decimal                0    1    127   -1   -128

Conversion equations

$$X = 01.f \times 2^{e} \quad (s = 0) \tag{1}$$

$$X = 10.f \times 2^{e} = (-2 + 0.f) \times 2^{e} \quad (s = 1) \tag{2}$$

Special case

  s = 0, e = -128:  X = 0

• Floating-point numbers: worked examples

Calculate 1.0e0
  In hex:    00 00 00 00
  In binary: 00000000 0 00000000000000000000000
  e = 0, s = 0, f = 0
  Equation 1 applies: X = 01.f x 2^e = 01.0 x 2^0 = 1.0

Calculate 1.5e01
  In hex:    03 70 00 00
  In binary: 00000011 0 11100000000000000000000
  e = 3, s = 0, f = 0.5 + 0.25 + 0.125 = 0.875
  Equation 1 applies: X = 01.f x 2^e = 01.875 x 2^3 = 15.0 decimal

Calculate -2.0e0
  In hex:    00 80 00 00
  In binary: 00000000 1 00000000000000000000000
  e = 0, s = 1, f = 0
  Equation 2 applies: X = (-2.0 + 0.f) x 2^e = (-2.0 + 0.0) x 2^0 = -2.0

Addition: 1.5 + (-2.0) = -0.5
Multiplication: 1.5e00 x 1.5e01 = 2.25e01 = 22.5
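The worked examples above can be checked with a small decoder for this bit layout (8-bit two's complement exponent in bits 31 to 24, sign in bit 23, fraction in bits 22 to 0). This is a sketch written from the conversion equations; the function name `decode` is hypothetical, and treating every word with e = -128 as zero is a simplifying assumption.

```python
def decode(word):
    """Decode a 32-bit word in the e/s/f floating-point layout described above."""
    e = (word >> 24) & 0xFF
    if e >= 0x80:                         # interpret the exponent as two's complement
        e -= 0x100
    s = (word >> 23) & 1                  # sign of the mantissa
    f = (word & 0x7FFFFF) / (1 << 23)     # fraction 0.f as a value in [0, 1)

    if e == -128:                         # special case: zero (assumed for any s, f)
        return 0.0
    mantissa = (-2.0 + f) if s else (1.0 + f)   # Equation 2 or Equation 1
    return mantissa * 2.0 ** e

for word in (0x00000000, 0x03700000, 0x00800000):
    print(f"{word:08X} -> {decode(word)}")
```

The three words decode to 1.0, 15.0 and -2.0, matching the hand calculations.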

• Dynamic range
  ◦ Ranges of number systems:

  Number                    Base 2                Decimal             Two's complement (hex)
  Largest integer           2^31 - 1              2 147 483 647       7F FF FF FF
  Smallest integer          -2^31                 -2 147 483 648      80 00 00 00
  Largest Q15               2^15 - 1              32 767              7F FF
  Smallest Q15              -2^15                 -32 768             80 00
  Largest floating point    (2 - 2^-23) x 2^127   3.402823 x 10^38    7F 7F FF FD
  Smallest floating point   -2 x 2^127            -3.402823 x 10^38   83 39 44 6E

  ◦ The dynamic range of the floating-point representation is very large.
  ◦ Conclusion:
    ▪ Largest integer x (1.5 x 10^29) ≈ largest floating point
    ▪ Largest Q15 x (1.03 x 10^34) ≈ largest floating point

• DSP devices are designed as floating point or fixed point.
• Floating-point devices usually have a full set of fixed-point instructions.
• Floating-point devices are easier to program.
• Fixed-point devices can emulate floating point in software.

  Characteristic        Floating point   Fixed point
  Dynamic range         much larger      smaller
  Resolution            comparable       comparable
  Speed                 comparable       comparable
  Ease of programming   much easier      more difficult
  Compiler efficiency   more efficient   less efficient
  Power consumption     comparable       comparable
  Chip cost             comparable       comparable
  System cost           comparable       comparable
  Design cost           less             more
  Time to market        faster           slower

• Applications that require
  ◦ high precision,
  ◦ wide dynamic range,
  ◦ high signal-to-noise ratio, or
  ◦ ease of use
  need a floating-point processor.
• Drawbacks of floating-point processors:
  ◦ Higher power consumption.
  ◦ Can be more expensive.
  ◦ Can be slower than fixed-point counterparts and larger in size.


Interrupts, Exceptions, Watch-Dog, …

• Exceptions:
  ◦ Exception handling is a combination of hardware behaviors and software constructs designed to manage a unique condition. An exception may be:
    ▪ Related to the current program flow.
    ▪ The result of unexpected error conditions (such as a bus error).
    ▪ The result of illegal operations (guarded memory access).
    ▪ Programmed to occur (FIT, the fixed-interval timer; PIT, the programmable-interval timer).
    ▪ Raised because a software routine could not execute properly (divide by 0).
  ◦ Exception handling changes the normal flow of software execution.

• Interrupts:
  ◦ A hardware interrupt is an asynchronous signal from hardware, originating either outside the SoC or from the programmable logic within the SoC, that indicates a peripheral's need for attention.
    ▪ Embedded processor peripheral (FIT, PIT, for example).
    ▪ External bus peripheral (UART, EMAC, for example).
    ▪ External interrupts enter via hardware pin(s).
    ▪ Multiple hardware interrupts can use the Generic Interrupt Controller (GIC) of the PS.
  ◦ A software interrupt is a synchronous event in software, often referred to as an exception, that indicates the need for a change in execution.
    ▪ Examples:
      - Divide by zero.
      - Illegal instruction.
      - User-generated software interrupt.

Cortex-A9 Modes and Registers

• The Cortex-A9 has seven execution modes.
  ◦ Five are exception modes.
  ◦ Each mode has its own stack space and a different subset of registers.
  ◦ System mode uses the User mode registers.
• The Cortex-A9 has 37 registers.
  ◦ Up to 18 are visible at any one time.
  ◦ Execution modes have some private registers that are banked in when the mode is changed.
  ◦ Non-banked registers are shared between modes.

Cortex-A9 Exceptions

• In the Cortex-A9 processor, interrupts are handled as exceptions.
  ◦ Each Cortex-A9 processor core accepts two different levels of interrupts:
    ▪ nFIQ interrupts from secure sources (serviced first).
    ▪ nIRQ interrupts from either secure or non-secure sources.

Interrupt Servicing in Cortex-A9

• When an interrupt is received, the currently executing instruction completes.
• The processor status is saved:
  ◦ CPSR is copied into SPSR_irq.
  ◦ The return address is stored in LR_irq.
• The processor status is changed for the exception:
  ◦ Mode field bits.
  ◦ ARM or Thumb (T2) state.
  ◦ Interrupt disable bits (if appropriate).
  ◦ PC is set to the vector address (either FIQ or IRQ).
• The above steps are performed automatically by the core.
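The entry sequence listed above can be modelled as a toy state update. The sketch below is a software illustration only, since the real steps happen in hardware. It assumes the classic ARM encodings (CPSR mode field in bits 4:0, with IRQ mode = 0b10010 and FIQ mode = 0b10001; T, F and I bits at positions 5, 6 and 7), vectors at the low addresses 0x18 (IRQ) and 0x1C (FIQ), and a handler that starts in ARM state. The dictionary-based CPU model and the function `take_interrupt` are hypothetical.

```python
MODE_MASK = 0x1F
MODE = {"IRQ": 0b10010, "FIQ": 0b10001}    # CPSR mode field values
T_BIT, F_BIT, I_BIT = 1 << 5, 1 << 6, 1 << 7
VECTOR = {"IRQ": 0x18, "FIQ": 0x1C}        # assumes the vector table is at address 0

def take_interrupt(cpu, kind, return_address):
    """Model the automatic steps the core performs on IRQ or FIQ entry."""
    bank = kind.lower()                     # "irq" or "fiq" banked registers
    # 1. Save the processor status.
    cpu["SPSR_" + bank] = cpu["CPSR"]
    cpu["LR_" + bank] = return_address
    # 2. Change the processor status for the exception.
    cpsr = (cpu["CPSR"] & ~MODE_MASK) | MODE[kind]   # new mode field
    cpsr &= ~T_BIT                          # assumption: handler runs in ARM state
    cpsr |= I_BIT                           # disable further IRQs
    if kind == "FIQ":
        cpsr |= F_BIT                       # FIQ entry also disables FIQs
    cpu["CPSR"] = cpsr
    # 3. Jump to the vector.
    cpu["PC"] = VECTOR[kind]
    return cpu

cpu = {"CPSR": 0x10, "PC": 0x8000}          # User mode, interrupts enabled
take_interrupt(cpu, "IRQ", 0x8004)          # 0x8004 is an arbitrary return address
print({k: hex(v) for k, v in cpu.items()})
```

Returning from the handler reverses the first step: the saved SPSR is copied back into CPSR and execution resumes at the saved return address.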

Generic Interrupt Controller (GIC)

• Supports interrupt prioritization.
• Handles up to 16 software-generated interrupts (SGI).
• Supports 64 shared peripheral interrupts (SPI), starting at ID 32.
• Processes both level-sensitive and edge-sensitive interrupts.
• Five private peripheral interrupts (PPI) are dedicated to each processor core:
  ◦ the global timer, private watchdog timer, private timer, and FIQ/IRQ from the PL.


Page 55: 1 Tema III – Microcontrollers and Microprocessors

A microcontroller combines on the same microchip:
¨  The CPU core
¨  Memory (both ROM and RAM)
¨  I/O – parallel, serial, analog, digital

32-bit microcontroller ARM Cortex-M3. Architecture. Programmers Model.

Page 56: 1 Tema III – Microcontrollers and Microprocessors

ARM Ltd
¨  Founded in November 1990
  ¤  Spun out of Acorn Computers
  ¤  Initial funding from Apple, Acorn and VLSI
¨  Designs the ARM range of RISC processor cores
  ¤  Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers
  ¤  ARM does not fabricate silicon itself
¨  Also develops technologies to assist with the design-in of the ARM architecture
  ¤  Software tools, boards, debug hardware
  ¤  Application software
  ¤  Bus architectures
  ¤  Peripherals, etc.

Equipment Adopting 32-bit ARM Microcontrollers: energy-efficient appliances, IR fire detectors, intelligent vending, tele-parking, utility meters, exercise machines, intelligent toys.

Page 57: 1 Tema III – Microcontrollers and Microprocessors

Cortex Family
¨  ARM Cortex-A family (v7-A):
  ¤  Applications processors for full OS and 3rd party applications
¨  ARM Cortex-R family (v7-R):
  ¤  Embedded processors for real-time signal processing, control applications
¨  ARM Cortex-M family (v7-M):
  ¤  Microcontroller-oriented processors for MCU and SoC applications

[Figure: Cortex family range, from 12k gates (Cortex-M0) up to 2.5 GHz, with x1-4 core configurations at the high end. Processors shown: Cortex-M0, Cortex-M1, Cortex-M3, Cortex-M4, SC300, Cortex-R4, Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15.]

Page 58: 1 Tema III – Microcontrollers and Microprocessors

ARM Cortex Family
Cortex-A8
  §  Architecture v7A
  §  MMU
  §  AXI
  §  VFP & NEON support
Cortex-R4
  §  Architecture v7R
  §  MPU (optional)
  §  AXI
  §  Dual Issue
Cortex-M3
  §  Architecture v7M
  §  MPU (optional)
  §  AHB Lite & APB

Page 59: 1 Tema III – Microcontrollers and Microprocessors

Relative Performance

Processor              Max Freq (MHz)   Min Power (mW/MHz)
Cortex-M0                    50             0.012
Cortex-M3                   150             0.06
ARM7                        184             0.35
ARM926                      470             0.235
ARM1026                     540             0.36
ARM1136                     610             0.335
ARM1176                     750             0.568
Cortex-A8                  1100             0.43
Cortex-A9 Dual-core        2000             0.5

[Figure: bar chart of maximum frequency (MHz) per processor, axis from 0 to 2500.]

Page 60: 1 Tema III – Microcontrollers and Microprocessors

ARM architecture
¨  Load/store architecture
¨  A large array of uniform registers
¨  Fixed-length 32-bit instructions
¨  3-address instructions

Page 61: 1 Tema III – Microcontrollers and Microprocessors

Data Sizes and Instruction Sets
¨  The ARM is a 32-bit architecture.
¨  When used in relation to the ARM:
  ¤  Byte means 8 bits
  ¤  Halfword means 16 bits (two bytes)
  ¤  Word means 32 bits (four bytes)
¨  Most ARM cores implement two instruction sets
  ¤  32-bit ARM Instruction Set
  ¤  16-bit Thumb Instruction Set
¨  Jazelle cores can also execute Java bytecode

[Figure: ARM and Thumb Performance. Dhrystone 2.1/sec @ 20 MHz (axis 0 to 30000) for ARM and Thumb code at three zero-wait-state memory widths: 32-bit, 16-bit, and 16-bit with 32-bit stack.]
[Figure: data sizes within a 32-bit word (bits 31 to 0): 8-bit byte, 16-bit halfword, 32-bit word.]

Page 62: 1 Tema III – Microcontrollers and Microprocessors

The Thumb-2 instruction set
¨  Variable-length instructions
  ¤  ARM instructions are a fixed length of 32 bits.
  ¤  Thumb instructions are a fixed length of 16 bits.
  ¤  Thumb-2 instructions can be either 16-bit or 32-bit.
¨  Thumb-2 gives approximately 26% improvement in code density over ARM.
¨  Thumb-2 gives approximately 25% improvement in performance over Thumb.

Page 63: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M Programmer’s Model
¨  Fully programmable in C
¨  Stack-based exception model
¨  Only two processor modes
  ¤  Thread Mode for User tasks
  ¤  Handler Mode for OS tasks and exceptions
¨  Vector table contains addresses

[Figure: register set. r0 to r12, sp (banked as Main and Process stack pointers), lr, r15 (pc) and xPSR; registers are 32 bits wide. Further labels: Endianness, Address Space.]

Page 64: 1 Tema III – Microcontrollers and Microprocessors

Address Space

Page 65: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Processor Privilege

[Figure: Cortex-M3 privilege model. Application code: User, Non-Privileged, Thread Mode. OS: Supervisor, Privileged, Handler Mode. Events shown: System Call (SVCall), Undefined Instruction, Aborts, Interrupts, Reset. Memory holds Instructions & Data.]

Page 66: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Interrupt Handling
¨  One Non-Maskable Interrupt (INTNMI) supported
¨  1-240 prioritizable interrupts supported
  ¤  Interrupts can be masked
  ¤  Implementation option selects number of interrupts supported
¨  Nested Vectored Interrupt Controller (NVIC) is tightly coupled with processor core
¨  Interrupt inputs are active HIGH

[Figure: INTNMI and the 1-240 interrupt lines INTISR[239:0] enter the NVIC, which sits inside the Cortex-M3 alongside the Cortex-M3 processor core.]

Page 67: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Exception Handling
  ¤  Reset: power-on or system reset
  ¤  NMI: cannot be stopped or preempted by any exception other than reset
  ¤  Faults
    ▪  Hard Fault: default fault, or any fault unable to activate
    ▪  Memory Manage: MPU violations
    ▪  Bus Fault: prefetch and memory access violations
    ▪  Usage Fault: undefined instructions, divide by zero, etc.
  ¤  SVCall: privileged OS requests
  ¤  Debug Monitor: debug monitor program
  ¤  PendSV: pendable request for system service (e.g., a deferred context switch)
  ¤  SysTick Interrupt: internal system timer, e.g., used by an RTOS to periodically check resources or peripherals
  ¤  External Interrupt: e.g., external peripherals

Page 68: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Program Status Register
¨  One Status Register consisting of
  ¤  APSR - Application Program Status Register – ALU flags
  ¤  IPSR - Interrupt Program Status Register – Interrupt/Exception No.
  ¤  EPSR - Execution Program Status Register
    ▪  IT field – If/Then block information
    ▪  ICI field – Interruptible-Continuable Instruction information
¨  xPSR
  ¤  Composite of the 3 PSRs
  ¤  Stored on the stack on exception entry

[Figure: xPSR bit layout. Bits 31-27: N, Z, C, V, Q; bits 26-25: IT/ICI; bit 24: T; bits 15-10: ICI/IT; low-order bits: ISR number.]
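As a rough illustration of how the composite xPSR splits into its parts, the sketch below extracts the APSR flags, the T bit and the exception number from a 32-bit value. The bit positions follow the ARMv7-M layout (flags in bits 31 to 27, T in bit 24, exception number in bits 8 to 0); the function name `decode_xpsr` and the sample value are made up for the example.

```python
def decode_xpsr(xpsr):
    """Split a 32-bit xPSR value into a few of its APSR, EPSR and IPSR fields."""
    return {
        "N": (xpsr >> 31) & 1,               # APSR: negative
        "Z": (xpsr >> 30) & 1,               # APSR: zero
        "C": (xpsr >> 29) & 1,               # APSR: carry
        "V": (xpsr >> 28) & 1,               # APSR: overflow
        "Q": (xpsr >> 27) & 1,               # APSR: saturation
        "T": (xpsr >> 24) & 1,               # EPSR: Thumb state bit
        "exception_number": xpsr & 0x1FF,    # IPSR: active exception
    }


fields = decode_xpsr(0x6100000F)
print(fields)
```

With this sample value, Z and C are set, the core is in Thumb state, and the active exception number is 15.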

Page 69: 1 Tema III – Microcontrollers and Microprocessors

Conditional Execution
¨  If – Then (IT) instruction added (16 bit)
  ¤  Up to 3 additional “then” or “else” conditions may be specified (T or E)
  ¤  Makes up to 4 following instructions conditional
¨  Any normal ARM condition code can be used
¨  16-bit instructions in block do not affect condition code flags
  ¤  Apart from comparison instructions.
  ¤  32-bit instructions may affect flags (normal rules apply)
¨  Current “if-then status” stored in the EPSR IT field (part of xPSR)
  ¤  Conditional block may be safely interrupted and returned to
  ¤  Must NOT branch into or out of ‘if-then’ block

Example: ITTET EQ
  Inst 1: MOVEQ
  Inst 2: ADDEQ
  Inst 3: SUBNE
  Inst 4: ORREQ
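The way an IT mnemonic maps onto the condition of each following instruction can be sketched as below. The helper `it_conditions` and the `OPPOSITE` table are hypothetical names for this example; the rule itself is the one on the slide: the first instruction always takes the base condition, each T repeats it and each E takes its inverse.

```python
# Each ARM condition code paired with its logical inverse.
OPPOSITE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def it_conditions(mnemonic, cond):
    """Return the condition applied to each instruction in an IT block."""
    if not mnemonic.startswith("IT") or len(mnemonic) > 5:
        raise ValueError("expected IT followed by at most three T/E letters")
    if cond not in OPPOSITE:
        raise ValueError("unknown condition code")
    conditions = [cond]                  # the first instruction is always 'then'
    for letter in mnemonic[2:]:
        if letter == "T":
            conditions.append(cond)
        elif letter == "E":
            conditions.append(OPPOSITE[cond])
        else:
            raise ValueError("only T and E are allowed after IT")
    return conditions


block = it_conditions("ITTET", "EQ")
print(block)
```

For ITTET EQ this gives EQ, EQ, NE, EQ, which matches the MOVEQ, ADDEQ, SUBNE, ORREQ sequence in the example.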

Page 70: 1 Tema III – Microcontrollers and Microprocessors

Classes of Instructions (v4T)
¨  Load/Store: LDR, STR, ADR
¨  Data Operations: ADD, MUL, LSL, AND
¨  Change of Flow: MOV PC, Rm; Bcc; BL; BLX
¨  Miscellaneous: CMP, SWI, SWP

Page 71: 1 Tema III – Microcontrollers and Microprocessors

Data Processing Instructions
¨  Consist of:
  ¤  Arithmetic: ADD ADC SUB SBC RSB RSC
  ¤  Logical: AND ORR EOR BIC
  ¤  Comparisons: CMP CMN TST TEQ
  ¤  Data movement: MOV MVN
¨  These instructions only work on registers, NOT memory.
¨  Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
  ▪  Comparisons set flags only - they do not specify Rd
  ▪  Data movement does not specify Rn
  ▪  Second operand is sent to the ALU via barrel shifter.

Page 72: 1 Tema III – Microcontrollers and Microprocessors

Using a Barrel-shifter: The 2nd Operand
Register, optionally with shift operation
  •  Shift value can be either:
    •  5-bit unsigned integer
    •  Specified in bottom byte of another register.
  •  Used for multiplication by constant
Immediate value
  •  8-bit number, with a range of 0-255.
    •  Rotated right through even number of positions
  •  Allows increased range of 32-bit constants to be loaded directly into registers

[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the Barrel Shifter before reaching the ALU; the ALU produces the Result.]
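To make the immediate rule concrete, the sketch below checks whether a 32-bit constant can be written as an 8-bit value rotated right by an even amount, which is the classic ARM-state encoding described above. Thumb-2 on the Cortex-M3 uses a different modified-immediate scheme, so treat this as an illustration of the slide only. The function names are hypothetical.

```python
def rotate_right(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(constant):
    """Return (imm8, rotation) if the constant is encodable, else None."""
    constant &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) undoes a rotate-right by `rotation`.
        imm8 = rotate_right(constant, 32 - rotation)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None


print(encode_arm_immediate(0xFF000000))
print(encode_arm_immediate(0x00000101))
```

0xFF000000 is encodable as 0xFF rotated right by 8, whereas 0x101 is not, because its two set bits cannot fit inside any 8-bit window at an even rotation.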

Page 73: 1 Tema III – Microcontrollers and Microprocessors

Single Register Data Transfer
  LDR     STR     Word
  LDRB    STRB    Byte
  LDRH    STRH    Halfword
  LDRSB           Signed byte load
  LDRSH           Signed halfword load
¨  Memory system must support all access sizes
¨  Syntax:
  ¤  LDR{<cond>}{<size>} Rd, <address>
  ¤  STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB

Page 74: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Datapath

[Figure: Cortex-M3 datapath. Blocks: Instruction Decode, Register Bank, Mul/Div, Barrel Shifter, ALU (operand buses A and B), Writeback, Address Register and Address Incrementer (one pair each for the instruction and data sides), Read Data Register, Write Data Register. Signals: INTADDR, I_HADDR, I_HRDATA, D_HADDR, D_HWDATA, D_HRDATA.]

Page 75: 1 Tema III – Microcontrollers and Microprocessors

Cortex-M3 Pipeline
¨  Cortex-M3 has 3-stage fetch-decode-execute pipeline
  ¤  Similar to ARM7
  ¤  Cortex-M3 does more in each stage to increase overall performance

[Figure: Cortex-M3 pipeline. 1st Stage - Fetch: Fetch (Prefetch). 2nd Stage - Decode: Instruction Decode & Register Read, AGU, Branch. 3rd Stage - Execute: Address Phase & Write Back, Data Phase Load/Store & Branch, Multiply & Divide, Shift, ALU & Branch, Write. Feedback paths: branch forwarding & speculation, and execute-stage branch (ALU branch & load/store branch).]

Page 76: 1 Tema III – Microcontrollers and Microprocessors

SW Development

Source code:
  Start ; direction register
        LDR R1,=GPIO_PORTD_DIR_R
        LDR R0,[R1]
        ORR R0,R0,#0x0F ; make PD3-0 output
        STR R0,[R1]

Object code (Address, Data):
  0x00000142  4912
  0x00000144  6808
  0x00000146  F040000F
  0x0000014A  6008

[Figure: Keil™ uVision® flow. The Editor holds the source code; Build Target (F7) produces the object code; Download and Start Debug Session run it on either a simulated microcontroller or a real microcontroller, each with processor, memory and I/O.]

¨  GNU compiler and binutils
  ¤  gcc: GNU C compiler
  ¤  as: GNU assembler
  ¤  ld: GNU linker
  ¤  gdb: GNU project debugger
  ¤  COFF (common object file format)
  ¤  ELF (executable and linkable format)
  ¤  Segments in the object file
    ▪  Text: code
    ▪  Data: initialized global variables
    ▪  BSS: uninitialized global variables

[Figure: GNU toolchain flow. C source (.c), through gcc, gives asm source (.s); as gives the object file (.coff); ld gives the executable (.elf), which goes to the simulator, debugger, …]

Page 77: 1 Tema III – Microcontrollers and Microprocessors


¨  Introduction ¨  Processor Architectural Features. Datapath &

pipeline. ¨  Data Representation: Fixed-point vs Floating-point ¨  Interrupts, Exceptions, Watch-Dog, … ¨  32-bit microcontroller. ARM Cortex-M3

¤  ARM Cortex-M3 Architecture. Programmers Model.

¨  32/64bit microprocessor. ¤  Intel x86, UltraSparc Architecture. Programmers Model

Page 78: 1 Tema III – Microcontrollers and Microprocessors

Intel x86 Processor Evolution:
Name          Date    Transistors    MHz
8086          1978    29K            5-10
  ▪  First 16-bit processor. Basis for IBM PC & DOS
  ▪  1MB address space
386           1985    275K           16-33
  ▪  First 32-bit processor, referred to as IA32
  ▪  Added “flat addressing”
  ▪  Capable of running Unix
  ▪  Until recently, 32-bit Linux/gcc used no instructions introduced in later models
Pentium 4F    2005    230M           2800-3800
  ▪  First 64-bit Intel x86 processor
  ▪  Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line

32/64bit microprocessor. Intel x86, UltraSparc

Page 79: 1 Tema III – Microcontrollers and Microprocessors

Intel x86 Processor Evolution: Machine Evolution
  ▪  486            1989    1.9M
  ▪  Pentium        1993    3.1M
  ▪  Pentium/MMX    1997    4.5M
  ▪  PentiumPro     1995    6.5M
  ▪  Pentium III    1999    8.2M
  ▪  Pentium 4      2001    42M
  ▪  Core 2 Duo     2006    291M
Added Features
  ▪  Instructions to support multimedia operations
    •  Parallel operations on 1, 2, and 4-byte data, both integer & FP
  ▪  Instructions to enable more efficient conditional operations
Linux/GCC Evolution
  ▪  Very limited, needs to get better – trying to maintain compatibility

Page 80: 1 Tema III – Microcontrollers and Microprocessors

Intel x86 Processor Evolution:
Name                   Date    Transistors
Itanium                2001    10M
  ▪  First shot at 64-bit architecture: first called IA64
  ▪  Radically new instruction set designed for high performance
  ▪  Can run existing IA32 programs
    •  On-board “x86 engine”
  ▪  Joint project with Hewlett-Packard - Boat Anchor
Itanium 2              2002    221M
  ▪  Big performance boost
Itanium 2 Dual-Core    2006    1.7B
  ¤  Itanium has not taken off in marketplace
    ▪  Lack of backward compatibility, no good compiler support, Pentium 4 too good.

Page 81: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 architecture
  ¤  Lots of architecture improvements: pipelining, superscalar execution, branch prediction, hyperthreading and multi-core.
  ¤  From the programmer’s point of view, IA-32 has not changed substantially except for the introduction of a set of high-performance instructions
¨  Modes of operation
  ¤  Protected mode
    ▪  Native mode (Windows, Linux), full features, separate memory
    •  Virtual-8086 mode
      •  hybrid of Protected
      •  each program has its own 8086 computer
  ¤  Real-address mode
    ▪  Native MS-DOS
  ¤  System management mode
    ▪  Power management, system security, diagnostics
¨  Addressable Memory
  ¤  Protected mode
    ▪  4 GB
    ▪  32-bit address
  ¤  Real-address and Virtual-8086 modes
    ▪  1 MB space
    ▪  20-bit address

Page 82: 1 Tema III – Microcontrollers and Microprocessors

¨  General Purpose Registers

[Figure: IA-32 basic registers. 32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI. 16-bit Segment Registers: CS, SS, DS, ES, FS, GS. Also shown: EIP and EFLAGS.]

Page 83: 1 Tema III – Microcontrollers and Microprocessors

¨  Accessing parts of registers
  ¤  Use 8-bit name, 16-bit name, or 32-bit name
  ¤  Applies to EAX, EBX, ECX, and EDX
  ¤  The 16-bit registers are usually used only in real-address mode.

[Figure: EAX is 32 bits wide; its low 16 bits form AX; AX is split into AH (upper 8 bits) and AL (lower 8 bits).]
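The relationship between the 32-bit, 16-bit and 8-bit register names amounts to masking and shifting. A minimal sketch, with the hypothetical helper `split_eax` standing in for what the hardware does implicitly when a sub-register name is used:

```python
def split_eax(eax):
    """Return the views of EAX that are reachable through AX, AH and AL."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # low 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # upper byte of AX
        "AL": ax & 0xFF,         # lower byte of AX
    }


views = split_eax(0x12345678)
print({name: hex(value) for name, value in views.items()})
```

The same split applies to EBX, ECX and EDX (BX/BH/BL and so on); the upper 16 bits of the 32-bit register have no name of their own.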

Page 84: 1 Tema III – Microcontrollers and Microprocessors

¨  Floating-point, MMX, XMM registers.
  ¤  Eight 80-bit floating-point data registers
    ▪  ST(0), ST(1), . . . , ST(7)
    ▪  arranged in a stack
    ▪  used for all floating-point arithmetic
  ¤  Eight 64-bit MMX registers.
  ¤  Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations.

[Figure: FPU registers. 80-bit Data Registers: ST(0) to ST(7). 48-bit Pointer Registers: FPU Instruction Pointer, FPU Data Pointer. 16-bit Control Registers: Tag Register, Control Register, Status Register. Also shown: Opcode Register.]

Page 85: 1 Tema III – Microcontrollers and Microprocessors

¨  Programmer’s Model

Page 86: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Addressing Modes

Page 87: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Memory Management
  ¤  Real-address mode
    ▪  1 MB RAM maximum addressable (20-bit address)
    ▪  Application programs can access any area of memory
    ▪  Single tasking
    ▪  Supported by MS-DOS operating system

[Figure: memory map of the 1 MB space with linear addresses 00000 to F0000 in 64K steps. One segment (64K) spans 8000:0000 to 8000:FFFF; the seg:ofs pair 8000:0250 selects offset 0250 inside it.]

Segmented memory addressing: the absolute (linear) address is formed by multiplying the 16-bit segment value by 16 (a 4-bit left shift) and adding the 16-bit offset.
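The segment:offset calculation is short enough to write out. In this sketch the helper name `real_mode_linear` is made up, and the result is wrapped to 20 bits on the assumption of an 8086-style 1 MB address space.

```python
def real_mode_linear(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit linear address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # The segment is shifted left by 4 bits (multiplied by 16), then the offset is added.
    return ((segment << 4) + offset) & 0xFFFFF


address = real_mode_linear(0x8000, 0x0250)
print(hex(address))
```

For the pair 8000:0250 from the figure, the linear address is 80250h, and the segment as a whole covers 80000h to 8FFFFh.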

Page 88: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Memory Management
  ¤  Protected mode
    ▪  4 GB addressable RAM (32-bit address)
    ▪  (00000000h to FFFFFFFFh)
    ▪  Each program assigned a memory partition which is protected from other programs
    ▪  Designed for multitasking
    ▪  Supported by Linux & MS-Windows
    ▪  Segment descriptor tables
    ▪  Program structure
      ▪  code, data, and stack areas
      ▪  CS, DS, SS segment descriptors
      ▪  global descriptor table (GDT)
    ▪  MASM programs use the Microsoft flat memory model.

[Figure: flat segmentation model and multi-segment model. In the multi-segment example, each Local Descriptor Table entry holds base, limit and access fields: base 00026000 with limit 0010, base 00008000 with limit 000A, base 00003000 with limit 0002 (limits multiplied by 1000h). The segments start at 3000, 8000 and 26000 in RAM.]

Page 89: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Memory Management
  ¤  Translating Addresses
    ▪  The IA-32 processor uses a one- or two-step process to convert a variable's logical address into a unique memory location.
    ▪  The first step combines a segment value with a variable’s offset to create a linear address.
    ▪  The second optional step, called page translation, converts a linear address to a physical address.

[Figure: a logical address consists of a Selector and an Offset. The selector picks a Segment Descriptor from the descriptor table, whose base address is held in GDTR/LDTR; the descriptor's base is added to the offset to form the linear address.]

Page 90: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Memory Management
  ¤  Indexing into a Descriptor Table
    ▪  Each segment descriptor indexes into the program's local descriptor table (LDT). Each table entry is mapped to a linear address.

[Figure: logical addresses SS:ESP = 0018:0000003A, DS:offset = 0010:000001B6 and IP = 0008:00002CD3. The LDTR register locates the Local Descriptor Table, whose entries at index 18, 10, 08 and 00 hold 001A0000, 0002A000, 0001A000 and 00003000. The entries point into the linear address space (DRAM); one region is marked unused.]

Page 91: 1 Tema III – Microcontrollers and Microprocessors

¨  IA-32 Memory Management
  ¤  Paging
    ▪  Virtual memory uses disk as part of the memory, so the sum of all programs can be larger than physical memory. Only part of a program must be kept in memory, while the remaining parts are kept on disk.
    ▪  The memory used by the program is divided into small units called pages (4096 bytes each).
    ▪  As the program runs, the processor selectively unloads inactive pages from memory and loads other pages that are immediately required.
    ▪  OS maintains page directory and page tables
    ▪  Page translation: CPU converts the linear address into a physical address
    ▪  Page fault: occurs when a needed page is not in memory, and the CPU interrupts the program
    ▪  Virtual memory manager (VMM) – OS utility that manages the loading and unloading of pages
    ▪  OS copies the page into memory, program resumes execution

[Figure: page translation. The 32-bit linear address is split into Directory (10 bits), Table (10 bits) and Offset (12 bits). CR3 locates the Page Directory; the directory entry locates a Page Table; the page-table entry locates the Page Frame; the offset selects the physical address inside the frame.]
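The 10/10/12 split of the linear address used for page translation can be sketched as follows. The function name and sample address are hypothetical; only the field widths come from the slide.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into directory index, table index and page offset."""
    linear &= 0xFFFFFFFF
    directory = (linear >> 22) & 0x3FF   # top 10 bits: page directory entry
    table = (linear >> 12) & 0x3FF       # next 10 bits: page table entry
    offset = linear & 0xFFF              # low 12 bits: byte within the 4096-byte page
    return directory, table, offset


directory, table, offset = split_linear_address(0x00403025)
print(directory, table, hex(offset))
```

A 12-bit offset matches the 4096-byte page size, and 10 bits per index means each page directory and page table holds 1024 entries.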

Page 92: 1 Tema III – Microcontrollers and Microprocessors

¨  Interrupt Handling
¨  The processor generates interrupts that index into an Interrupt Descriptor Table, whose base is stored in IDTR and loaded using the privileged instruction LIDT.
¨  The descriptors in the IDT can be
  ¤  Interrupt gate: ISR handled as a normal subroutine call – uses the interrupted processor stack to save EIP, CS (and SS, ESP in case of a stack switch – the new stack is obtained from the TSS).
  ¤  Task gate: ISR handled as a task switch
    ▪  Needed for stack fault in CPL = 0 and double faults.

Page 93: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks

[Figure: Core micro-architecture block diagram. Fetch / Decode: 32 KB Instruction Cache, Next IP, Branch Target Buffer, Instruction Decode (4 issue), Microcode Sequencer, Register Allocation Table (RAT). Execute: Reservation Stations (RS), 32 entry; Scheduler / Dispatch Ports feeding FP Add, FP Div/Mul, Integer Arithmetic, Integer Shift/Rotate, SIMD Integer Arithmetic, Load, Store Addr and Store Data units; Memory Order Buffer (MOB); 32 KB Data Cache. Retire: Re-Order Buffer (ROB), 96 entry; IA Register Set. Bus Unit to L2 Cache.]

Page 94: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks
¨  Intel® Wide Dynamic Execution
  ¤  14-stage efficient pipeline
    ▪  Wider decoding capacity
    ▪  Advanced branch prediction
    ▪  Wider execution path
  ¤  64-Bit Support
    ▪  Merom, Conroe, and Woodcrest support EM64T
¨  Intel® Advanced Smart Cache
  ¤  Multi-core optimization
    ▪  Shared between the two cores
    ▪  Advanced Transfer Cache architecture
    ▪  Reduced bus traffic
    ▪  Both cores have full access to the entire cache
    ▪  Dynamic Cache sizing
  ¤  Shared second level (L2) 2MB 8-way or 4MB 16-way instruction and data cache

Execution Unit Overview
Execute 6 operations/cycle
  •  3 Memory Operations
    •  1 Load
    •  1 Store Address
    •  1 Store Data
  •  3 “Computational” Operations

Unified Reservation Station
  •  Schedules operations to Execution units
  •  Single Scheduler for all Execution Units
  •  Can be used by all integer, all FP, etc.

[Figure: the Unified Reservation Station dispatches to Port 0 through Port 5. Three ports serve Load, Store Address and Store Data; the other three serve the computational units: Integer ALU & Shift, Integer ALU & LEA, Branch, FP Add, FP Multiply, Complex Integer, Divide, SSE Integer ALU, Integer Shuffles, SSE Integer Multiply, FP Shuffle.]

Page 95: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks
¨  Instruction Decode
  ¤  Frequent pairs of micro-operations derived from the same macro-instruction can be fused into a single micro-operation
  ¤  Micro-op fusion effectively widens the pipeline

Page 96: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks
¨  Intel® Advanced Digital Media Boost
  ¤  Single Cycle SSE
    ▪  8 Single Precision Flops/cycle
    ▪  4 Double Precision Flops/cycle
  ¤  Wide Operations
    ▪  128-bit packed Add
    ▪  128-bit packed Multiply
    ▪  128-bit packed Load
    ▪  128-bit packed Store
  ¤  Support for Intel® EM64T instructions

[Figure: SSE Operation (SSE/SSE2/SSE3) on 128-bit operands (bits 0 to 127). SOURCE holds X1 to X4 and Y1 to Y4; DEST receives X1opY1, X2opY2, X3opY3, X4opY4. The Core™ µarch completes the operation in clock cycle 1; the previous micro-architecture needs clock cycles 1 and 2.]

Page 97: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks
¨  Hyperthreading
  ¤  Ability of processor to run multiple threads
    ▪  Duplicate architecture state creates illusion to SW of Dual Processor (DP).
    ▪  Execution unit shared between two threads, but dedicated if one stalls.
  ¤  Almost two Logical Processors.
  ¤  Architecture state (registers) and APIC duplicated.
  ¤  Share execution units, caches, branch prediction, control logic and buses.

[Figure: two logical processors, each with its own Architecture State and Advanced Programmable Interrupt Controller, sharing the Processor Execution Resource, On-Die Caches and System Bus.]

Page 98: 1 Tema III – Microcontrollers and Microprocessors

Intel® Core® Micro-architecture Blocks
¨  Power Efficient Support
  ¤  Advanced power gating & Dynamic power coordination
    ▪  Multi-point demand-based switching
    ▪  Voltage-Frequency switching separation
    ▪  Supports transitions to deeper sleep modes
    ▪  Event blocking
    ▪  Clock partitioning and recovery
    ▪  Dynamic Bus Parking
    ▪  During periods of high performance execution, many parts of the chip core can be shut off

[Figure: power management. Four cores, each with its own Vcc, frequency, sensors and PLL; the Uncore/LLC with its own PLL; a PCU coordinating them, supplied by BCLK and Vcc.]

Page 99: 1 Tema III – Microcontrollers and Microprocessors

X86-64 Architecture
¨  Full support for 64-bit integers
  ¤  All general-purpose registers are expanded from 32 bits to 64 bits
  ¤  All arithmetic and logical operations, memory-to-register, and register-to-memory operations are now directly supported for 64-bit integers
  ¤  Pushes and pops on the stack are always in eight-byte strides, and pointers are eight bytes wide
¨  Additional registers
  ¤  The number of named registers is increased from 8 (i.e. eax, ebx, ecx, edx, ebp, esp, esi, edi) to 16.
  ¤  Compilers can keep more local variables in registers rather than on the stack.
  ¤  Can use registers for frequently accessed constants.
  ¤  Arguments for small and fast subroutines may also be passed in registers to a greater extent.

Page 100: 1 Tema III – Microcontrollers and Microprocessors

X86-64 Architecture
¨  Larger virtual address space
  ¤  Current models can address up to 256 terabytes
  ¤  Expandable in the future to 16 exabytes
  ¤  Compared to just 4 gigabytes for 32-bit x86
¨  Larger physical address space
  ¤  Current models can address up to 1 terabyte
  ¤  Expandable in the future to 4 petabytes
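These sizes follow directly from the number of address bits. The bit widths used below (48 and 64 virtual, 40 and 52 physical, 32 for IA-32) are not stated on the slide; they are the widths that yield the quoted figures, with binary prefixes assumed. The helper names are hypothetical.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def human_readable(size):
    """Express a power-of-two byte count using binary (1024-based) prefixes."""
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size} {UNITS[unit]}"


for bits in (32, 40, 48, 52, 64):
    print(bits, "bits:", human_readable(addressable_bytes(bits)))
```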


Page 101: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc (RISC)
¨  Sun Microsystems (ORACLE)
¨  Sparc = Scalable Processor Architecture
¨  Open processor architecture
¨  SUN UltraSparc v9:
  ¤  RISC architecture, big-endian.
  ¤  64-bit address and data.
  ¤  Memory Management Unit (MMU).
  ¤  Superscalar.
  ¤  OpenSparc (open-source)
  ¤  LEON (soft-core). Space rated. VHDL

Timeline:
  Begin developing Sparc – 1984
  First Sparc Processor – 1986
  SuperSparc – 1992
  UltraSparc I – 1995
  UltraSparc II – 1997
  UltraSparc III – 2001
  UltraSparc IV – 2004
  UltraSparc IV+ – 2005
  UltraSparc T1 – 2005
  UltraSparc T2 – 2007
  Sparc T3 – 2010
  Sparc T4 – 2011
  Sparc T5 – 2013

Page 102: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc (RISC)
¨  Registers
  ¤  ~160 general-purpose registers
  ¤  Any procedure can access only 32 registers (r0~r31)
    ▪  First 8 registers (r0~r7) are global, i.e. they can be accessed by all procedures on the system (r0 is zero)
    ▪  Other 24 registers can be visualized as a window through which part of the register file can be seen
  ¤  Program counter (PC)
    ▪  The address of the next instruction to be executed
  ¤  Condition code registers
  ¤  Other control registers
¨  Data Formats
  ¤  Integers are 8-, 16-, 32-, 64-bit binary numbers
  ¤  2’s complement is used for negative values
  ¤  Supports both big-endian and little-endian byte orderings
    ▪  (big-endian means the most significant part of a numeric value is stored at the lowest-numbered address)
  ¤  Three different floating-point data formats
    ▪  Single-precision, 32 bits long (23 + 8 + 1)
    ▪  Double-precision, 64 bits long (52 + 11 + 1)
    ▪  Quad-precision, 128 bits long (112 + 15 + 1)
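The (23 + 8 + 1) breakdown of the single-precision format refers to fraction, exponent and sign bits. A small sketch that pulls those fields out of a value, using big-endian packing to mirror the byte order mentioned above; the function name `single_fields` is hypothetical.

```python
import struct


def single_fields(value):
    """Return (sign, exponent, fraction) of a float stored in single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


sign, exponent, fraction = single_fields(-1.5)
print(sign, exponent, hex(fraction))
```

For -1.5 the sign bit is 1, the biased exponent is 127 (a true exponent of 0), and the fraction has only its top bit set, which encodes the .5 part.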

Page 103: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc (RISC)
¨  Addressing Modes
  ¤  Immediate mode
  ¤  Register direct mode
  ¤  Memory addressing

Mode                                   Target address calculation
PC-relative*                           TA = (PC) + displacement {30 bits, signed}
Register indirect with displacement    TA = (register) + displacement {13 bits, signed}
Register indirect indexed              TA = (register-1) + (register-2)

*PC-relative is used only for branch instructions
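The two register-based target address calculations can be sketched as below. The 13-bit signed displacement has to be sign-extended before the addition; results are wrapped to 64 bits on the assumption of a 64-bit address. All function names are hypothetical, and the PC-relative case is left out because the slide does not say how the displacement is scaled.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def register_indirect_with_displacement(register, displacement13):
    """TA = (register) + displacement, with a 13-bit signed displacement."""
    return (register + sign_extend(displacement13, 13)) & MASK64


def register_indirect_indexed(register1, register2):
    """TA = (register-1) + (register-2)."""
    return (register1 + register2) & MASK64


print(hex(register_indirect_with_displacement(0x1000, 0x1FFF)))
print(hex(register_indirect_indexed(0x1000, 0x20)))
```

A displacement field of 0x1FFF is -1 once sign-extended, so the first call yields 0xfff rather than 0x2fff.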

¨  Instruction Set
  ¤  <150 instructions
  ¤  Pipelined execution
    ▪  While one instruction is being executed, the next one is fetched from memory and decoded
  ¤  Delayed branches
    ▪  The instruction immediately following the branch instruction is actually executed before the branch is taken
  ¤  Special-purpose instructions
    ▪  High-bandwidth block load and store operations
    ▪  Special “atomic” instructions to support multi-processor systems
¨  Input and Output
  ¤  A range of memory locations is logically replaced by device registers
  ¤  Each I/O device has a unique address, or set of addresses
  ¤  No special I/O instructions are needed

Page 104: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 (RISC)
¨  Multi-threaded (8), multi-core (8) CPU
¨  Frequency ranges from 900 MHz to 1.4 GHz
¨  Powered by less than 95 watts (nominal) with less than 2 watts per thread
¨  Integrated
  ¤  10 Gb Ethernet networking
  ¤  PCI Express I/O expansion
  ¤  FPU and cryptographic processing units per core
¨  Codename Niagara2
¨  Member of SPARC family
¨  2 previous multi-core processors
  ¤  UltraSPARC IV
  ¤  UltraSPARC IV+
¨  UltraSPARC T1 (first multi-core and multi-threaded)
  ¤  Released 14 November 2005
  ¤  4, 6, or 8 cores with 4 threads each
¨  UltraSPARC T2 released 7 August 2007
  ¤  Now 8 threads per core (instead of 4)

Page 105: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 (RISC)
¨  8 fully pipelined FPUs
¨  8 SPUs
¨  2 integer ALUs per core, each one shared by a group of four threads
¨  4 MB L2 cache (8 banks, 16-way associative)
¨  8 KB data cache and 16 KB instruction cache
¨  Two 10 Gb Ethernet ports and one PCIe port

Page 106: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 (RISC)

Page 107: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 Core Architecture

Page 108: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 Core Architecture

Page 109: 1 Tema III – Microcontrollers and Microprocessors

UltraSparc T2 Pipeline
¨  Eight-stage integer pipeline: Fetch, Cache, Pick, Decode, Execute, Mem, Bypass, W
  ¤  Pick selects 2 threads for execution (stage added for T2)
  ¤  In the bypass stage, the load/store unit (LSU) forwards data to the integer register files (IRFs) with sufficient write timing margin. All integer operations pass through the bypass stage.
¨  12-stage floating-point pipeline: Fetch, Cache, Pick, Decode, Execute, Fx1, . . ., Fx5, FB, FW
  Ø  6-cycle latency for dependent FP ops
  Ø  Integer multiplies are pipelined between different threads. Integer multiplies block within the same thread.
  Ø  Integer divide is a long latency operation. Integer divides are not pipelined between different threads.

Page 110: 1 Tema III – Microcontrollers and Microprocessors

MIPS (ARM) vs x86

                    MIPS (ARM)             x86
Address:            32/64-bit              32/64-bit
Page size:          4KB                    4KB
Data:               aligned                unaligned
Destination reg:    Left                   Right
                    add $rd,$rs1,$rs2      add %rs1,%rs2,%rd
Regs:               $0, $1, ..., $31       %r0, %r1, ..., %r7
Reg = 0:            $0                     (n.a.)
Return address:     $31                    (n.a.)

MIPS: “Three-address architecture”
  •  Arithmetic-logic instructions specify all 3 operands
     add $s0,$s1,$s2   # s0=s1+s2
  Benefit: fewer instructions → higher performance
x86: “Two-address architecture”
  •  Only 2 operands, so the destination is also one of the sources
     add $s1,$s0   # s0=s0+s1
     Often true in C statements: c += b;
  Benefit: smaller instructions → smaller code

Page 111: 1 Tema III – Microcontrollers and Microprocessors

MIPS (ARM) vs x86

MIPS: “load-store architecture”
  •  Only Load/Store access memory; the remaining operations are register-register; e.g.,
     lw $t0, 12($gp)
     add $s0,$s0,$t0   # s0=s0+Mem[12+gp]
  Benefit: simpler hardware → easier to pipeline, higher performance
x86: “register-memory architecture”
  •  All operations can have an operand in memory; the other operand is a register; e.g.,
     add 12(%gp),%s0   # s0=s0+Mem[12+gp]
  Benefit: fewer instructions → smaller code

MIPS: “fixed-length instructions”
  •  All instructions same size, e.g., 4 bytes
  •  Simple hardware → performance
  •  Branches can be multiples of 4 bytes
x86: “variable-length instructions”
  •  Instructions are a whole number of bytes: 1 to 15
     → small code size (30% smaller?)
  •  More recent performance benefit: better instruction cache hit rates
  •  Instructions can include 8- or 32-bit immediates