Chapter 3

THE EMBEDDED COMPUTING PLATFORM

The CPU Bus

The bus is the mechanism by which the CPU communicates with memory and devices.

A bus is, at a minimum, a collection of wires, but the bus also defines a protocol by which the CPU, memory and devices communicate.

One of the major roles of the bus is to provide an interface to memory.

Bus Protocols

A bus protocol determines how devices communicate.

Devices on the bus go through sequences of states. Protocols are specified by state machines, one state machine per actor in the protocol. The state machines may contain asynchronous logic behavior.

Four-cycle handshake

1. Device 1 raises its output to signal an enquiry, which tells device 2 that it should get ready to listen for data.

2. When device 2 is ready to receive, it raises its output to signal an acknowledgement. At this point, devices 1 and 2 can transmit or receive.

3. Once the data transfer is complete, device 2 lowers its output, signaling that it has received the data.

4. After seeing that ack has been released, device 1 lowers its output.
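The four steps can be sketched in C as a sequence of changes on two wires. This is only an illustrative model: the `handshake_t` type and the `four_cycle_transfer` function are hypothetical names, and a real bus would run the two devices as separate state machines rather than as one function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two handshake wires. */
typedef struct {
    bool enq;  /* driven by device 1 */
    bool ack;  /* driven by device 2 */
} handshake_t;

/* Runs one complete four-cycle handshake that moves `value` from
 * device 1 to device 2. Returns the value device 2 latched. */
int four_cycle_transfer(handshake_t *h, int value)
{
    int latched;

    assert(!h->enq && !h->ack);  /* both wires must be idle (low) at the start */

    h->enq = true;    /* step 1: device 1 raises enq */
    h->ack = true;    /* step 2: device 2 is ready and raises ack */
    latched = value;  /* data moves while enq and ack are both high */
    h->ack = false;   /* step 3: device 2 lowers ack, data received */
    h->enq = false;   /* step 4: device 1 sees ack released and lowers enq */

    return latched;
}
```

Both wires end low, so the same pair can start another transfer immediately.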

Figure: Four-cycle handshake between device 1 and device 2, showing the enq and ack signals over steps 1 to 4.

Microprocessor busses

Clock provides synchronization to the bus components.

R/W’ is true when the bus is reading and false when the bus is writing.

Address is an a-bit bundle of signals that transmits the address for an access.

Data is an n-bit bundle of signals that can carry data to or from the CPU.

Data ready’ signals when the values on the data bundle are valid.

A typical microprocessor bus

Timing diagrams

A timing diagram shows how the signals on a bus vary over time. Since values like the address and data can take on many values, some standard notation is used to describe signals.

A signal can be shown at a 0 or 1 level, or as stable or changing.

To be sure that signals go to their proper values at the proper time, timing diagrams sometimes show timing constraints.

Timing diagram for the example bus

Timing diagram shown with timing constraints for the example bus.

The diagram shows a read and a write. Timing constraints are shown only for the read operation, but similar constraints apply to the write operation. The bus is normally in read mode, since that does not change any state. During a read the external device or memory is sending a value on the data lines, while during a write the CPU is controlling the data lines.

Bus read and write

Read Operation on timing diagram

A read or write is initiated by setting address enable high after the clock starts to rise. We set R/W’=1 to indicate a read and the address lines are set to the desired address.

One clock cycle later, the memory or device is expected to assert the data value at that address on the data lines. Simultaneously, the external device specifies that the data are valid by pulling down the data ready’ line. This line is active low, meaning that a logically true value is indicated by a low voltage, in order to provide increased immunity to electrical noise.

The CPU is free to remove the address at the end of the clock cycle and must do so before the beginning of the next cycle. The external device has a similar requirement for removing the data value from the data lines.

Bus wait state

Burst read

The handshake that tells the CPU and devices when data are to be transferred is formed by data ready’ for the acknowledge side, but is implicit for the enquiry side.

The data ready’ signal allows the bus to be connected to devices that are slower than the bus.

The cycles between the minimum time at which data can be asserted and when they are actually asserted are known as wait states.
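Wait states can be counted in a small simulation. This is a sketch under stated assumptions: `slow_device_t` and `bus_read` are hypothetical names, and the device is assumed to need a fixed number of cycles (`latency`) after the address cycle before it pulls data ready’ low.

```c
#include <assert.h>

/* Hypothetical slow device on the bus. */
typedef struct {
    int latency;  /* cycles needed after the address cycle before data are valid */
    int value;    /* value the device eventually drives on the data lines */
} slow_device_t;

/* Simulates one read. Returns the data and stores the number of wait
 * states: cycles beyond the earliest cycle at which data could be valid. */
int bus_read(const slow_device_t *dev, int *wait_states)
{
    int cycle = 1;         /* cycle 1: CPU drives the address with R/W' = 1 */
    int data_ready_n = 1;  /* data ready' is active low: 1 means "not ready" */

    assert(dev->latency >= 1);
    *wait_states = 0;

    while (data_ready_n) {
        cycle++;                       /* advance one clock cycle */
        if (cycle - 1 >= dev->latency)
            data_ready_n = 0;          /* device pulls data ready' low */
        else
            (*wait_states)++;          /* CPU holds the address and waits */
    }
    return dev->value;
}
```

A device with a latency of one cycle answers with no wait states; a device with a latency of three cycles inserts two.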

In a burst read transaction, the CPU sends one address but receives a sequence of data values.

Bus burst read

State diagrams for bus read

Figure: State diagrams for the CPU and the device during a bus read. State labels: start, Adrs, Wait, Get data, Done, See ack, Send data, Release ack, Ack.

State diagram

The state machine view of the bus transaction is a helpful complement to the timing diagram.

It shows the transitions of the control signals. When the CPU decides to perform a read transaction, it moves to a new state, sending bus signals that cause the device to behave appropriately.

The device’s state transition graph captures its side of the protocol.

Bus multiplexing

Figure: Bus multiplexing between the CPU and a device. Labels: adrs, data, data enable, adrs enable.

Some buses use multiplexed address and data. Additional control lines are provided to tell whether the value on the address/data lines is an address or data.

Typically, the address comes first followed by the data.

The address can be held in a register until the data arrive so that both can be presented to the device at the same time.

DMA

Direct memory access (DMA) performs data transfers without executing instructions.

The CPU sets up the transfer. The DMA engine fetches and writes the data.

The DMA controller is a separate unit.

Bus mastership

By default, CPU is bus master and initiates transfers.

DMA must become bus master to perform its work. CPU can’t use bus while DMA operates.

Bus mastership protocol: Bus request. Bus grant.

DMA operation

The CPU sets the DMA registers for start address and length. The DMA status register controls the unit.

Once the DMA is bus master, it transfers automatically. It may run continuously until complete, or it may use every nth bus cycle.
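The division of work between the CPU and the DMA engine can be sketched as follows. The register layout (`dma_regs_t`) and the functions `dma_setup` and `dma_cycle` are hypothetical and do not describe any particular controller; the registers are modeled as an ordinary struct rather than memory-mapped hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA register block; real controllers differ. */
typedef struct {
    const uint8_t *src;  /* start address of the source */
    uint8_t *dst;        /* start address of the destination */
    size_t length;       /* number of bytes still to move */
    int busy;            /* status: 1 while a transfer is pending */
} dma_regs_t;

/* CPU side: program the registers, then let the DMA engine take over. */
void dma_setup(dma_regs_t *dma, const uint8_t *src, uint8_t *dst, size_t length)
{
    assert(!dma->busy);  /* do not reprogram a transfer that is in flight */
    dma->src = src;
    dma->dst = dst;
    dma->length = length;
    dma->busy = 1;
}

/* DMA side: called for each bus cycle in which the DMA engine is bus
 * master. Moves one byte and returns 1 while more data remain. */
int dma_cycle(dma_regs_t *dma)
{
    if (dma->busy && dma->length > 0) {
        *dma->dst++ = *dma->src++;
        dma->length--;
    }
    if (dma->length == 0)
        dma->busy = 0;   /* transfer complete */
    return dma->busy;
}
```

Calling `dma_cycle` in a tight loop models a transfer that runs continuously until complete; calling it on every nth bus cycle models the interleaved case.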

Bus transfer sequence diagram

System bus configurations

Multiple busses allow parallelism: slow devices on one bus, fast devices on a separate bus.

A bridge connects the two busses.

Figure: Multiple-bus system. Labels: CPU, memory, high-speed device, bridge, and two slow devices.

Bridge state diagram

ARM AMBA bus

Two varieties: AHB is high-performance. APB is lower-speed and lower-cost.

AHB supports pipelining, burst transfers, split transactions, and multiple bus masters.

All devices are slaves on APB.

Memory Devices

Several different types of memory:

Read-only memories: Flash.

Read/write memories: DRAM, SRAM.

Each type of memory comes in varying capacities and widths.

Memory Device Organization

A 4-Mbit memory may be organized as:

A 1M x 4-bit array: a single memory access obtains a 4-bit data item, with a maximum of 2^20 different addresses.

A 4M x 1-bit array: a single memory access obtains a 1-bit data item, with a maximum of 2^22 different addresses.

The height/width ratio of a memory is known as its aspect ratio.

The data are stored in a 2-D array of memory cells. The n-bit address (n = r + c) is split into an r-bit row address and a c-bit column address.
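The row/column split can be sketched with shifts and masks. The function name `split_address` and the choice to take the row from the upper bits of the address are assumptions made for illustration; a particular memory device may assign the bits differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t row;  /* r-bit row address */
    uint32_t col;  /* c-bit column address */
} rc_addr_t;

/* Splits an n-bit address (n = r + c) into its row and column parts.
 * Assumption: the row occupies the upper r bits, the column the lower c bits. */
rc_addr_t split_address(uint32_t addr, unsigned r, unsigned c)
{
    rc_addr_t out;

    assert(r > 0 && c > 0 && r + c <= 32);

    out.col = addr & ((1u << c) - 1u);         /* keep the low c bits */
    out.row = (addr >> c) & ((1u << r) - 1u);  /* keep the next r bits */
    return out;
}
```

For the 1M x 4-bit organization, a 20-bit address could be split with r = 10 and c = 10, giving a 1024 x 1024 cell array.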

Internal Organization of a Memory Device

Random-access memory

Dynamic RAM is dense and requires refresh. Synchronous DRAM (SDRAM) is the dominant type. SDRAM uses a clock to improve performance and pipeline memory accesses.

Static RAM is faster, less dense, and consumes more power.

Static RAM and its operation

CE is the chip enable input. It is active low. When CE=1 the SRAM’s data pins are disabled, and when CE=0, the data pins are enabled.

R/W controls whether the current operation is a read (R/W=1) or a write (R/W=0). Read and write are normally specified relative to the CPU, so read means reading from RAM and write means writing to RAM.

Adrs specifies the address for the read or write. Data is a bidirectional bundle of signals for data transfer.

When R/W=1, the data pins are outputs, and when R/W=0, the data pins are inputs.

Interface and timing diagram

A read operation on the SRAM occurs as follows:

CE is set to zero, enabling the chip, with R/W=1. An address is presented on the address lines. After some delay, data appear on the data lines.

A write operation is similar: CE is set to zero. R/W is set to 0 for writing. An address is set on the address lines and data are set on the data lines.

Interface and timing diagram

Timing diagram for Read

First, RAS is set to 0 and the row part of the address is set on the address lines.

Next, CAS is set to 0 and the column part of the address is put on the address lines.

Read-only memory

ROM may be programmed at the factory.

Flash is the dominant form of field-programmable ROM. It is electrically erasable and must be block erased. Access is random, but write/erase is much slower than read.

NOR flash is more flexible. NAND flash is more dense.

Flash memory

Non-volatile memory. Flash can be programmed in-circuit.

Random access for read.

To write: erase a block to 1, then write bits to 0.
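The erase-then-program rule can be modeled in a few lines. This is a simplified sketch: `BLOCK_BYTES`, `flash_erase_block`, and `flash_program_byte` are hypothetical names, the block is an ordinary array rather than a real flash device, and the block size is kept tiny for readability.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16  /* tiny for illustration; real blocks are much larger */

/* Erasing sets every bit in the block to 1. */
void flash_erase_block(uint8_t block[BLOCK_BYTES])
{
    memset(block, 0xFF, BLOCK_BYTES);
}

/* Programming can only change bits from 1 to 0, which is modeled here
 * as a bitwise AND with the existing contents. */
void flash_program_byte(uint8_t block[BLOCK_BYTES], unsigned offset, uint8_t value)
{
    assert(offset < BLOCK_BYTES);
    block[offset] &= value;
}
```

Because programming cannot set a bit back to 1, writing a new value over old data without an erase generally yields the AND of the two values, which is why the whole block must be erased first.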

Flash writing

Write is much slower than read: 1.6 ms write, 70 ns read.

Blocks are large (approx. 1 Mb).

Writing causes wear that eventually destroys the device. Modern lifetime is approx. 1 million writes.

Types of flash

NOR: Word-accessible read. Erase by blocks.

NAND: Read by pages (512-4K bytes). Erase by blocks.

NAND is cheaper, has faster erase, sequential access times.

I/O Devices

• I/O devices are commonly used in embedded computing systems.

• Some devices are often found as on-chip devices in microcontrollers.

• Other devices are interfaced externally.

• We need to understand the requirements of device interfacing and its uses in programming.

Timers and counters

Very similar: a timer is incremented by a periodic signal; a counter is incremented by an asynchronous, occasional signal.

Rollover causes an interrupt.

Watchdog timer

Watchdog timer is periodically reset by system timer.

If watchdog is not reset, it generates an interrupt to reset the host.

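Figure: Host CPU and watchdog timer. The host CPU sends reset to the watchdog timer; the watchdog timer sends an interrupt to the host CPU.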
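The watchdog behavior can be sketched as a counter that must be cleared before it reaches a limit. The names `watchdog_t`, `watchdog_kick`, and `watchdog_tick` are hypothetical, and the timeout is counted in abstract ticks rather than real time.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned timeout;  /* ticks allowed between resets of the watchdog */
    unsigned count;    /* ticks elapsed since the last reset */
} watchdog_t;

/* Called periodically by the host while it is healthy. */
void watchdog_kick(watchdog_t *w)
{
    w->count = 0;
}

/* Advances the watchdog by one tick. Returns true when the watchdog
 * expires, which is when it would interrupt and reset the host. */
bool watchdog_tick(watchdog_t *w)
{
    assert(w->timeout > 0);
    if (++w->count >= w->timeout) {
        w->count = 0;
        return true;
    }
    return false;
}
```

As long as `watchdog_kick` is called more often than every `timeout` ticks, `watchdog_tick` never reports an expiry.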

Digital-to-analog conversion

Use resistor tree:

Figure: Resistor tree with resistors R, 2R, 4R, and 8R on bits bn, bn-1, bn-2, and bn-3, combined to produce Vout.

Flash A/D conversion

An n-bit result requires 2^n comparators.

Figure: Flash A/D converter. Vin feeds a bank of comparators whose outputs drive an encoder.

Dual-slope conversion

Use a counter to time how long it takes to charge and discharge a capacitor.

Charging, then discharging eliminates non-linearities.

Figure: Dual-slope converter, with Vin and a timer.

Sample-and-hold

Samples data.

Figure: Sample-and-hold circuit between Vin and the converter.

Switch debouncing

A switch must be debounced to eliminate the multiple contacts caused by mechanical bouncing.
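One common software approach, shown here as a sketch rather than as the circuit the slide illustrates, is to accept a new switch state only after it has been seen on several consecutive samples. The names `debounce_t` and `debounce_sample` and the threshold `STABLE_SAMPLES` are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define STABLE_SAMPLES 4  /* assumed threshold; tune to the switch and sample rate */

typedef struct {
    bool state;    /* debounced output */
    unsigned run;  /* consecutive samples that disagree with `state` */
} debounce_t;

/* Feeds one raw sample and returns the debounced switch state. */
bool debounce_sample(debounce_t *d, bool raw)
{
    assert(d != 0);
    if (raw == d->state) {
        d->run = 0;                    /* bounce ended, or nothing changed */
    } else if (++d->run >= STABLE_SAMPLES) {
        d->state = raw;                /* change has been stable long enough */
        d->run = 0;
    }
    return d->state;
}
```

Short glitches reset the run counter, so only a level held for `STABLE_SAMPLES` samples in a row changes the output.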

Encoded keyboard

An array of switches is read by an encoder.

N-key rollover remembers multiple key depressions.

Figure: Encoded keyboard (row lines shown).

LED

Must use resistor to limit current:

7-segment LCD display

May use parallel or multiplexed input.

Types of high-resolution display

Liquid crystal display (LCD) is the dominant form. Others include plasma, OLED, etc.

A frame buffer holds the current display contents. It is written by the processor and read by video.

Touchscreen

Includes an input device and an output device. The input device is a two-dimensional voltmeter.

Touchscreen position sensing

Figure: Touchscreen position sensing. An ADC measures the voltage.

Component Interfacing : Memory interfacing

Static RAM is simpler to interface to a bus than is DRAM, due to both the DRAM’s RAS/CAS multiplexing and the need for refresh.

The R/W on the bus can often be directly connected to the SRAM.

The main issue in interfacing SRAM is decoding the address. The chip enable pin is used in RAMs to simplify the interfacing of large memories.

If the required number of memory words fits within the height of an available memory, then the interface is simple: the CE signal is permanently wired to ground so that the chip is always enabled.

DRAM interfacing

The bus address can be split into row and column addresses with a small amount of logic: a register captures the address, a multiplexer selects the row or column portion of the address, and a state machine generates RAS and CAS.

The refresh signal can be generated with a counter and a state machine.

The counter times the wait between successive refresh actions, and the controller generates the required signals.

In the idle state, the bus signals are passed through to the DRAM to enable reads and writes.

When the counter rolls over, the controller generates CAS and then RAS to induce the next refresh cycle.

Device interfacing

Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces.

But glue logic is required when a device is connected to a bus for which it is not designed.

An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely.

Some additional logic is required to cause the bus to read and write the device’s registers.
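Fine-grained address decoding can be sketched as a comparison of all upper address bits against a base address. The base address `DEV_BASE`, the register count `DEV_NREGS`, and the function names are hypothetical values chosen for illustration; in hardware this comparison would be done by decoder logic, not by software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device that occupies four registers starting at DEV_BASE. */
#define DEV_BASE  0x8000u
#define DEV_NREGS 4u  /* must be a power of two for the mask trick below */

/* Chip select: true only when every address bit above the register
 * index matches the device's base address. */
bool dev_selected(uint16_t addr)
{
    return (addr & ~(DEV_NREGS - 1u)) == DEV_BASE;
}

/* Register index within the device, taken from the low address bits. */
unsigned dev_reg_index(uint16_t addr)
{
    assert(dev_selected(addr));
    return addr & (DEV_NREGS - 1u);
}
```

A memory would compare only a few upper bits to select a large range; the device compares nearly all of them because its range is only four addresses wide.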

System architecture

• An architecture is a set of elements and the relationships between them that together form a single unit.

• The architecture of an embedded computing system is the blueprint for implementing that system.

• The architecture of an embedded computing system includes both hardware and software elements.

Hardware:

The hardware architecture of an embedded system is the more obvious manifestation, since you can touch and feel it.

CPU:

• There are many different architectures, and even within an architecture we can select between models that vary in clock speed, integrated peripherals, and so on.

• The choice of the CPU cannot be made without considering the software that will execute on the machine.

Bus:

• In applications that make intensive use of the bus due to I/O or other data traffic, the bus may be more of a limiting factor than the CPU.

• Attention must be paid to the required data bandwidths to be sure that the bus can handle the traffic.

Memory:

• The ratio of ROM to RAM and the selection of DRAM versus SRAM can have a significant influence on the cost of the system.

• The speed of memory will play a large part in determining system performance.

I/O devices:

• Networking, sensors, actuators, etc.

• How big/fast must each one be?

Software architecture

Functional description must be broken into pieces:

division among people; conceptual organization; performance; testability; maintenance.

Hardware and software architectures

Hardware and software are intimately related: software doesn’t run without hardware; how much hardware you need is determined by the software requirements: speed; memory.

Evaluation boards

Designed by CPU manufacturer or others.

Includes CPU, memory, some I/O devices.

May include prototyping section.

CPU manufacturer often gives out evaluation board netlist, which can be used as starting point for your custom board design.

Adding logic to a board

Programmable logic devices (PLDs) provide low/medium density logic.

Field-programmable gate arrays (FPGAs) provide more logic and multi-level logic.

Application-specific integrated circuits (ASICs) are manufactured for a single purpose.

The PC as a platform

Advantages: cheap and easy to get; rich and familiar software environment.

Disadvantages: requires a lot of hardware resources; not well-adapted to real-time.

Typical PC hardware platform

Figure: Typical PC hardware platform. Labels: CPU, CPU bus, memory, DMA controller, timers, interrupt controller, two bus interfaces, high-speed bus, low-speed bus, and a device on each of the two busses.

The CPU provides basic computational facilities.

RAM is used for program storage.

ROM holds the boot program.

A DMA controller provides DMA capabilities.

Timers are used by the operating system for a variety of purposes.

A high-speed bus, connected to the CPU bus through a bridge, allows fast devices to communicate efficiently with the rest of the system.

A low-speed bus provides an inexpensive way to connect simpler devices and may be necessary for backward compatibility as well.

Typical busses

PCI: standard for high-speed interfacing 33 or 66 MHz. PCI Express.

USB (Universal Serial Bus), Firewire (IEEE 1394): relatively low-cost serial interface with high speed.

Software elements

IBM PC uses BIOS (Basic I/O System) to implement low-level functions: boot-up; minimal device drivers.

BIOS has become a generic term for the lowest-level system software.

Example: StrongARM

StrongARM system includes: a CPU chip (3.686 MHz clock) and a system control module (32.768 kHz clock).

The system control module provides: real-time clock; operating system timer; general-purpose I/O; interrupt controller; power manager controller; reset controller.

StrongARM SA-1100

Peripheral devices of the system control module:

A real-time clock.
An operating system timer.
28 general-purpose I/Os (GPIOs).
An interrupt controller.
A power manager controller.
A reset controller that handles resetting the processor.

Debugging embedded systems

Challenges: target system may be hard to observe; target may be hard to control; may be hard to generate realistic inputs; setup sequence may be complex.

Host/target design

Use a host system to prepare software for target system:

Figure: Host system connected to the target system by a serial line.

• Load the programs into the target,

• Start and stop program execution on the target, and

• Examine memory and CPU registers.

Host-based tools

Cross compiler: compiles code on host for target system.

Cross debugger: displays target state, allows target system to be controlled.

Software debuggers

A monitor program residing on the target provides basic debugger functions.

Debugger should have a minimal footprint in memory.

The user program must be careful not to destroy the debugger program, but the debugger should be able to recover from some damage caused by user code.

Breakpoints

A breakpoint allows the user to stop execution, examine system state, and change state.

Replace the breakpointed instruction with a subroutine call to the monitor program.

ARM breakpoints

0x400 MUL r4,r6,r6
0x404 ADD r2,r2,r4
0x408 ADD r0,r0,#1
0x40c B loop

uninstrumented code

0x400 MUL r4,r6,r6
0x404 ADD r2,r2,r4
0x408 ADD r0,r0,#1
0x40c BL bkpoint

code with breakpoint

Breakpoint handler actions

Save registers. Allow user to examine machine. Before returning, restore system state.

Safest way to execute the instruction is to replace it and execute in place.

Put another breakpoint after the replaced breakpoint to allow restoring the original breakpoint.
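Setting and clearing a breakpoint by patching an instruction word can be sketched as follows. This models code memory as an ordinary array; `breakpoint_t`, `bp_set`, and `bp_clear` are hypothetical names, and `BKPT_PLACEHOLDER` is a stand-in value, not a real ARM encoding of the call to the monitor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the "BL bkpoint" word; NOT a real ARM instruction encoding. */
#define BKPT_PLACEHOLDER 0xDEADBEEFu

typedef struct {
    uint32_t *addr;  /* patched location, or NULL if no breakpoint is set */
    uint32_t saved;  /* original instruction word at that location */
} breakpoint_t;

/* Saves the original instruction and replaces it with the call to the monitor. */
void bp_set(breakpoint_t *bp, uint32_t *addr)
{
    assert(bp->addr == NULL);
    bp->addr = addr;
    bp->saved = *addr;
    *addr = BKPT_PLACEHOLDER;
}

/* Puts the original instruction back so it can be executed in place. */
void bp_clear(breakpoint_t *bp)
{
    assert(bp->addr != NULL);
    *bp->addr = bp->saved;
    bp->addr = NULL;
}
```

To keep a breakpoint active after it is hit, a monitor could call `bp_clear`, set a second breakpoint on the following instruction, resume, and then call `bp_set` on the original address again when the second breakpoint fires.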

In-circuit emulators

A microprocessor in-circuit emulator is a specially-instrumented microprocessor.

Allows you to stop execution, examine CPU state, modify registers.

Logic analyzers

A logic analyzer is an array of low-grade oscilloscopes:

Logic analyzer architecture

Figure: Logic analyzer architecture. Labels: UUT, sample memory, microprocessor, controller, system clock, clock gen, state or timing mode, vector address, display, keypad.

System Data Samples

Hardware/software co-verification

An instruction level simulation may be used to debug code running on the CPU.

A cycle-level simulation tool may be used for faster simulation of parts of the system.

A hardware/software co-simulator may be used to simulate various parts of the system at different levels of detail.

Bus-Based Computer Systems

Designing with microprocessors. Development and debugging. System-level performance analysis. Example: alarm clock

Design Example : Alarm clock

Figure: Alarm clock front panel. Labels: hour and minute display, PM light, alarm ready light, and buttons for set time, set alarm, hour, minute, alarm on, and alarm off.

Operations

Set time: hold set time, depress hour, minute.

Set alarm time: hold set alarm, depress hour, minute.

Turn alarm on/off: depress alarm on/off.

Alarm clock requirements

name                  alarm clock
purpose               24-hour digital clock with one alarm
inputs                set time, set alarm, hour, minute, alarm on/off
outputs               four-digit display, PM indicator, alarm ready, buzzer
functions             keep time, set time, set alarm, turn alarm on/off, activate buzzer by alarm
performance           hours and minutes, no seconds; not high precision
manufacturing cost    consumer product
power                 AC
physical size/weight  fits on stand

Alarm clock class diagram

Figure: Alarm clock class diagram. Classes: Lights*, Display, Mechanism, Buttons*, Speaker*, with a multiplicity of 1 on each end of every association.

Alarm clock physical classes

Lights*
digit-val()
digit-scan()
alarm-on-light()
PM-light()

Buttons*
set-time(): boolean
set-alarm(): boolean
alarm-on(): boolean
alarm-off(): boolean
minute(): boolean
hour(): boolean

Speaker*
buzz()

Display class

Display

time[4]: integer
alarm-indicator: boolean
PM-indicator: boolean

set-time()
alarm-light-on()
alarm-light-off()
PM-light-on()
PM-light-off()

Mechanism class

Mechanism

Seconds: integer
PM: boolean
tens-hours, ones-hours: boolean
tens-minutes, ones-minutes: boolean
alarm-ready: boolean
alarm-tens-hours, alarm-ones-hours: boolean
alarm-tens-minutes, alarm-ones-minutes: boolean

scan-keyboard()
update-time()

Update-time behavior

Figure: Flowchart of update-time. Update seconds with rollover; on rollover, update hh:mm with rollover and switch AM->PM or PM->AM as needed; call display.set-time(current time); if time >= alarm and alarm-on, call alarm.buzzer(true).

Scan-keyboard behavior

Figure: Flowchart of scan-keyboard. Compute button activations, then branch on the result: alarm-on sets alarm-ready=true; alarm-off sets alarm-ready=false and calls alarm.buzzer(false); set-time and not set-alarm and hours increments time tens with rollover and AM/PM; set-time and not set-alarm and minutes increments time ones with rollover and AM/PM. Finally, save button states.

System architecture

The system has both periodic and aperiodic components: the current time must obviously be updated periodically, and the button commands occur occasionally.

It seems reasonable to have the following two major software components:

An interrupt-driven routine can update the current time. The current time will be kept in a variable in memory. A timer can be used to interrupt periodically and update the time.

A foreground program can poll the buttons and execute their commands. Since buttons are changed at a relatively slow rate, it makes no sense to add the hardware required to connect the buttons to interrupts.
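The interrupt-driven time update might look like the sketch below. Several things here are assumptions: the timer is taken to interrupt once per second, the time is kept in 12-hour form with a PM flag to match the PM indicator, and the names `clock_time_t` and `update_time` are hypothetical. Hooking the function to a real timer interrupt is platform-specific and is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Current time, kept in a variable in memory and shared with the foreground. */
typedef struct {
    int hours;    /* 1..12 */
    int minutes;  /* 0..59 */
    int seconds;  /* 0..59 */
    bool pm;      /* true for PM */
} clock_time_t;

/* Body of the timer interrupt handler, assumed to run once per second. */
void update_time(volatile clock_time_t *t)
{
    assert(t->hours >= 1 && t->hours <= 12);

    if (++t->seconds < 60)
        return;               /* no rollover of seconds */
    t->seconds = 0;

    if (++t->minutes < 60)
        return;               /* no rollover of minutes */
    t->minutes = 0;

    if (t->hours == 11) {     /* 11:59 -> 12:00 switches AM and PM */
        t->hours = 12;
        t->pm = !t->pm;
    } else if (t->hours == 12) {
        t->hours = 1;         /* 12:59 -> 1:00 */
    } else {
        t->hours++;
    }
}
```

The structure is declared `volatile` at the point of use because the foreground loop reads it while the interrupt handler changes it.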

The foreground code will be implemented as a while loop:

while (1) {
    read_buttons(button_values);     /* read inputs */
    process_command(button_values);  /* do commands */
    check_alarm();                   /* decide whether to turn on the alarm */
}

• The loop first reads the buttons using the first call.

• The buttons will remain depressed for many sample periods since the sample rate is much faster than any person can push and release buttons.

• We want to make sure that the clock responds to this as a single depression of the button, not one depression per sample interval.

Testing

Component testing: test interrupt code on the platform; can test foreground program using a mock-up.

System testing: relatively few components to integrate; check clock accuracy; check recognition of buttons, buzzer, etc.

Preprocessing button inputs

As shown in the figure, this can be done using simple edge detection.
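Edge detection on a sampled button can be sketched in a few lines. The function name `rising_edge` is hypothetical, and the sketch assumes that a true sample means the button is depressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only on the sample where the button goes from released
 * to depressed. `prev` holds the saved state from the previous sample. */
bool rising_edge(bool *prev, bool now)
{
    bool edge;

    assert(prev != NULL);
    edge = now && !*prev;  /* depressed now, released on the last sample */
    *prev = now;           /* save the button state for the next sample */
    return edge;
}
```

A button held down across many samples produces one activation, so the clock sees a single depression instead of one per sample interval.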
