Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.


Page 1: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Embedded Computing Processors

CSE 237D: Winter 2010, Topic #6

Ryan Kastner

Page 2: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

What kind of embedded processor?

What are our options for processors in embedded systems?

What performance metrics are we worried about?

Page 3: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

“Traditional” Software Embedded Systems = CPU + RTOS

Slide courtesy of Mani Srivastava

Page 4: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

“Traditional” Hardware Embedded Systems = ASIC

A direct sequence spread spectrum (DSSS) receiver ASIC

ASIC Features

Area: 4.6 mm x 5.1 mm

Speed: 20 MHz @ 10 Mcps

Technology: HP 0.5 µm

Power: 16 mW - 120 mW (mode dependent) @ 20 MHz, 3.3 V

Avg. Acquisition Time: 10 µs to 300 µs

Source: Mani Srivastava

Page 5: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

A spectrum of options now

Microcontroller, Microprocessor, ASIP

DSP, Graphics Processor, Network Processor, Cryptoprocessor, …

FPGA, ASIC

Page 6: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Microcontrollers Overview

A microcontroller (uC) is a small, lightweight CPU, usually combined with on-board memory and peripherals
Compact and (relatively) low power

Often used as a simple hardware-to-software interface as well as for in-situ processing
Analog-to-digital gateway
Allows for real-time feedback based on data

[Figure: sensors feed a microcontroller (uC) through an analog-to-digital interface; the uC drives an actuator and an indicator through a digital-to-analog interface]

Page 7: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Microcontroller Features

Processor speed: Fundamental measure of the processing rate of the device
Value of interest is in MIPS, not MHz

Supply voltage/current: Measure of the amount of power required to run the device
Multiple modes (sleep, drowsy, idle, etc.)

It is possible to adjust the voltage and frequency of some devices in real time, thereby trading off speed and power usage
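The voltage/frequency trade-off can be made concrete with the usual first-order model for CMOS switching power, P = C·V²·f. The sketch below is only an illustration: the capacitance and the two operating points are made-up values, not taken from any datasheet, and leakage power is ignored.

```python
def dynamic_power(c_eff, vdd, freq_hz):
    """First-order CMOS switching power in watts: P = C * V^2 * f."""
    return c_eff * vdd ** 2 * freq_hz


def energy_per_task(c_eff, vdd, freq_hz, cycles):
    """Energy in joules to execute a fixed number of cycles.

    Run time is cycles / f, so the frequency cancels out and only the
    supply voltage (and the cycle count) sets the energy.
    """
    run_time_s = cycles / freq_hz
    return dynamic_power(c_eff, vdd, freq_hz) * run_time_s


# Hypothetical values for illustration only.
C_EFF = 1e-9  # effective switched capacitance, farads
fast = dynamic_power(C_EFF, vdd=3.3, freq_hz=20e6)
slow = dynamic_power(C_EFF, vdd=1.8, freq_hz=5e6)
print(f"fast point: {fast * 1e3:.1f} mW, slow point: {slow * 1e3:.1f} mW")
```

Under this model, lowering the frequency alone reduces power but not the energy per task; the energy saving comes from the lower voltage that the slower clock permits.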

Page 8: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Microcontroller Features

Internal memory: The amount of information that can be stored on board, sometimes divided between program and data memory
Can be supplemented with external memory

I/O Pins: Individual points for communication between the uC and the rest of the world
Can be digital or analog, general or special purpose

Interrupts: Non-linear program flow based on event triggers from peripherals or pins

[Figure: microcontroller block diagram with CPU, memory (ROM, RAM), I/O, and subsystems (timers, counters, analog interfaces, I/O interfaces)]

Page 9: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Microcontroller Peripherals

Timers: Internal registers (any size) in the uC that increment at the clock rate

Voltage Comparators: Input that effectively functions as a 1-bit ADC with an adjustable threshold

ADC: Most ADCs used in sensor data collection are integrated with the uC

DAC: Digital-to-analog converters are also included in some data-collection-driven uCs
Mostly used for feedback and control
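The "comparator as a 1-bit ADC" remark can be illustrated with a small model. This is a sketch of ideal converters only; the function names and the reference voltage used in the example are assumptions, not the behavior of any particular uC peripheral.

```python
def comparator(v_in, v_threshold):
    """Voltage comparator: 1 if the input is above the threshold, else 0."""
    return 1 if v_in > v_threshold else 0


def adc_code(v_in, v_ref, bits):
    """Ideal unipolar ADC: map 0..v_ref onto codes 0..2**bits - 1 (clamped)."""
    levels = 2 ** bits
    code = int(v_in / v_ref * levels)
    return max(0, min(levels - 1, code))


# Hypothetical 4.0 V reference, chosen only to keep the arithmetic simple.
print(comparator(2.5, v_threshold=2.0))   # comparator output for a 2.5 V input
print(adc_code(2.5, v_ref=4.0, bits=1))   # 1-bit ADC with its threshold at v_ref / 2
print(adc_code(2.5, v_ref=4.0, bits=10))  # 10-bit code for the same input
```

A 1-bit ADC has a fixed threshold at half the reference; the comparator does the same job with a threshold that can be moved.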

Page 10: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Microcontrollers Communication

UART: Basic hardware module which mediates serial communication (RS232)
Simplest form of communication but limited by speed
Most modules are full duplex

USB: High-bandwidth serial communication between uC and a computer or an embedded host
Usually requires chips with specialized hardware and firmware
Host side issues

I2C: Half duplex master-slave 2-wire protocol for data transfer
kbit transfer rates
Tx/Rx based on slave addressing
Can invert protocol with sensors as masters

RF: Radio frequency (>100 MHz) EM transmission of data
Built in to some newer special-purpose uC
Wireless spherical transmission
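As an illustration of what a UART does on the wire, the sketch below builds one frame in the common 8N1 format (one start bit, eight data bits sent LSB first, no parity, one stop bit). The 8N1 choice and the 9600 baud figure are assumptions for the example; the slides do not specify a frame format or rate.

```python
def uart_frame_8n1(byte):
    """Line levels for one 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def payload_bytes_per_second(baud):
    """Each payload byte costs 10 bit times in 8N1 (8 data bits + start + stop)."""
    return baud / 10


frame = uart_frame_8n1(ord("A"))
print(frame)
print(payload_bytes_per_second(9600))
```

The framing overhead is part of why a UART is "limited by speed": two of every ten bit times carry no payload.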

Page 11: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

8051 Architecture

Page 12: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

PIC Architecture

Page 13: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

AVR

8-bit RISC series of microcontroller chips
Large range of available devices covering many interfaces, speeds, memory sizes, and package sizes
Large hobbyist development community with many available free tool chains and sample applications

General specs
One MIPS per MHz
Models available up to 20 MHz
Max 128K program space / 8K RAM
ADC/LCD Driver/Motor Control
UART/CAN/USB/I2C/SPI/DAC/LCD/PWM/Comparators

http://www.atmel.com/products/product_selector.asp

Page 14: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

TI MSP430

Proprietary TI low-power, low-cost RISC chips
Well supported by TI with good program chain
Designed for intermittent sampling and fast startup

General specs
Very low power (flexible)
Max 32KHz / 8 MIPS
Max 50K program space / 10K RAM
Max 16 bit ADC
UART/SPI/DAC/LCD/PWM/Comparators

http://www.msp430.com

Page 15: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Atmel ARM7

32-bit ARM microcontroller
Low power (for 32-bit machines)
Can run in 16-bit mode if needed

General specs
Lots of memory (8-64KB RAM, 32-256KB flash)
Variable speed up to 55MHz
Packed with peripherals (USB, ADC, SPI, etc.)
Common in systems that require more processing

http://www.at91.com/

Page 16: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Many Types of Programmable Processors

Past
Microprocessor
Microcontroller
DSP
Graphics Processor

Now / Future
Network Processor
Sensor Processor
Cryptoprocessor
Game Processor
Wearable Processor
Mobile Processor

Source: Mani Srivastava

Page 17: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

From Processor to ASIP

[Figure: baseline processor datapath with source routing, register file RF0, functional unit FU0, result routing, decoder, and control]

Temporal bottleneck: limited functionality

Spatial bottleneck: not enough bandwidth

Source: Tensilica

Page 18: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Add Custom Functional Units

[Figure: datapath with source routing, register file RF0, functional units FU0 to FU3, result routing, decoder, control, and FSM storage]

Source: Tensilica

Page 19: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Customize Memory

[Figure: datapath with source routing, register files RF0 to RF2, state registers S0 and S1, functional units FU0 to FU3, result routing, decoder, control, and FSM storage]

Source: Tensilica

Page 20: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Multicycle Instructions

[Figure: datapath with source routing, register files RF0 to RF2, state registers S0 and S1, functional units FU0 to FU3, result routing, decoder, control, and FSM storage]

Source: Tensilica

Page 21: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Tensilica Xtensa Processor Options

[Figure: Xtensa processor block diagram. Legend: Base ISA Feature, Configurable Function, Optional Function, Optional & Configurable. Blocks shown: Processor Controls, Interrupt Control, Exception Support, Timers 0 to n, Data Address Watch 0 to n, Instruction Address Watch 0 to n, TRACE Port, JTAG Tap Control, On Chip Debug, Instruction Fetch / PC Unit, Instruction ROM, Instruction RAM, Instruction Cache, MMU ITLB, Align and Decode, ALU, Register File, FPU, Vectra DSP, MAC 16, MUL 16, MUL 32, Designer-Defined Execution Units, Designer-Defined Register Files, Advanced Designer-Defined Coprocessors, Data Load / Store Unit, Data ROM, Data RAM, Data Cache, MMU DTLB, External Interface, Xtensa Processor Interface (PIF), Write Buffer (1 to 32 entries)]

Source: Tensilica

Page 22: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

ASIP Design Flow

Select processor options (FU, $, registers, etc.)
Describe new instructions
Use automated processor generator, create custom processor

Outputs:
Tailored, synthesizable HDL uP core (ALU, Pipe, I/O, Timer, MMU, Register File, Cache)
Customized Compiler, Assembler, Linker, Debugger, Simulator

Source: Tensilica

Page 23: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Architectural Design Space

Approaches to Parallel Processing
Processing Element (PE) level
Instruction-level
Bit-level

Elements of Special Purpose Hardware
Structure of Memory Architectures
Types of On-Chip Communication Mechanisms
Use of Peripherals

Page 24: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Typical Network Processor Architecture

[Figure: network processor with input ports, multi-threaded processing elements, a co-processor, and output ports, connected over buses to SDRAM (packet buffer) and SRAM (routing table)]

Page 25: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Intel IXP1200 Network Processor

° StrongARM processing core

° Microengines introduce new ISA

° I/O
• PCI
• SDRAM
• SRAM
• IX: PCI-like packet bus

° On-chip FIFOs
• 16 entries, 64B each

Page 26: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Intel IXP1200 Microengine

4 hardware contexts
Single issue processor
Explicit optional context switch on SRAM access

Registers
All are single ported
Separate GPR
256*6 = 1536 registers total

32-bit ALU
Can access GPR or XFER registers

Shared hash unit
1/2/3 values – 48b/64b
For IP routing hashing

Standard 5 stage pipeline
4KB SRAM instruction store – not a cache!
Barrel shifter

Page 27: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

IBM PowerNP

16 pico-processors and 1 PowerPC

Each pico-processor
Supports 2 hardware threads
3 stage pipeline: fetch/decode/execute

Dyadic Processing Unit
Two pico-processors
2KB shared memory
Tree search engine

Focus is network layers 2-4

PowerPC 405 for control plane operations
16K I and D caches

Target is OC-48

Page 28: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Cisco 10000

Almost all data plane operations execute on the programmable XMC
Pipeline stages are assigned tasks, e.g. classification, routing, firewall, MPLS
Classic SW load balancing problem

External SDRAM shared by common pipe stages
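Why assigning tasks to pipeline stages is a load balancing problem: a linear pipeline can accept a new packet only as fast as its slowest stage finishes one. The sketch below uses invented per-stage times for the four example tasks; it does not describe measured Cisco 10000 behavior.

```python
def pipeline_throughput(stage_times_us):
    """Packets per microsecond for a linear pipeline, set by the slowest stage."""
    return 1.0 / max(stage_times_us)


def imbalance(stage_times_us):
    """Slowest stage time divided by the mean stage time (1.0 = perfectly balanced)."""
    mean = sum(stage_times_us) / len(stage_times_us)
    return max(stage_times_us) / mean


# Hypothetical per-packet stage times in microseconds.
stages = {"classification": 1.0, "routing": 2.0, "firewall": 4.0, "MPLS": 1.0}
times = list(stages.values())
print(pipeline_throughput(times), imbalance(times))
```

Here the firewall stage alone sets the packet rate, so moving work out of it (or splitting it across stages) is what improves throughput.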

Page 29: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Summary: ASIPs

Processors with instruction sets tailored to specific applications or application domains
Instruction-set generation as part of synthesis
Customized processor options

Pluses: Customization yields lower area, power, etc.

Minuses: Higher h/w & s/w development overhead
– design, compilers, debuggers
– higher time to market

Source: Mani Srivastava

Page 30: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

What is this?

90nm 9-layer Interconnect (from Altera FPGA) Source: Altera

Page 31: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

What is this?

[Figure: cross-section of a 90nm transistor (from Altera FPGA), with labels for poly, spacers, contact, diffusion, isolation, dielectric, and salicide]

Source: Altera

Page 32: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

FPGA

Page 33: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

FPGA

[Figure: FPGA fabric showing CLBs, switchboxes, routing channels, IOBs, and configuration bits]

Page 34: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Programmable Logic

Each logic element outputs one data bit
Interconnect programmable between elements
Interconnect tracks grouped into channels

[Figure: array of logic elements (LE) separated by channels of interconnect tracks]

Page 35: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Lookup Table (LUT)

Program configuration bits for required functionality

Computes “any” 2-input function

In   Out
00   0
01   0
10   0
11   1

[Figure: 2-LUT with inputs A and B, output C, and Configuration Bits 0 to 3; as configured above, C = A·B]
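A 2-LUT can be modeled as a four-entry table indexed by its inputs. In the sketch below, the bit ordering (A as the high index bit, so Configuration Bit 0 corresponds to input 00) is an assumption; real devices may order the bits differently.

```python
def lut2(config_bits, a, b):
    """2-input LUT: inputs (a, b) select one of four configuration bits."""
    index = (a << 1) | b  # assumed ordering: 00 -> bit 0, 01 -> bit 1, 10 -> bit 2, 11 -> bit 3
    return config_bits[index]


AND_CONFIG = [0, 0, 0, 1]  # the truth table shown on the slide
XOR_CONFIG = [0, 1, 1, 0]  # same hardware, different configuration bits

for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut2(AND_CONFIG, a, b), lut2(XOR_CONFIG, a, b))
```

Changing the function requires no new logic, only a different set of four stored bits.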

Page 36: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Lookup Table (LUT)

K-LUT -- K input lookup table
Any function of K inputs by programming table
Load bits into table

$2^N$ bits to describe functions => $2^{2^N}$ different functions
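The counting argument on this slide (an N-input table has 2^N entries, and each entry can independently be 0 or 1, giving 2^(2^N) functions) can be checked by brute force for small N. The helper names below are illustrative only.

```python
from itertools import product


def num_config_bits(n):
    """An n-input LUT stores one bit per input combination."""
    return 2 ** n


def num_functions(n):
    """Each configuration bit is free, so there are 2**(2**n) distinct truth tables."""
    return 2 ** num_config_bits(n)


def all_truth_tables(n):
    """Enumerate every possible configuration of an n-input LUT."""
    return list(product((0, 1), repeat=num_config_bits(n)))


print(num_config_bits(4), num_functions(4))
print(len(all_truth_tables(2)))
```

Enumeration is only practical for very small N, since the number of tables grows doubly exponentially.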

Page 37: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Lookup Table (LUT)

K-LUT (typical k=4) w/ optional output Flip-Flop

Page 38: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Lookup Table (LUT)

Single configuration bit for each:
LUT bit
Interconnect point/option
Flip-flop select

Page 39: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Configurable Logic Block (CLB)

Page 40: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Programmable Interconnect

Interconnect architecture
Fast local interconnect
Horizontal and vertical lines of various lengths

[Figure: grid of CLBs connected through switch matrices]

Page 41: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Switchbox Operation

6 pass transistors per switchbox interconnect point

Pass transistors act as programmable switches

Pass transistor gates are driven by configuration memory cells

[Figure: a switchbox interconnect point before programming and after programming]
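One way to read "6 pass transistors per switchbox interconnect point" is one transistor for each pair of the four wire segments meeting at the point, since C(4, 2) = 6. That reading, and the N/E/S/W naming, are assumptions made for this sketch rather than something stated on the slide.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")
# One pass transistor per unordered pair of sides: 6 pairs in total.
SWITCHES = list(combinations(SIDES, 2))


def connected_sides(config_bits):
    """Given six 0/1 configuration bits (one per entry in SWITCHES),
    return the groups of sides that end up electrically joined."""
    groups = {side: {side} for side in SIDES}
    for (a, b), bit in zip(SWITCHES, config_bits):
        if bit:  # transistor on: merge the two nets
            merged = groups[a] | groups[b]
            for side in merged:
                groups[side] = merged
    return {frozenset(g) for g in groups.values() if len(g) > 1}


print(SWITCHES)
# Turn on N-S and E-W: two independent straight-through connections.
print(connected_sides([0, 1, 0, 0, 1, 0]))
```

With all six bits at 0 the point connects nothing; each configuration memory cell simply holds one of these bits on a transistor gate.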

Page 42: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Programmable Interconnect

Page 43: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Programmable Interconnect


Page 44: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Embedded Functional Units

Fixed, fast multipliers
MAC, shifters, counters

Hard/soft processor cores
PowerPC
Nios
Microblaze

Memory
Block RAM
Various sizes and distributions

[Figure: FPGA floorplan showing CLBs, Block RAM, and IP cores (multipliers)]

Page 45: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Embedded RAM

Xilinx – Block SelectRAM
18Kb dual-port RAM arranged in columns

Altera – TriMatrix Dual-Port RAM
M512 – 512 x 1
M4K – 4096 x 1
M-RAM – 64K x 8

Page 46: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Xilinx Virtex-II Pro

1 to 4 PowerPCs
4 to 16 multi-gigabit transceivers
12 to 216 multipliers
3,000 to 50,000 logic cells
200k to 4M bits RAM
204 to 852 I/Os

Up to 16 serial transceivers
• 622 Mbps to 3.125 Gbps

[Figure: Virtex-II Pro die with logic cells, PowerPCs, and serial transceivers labeled]

Page 47: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Altera Stratix

Page 48: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

FPGA Architectures

FPGA-based reconfigurable devices
Configurable logic blocks
Flexible logic block
Programmable interconnect
Dedicated multipliers
Embedded configurable block RAM
RISC microprocessor cores

Other architectures
Reconfigurable multi-core processor
Coarse-grained reconfigurable architectures

Page 49: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Application Specific Integrated Circuits (ASICs)

Full Custom ASICs
Every transistor is designed and drawn by hand
Typically the only way to design analog portions of ASICs
Gives the highest performance but the longest design time
Full set of masks required for fabrication

Source: Paul D. Franzon

Page 50: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Application Specific Integrated Circuits (ASICs)

Standard-Cell-Based ASICs, or ‘Cell Based IC’ (CBIC), or ‘semi-custom’
Standard cells are custom designed and then inserted into a library
These cells are then used in the design by being placed in rows and wired together using ‘place and route’ CAD tools
Some standard cells, such as RAM and ROM cells, and some datapath cells (e.g. a multiplier) are tiled together to create macrocells

[Figure: example standard-cell layouts of a D flip-flop and a NOR gate]

Source: Paul D. Franzon

Page 51: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Standard Cells

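[Figure: standard cell layout showing the cell boundary, N well, In and Out pins, and VDD and GND rails; dimensions marked 2λ and rails ~10λ]

Cell height 12 metal tracks
Metal track is approx. 3λ + 3λ
Pitch = repetitive distance between objects
Cell height is “12 pitch”

© Digital Integrated Circuits, 2nd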

Page 52: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Standard Cells

[Figure: layout of a 2-input NAND gate standard cell with inputs A and B, output Out, and VDD and GND rails]

© Digital Integrated Circuits, 2nd

Page 53: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Standard Cell Layout Methodology – 1980s

[Figure: rows of cells between VDD and GND rails, with signals routed in the routing channels between rows]

© Digital Integrated Circuits, 2nd

Page 54: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Standard Cell Layout Methodology – 1990s

[Figure: rows of mirrored cells sharing VDD and GND rails, routed on metal layers M2 and M3; no routing channels]

© Digital Integrated Circuits, 2nd

Page 55: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Standard Cell Layouts

Page 56: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

ASIC Design Flow

Most ASICs are designed using an RTL/synthesis-based methodology
Design details captured in a simulatable description of the hardware
• Captured as Register Transfer Language (RTL)
• Simulations done to verify design

Source: Paul D. Franzon

Page 57: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

ASIC Design Flow

Automatic synthesis is used to turn the RTL into a gate-level description
• i.e. AND, OR gates, etc.
• Chip-test features are usually inserted at this point

Gate-level design verified for correctness
Output of synthesis is a “net-list”
• i.e. list of logic gates and their implied connections

NOR2 U36 ( .Y(n107), .A0(n109), .A1(\value[2] ) );
NAND2 U37 ( .Y(n109), .A0(n105), .A1(n103) );
NAND2 U38 ( .Y(n114), .A0(\value[1] ), .A1(\value[0] ) );
NOR2 U39 ( .Y(n115), .A0(\value[3] ), .A1(\value[2] ) );
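To show what "a list of logic gates and their implied connections" means in practice, the sketch below transcribes the four netlist lines from this slide into a Python list and evaluates them. The nets n105 and n103 are driven by gates that are not shown in the excerpt, so they are treated as primary inputs here; that is an assumption, as is the whole evaluation scheme (real flows use a logic simulator).

```python
# Cell behavior for the two cell types that appear in the excerpt.
GATES = {
    "NOR2": lambda a, b: int(not (a or b)),
    "NAND2": lambda a, b: int(not (a and b)),
}

# (cell type, instance name, output net, input nets), transcribed from the slide.
NETLIST = [
    ("NOR2", "U36", "n107", ("n109", "value[2]")),
    ("NAND2", "U37", "n109", ("n105", "n103")),
    ("NAND2", "U38", "n114", ("value[1]", "value[0]")),
    ("NOR2", "U39", "n115", ("value[3]", "value[2]")),
]


def evaluate(netlist, inputs):
    """Evaluate a combinational netlist; gates may be listed in any order."""
    nets = dict(inputs)
    pending = list(netlist)
    while pending:
        # A gate is ready once every one of its input nets has a value.
        ready = [g for g in pending if all(n in nets for n in g[3])]
        if not ready:
            raise ValueError("undriven net or combinational loop")
        for cell, _inst, out, ins in ready:
            nets[out] = GATES[cell](*(nets[n] for n in ins))
        pending = [g for g in pending if g not in ready]
    return nets


result = evaluate(NETLIST, {
    "value[0]": 1, "value[1]": 1, "value[2]": 0, "value[3]": 0,
    "n105": 1, "n103": 1,  # assumed inputs; their drivers are not in the excerpt
})
print({net: result[net] for net in ("n107", "n109", "n114", "n115")})
```

Note that U36 is listed before U37 even though it depends on U37's output; a netlist records connectivity, not evaluation order.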

Source: Paul D. Franzon

Page 58: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

ASIC Design Flow

Physical design tools are used to turn the gate-level design into a set of chip masks (for photolithography) or a configuration file for downloading to an FPGA

Floorplanning
• Positioning of major functions

Placement
• Gates arranged in rows

Page 59: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

ASIC Design Flow

Clock and buffer insertion
• Distribute clocks to cells and locate buffers for use as amplifiers in long wires

Routing
• Logic cells wired together

Page 60: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Semiconductor Roadmap

Projections for ‘leading edge’ ASIC: (www.itrs.net)

Page 61: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Std Cell ASIC Development Cost Trend

[Figure: stacked bar chart of total development costs ($M, axis 0 to 45) by process node: 0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm, 45 nm. Cost components: Masks & Wafers, Test & Product Engineering, Software, Design/Verification & Layout]

Note: Conservative estimate; does not include re-spins.

Page 62: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Result: Declining ASIC Starts

[Figure: standard cell/gate array design starts per year, 1996 to 2007; vertical axis: design starts, 0 to 12000]

Source: Dataquest/Gartner

Page 63: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

FPGA vs Standard Cell

Parameter                            FPGA               Standard Cell
CAD Tool Cost                        $2000              $Millions
Mask Cost                            0                  $1.4M US @ 90 nm
Bug Fix                              1 hour             ~10 weeks
Electrical & Optical Check & Debug   Vendor’s Problem   Your Problem!
Time to Market                       Fast               Slow
Die Size                             2X to 20X          1X
Volume Cost                          1X to 20X          1X
Speed                                0.3X to 0.6X       1X
Power                                2X to 5X           1X

Source: Altera
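The table implies a break-even volume: the standard-cell route pays a large up-front cost but has the lower per-unit cost. The sketch below uses the $1.4M mask cost and $2000 CAD cost from the table as the only up-front costs, and invents the unit prices ($5 for the ASIC, 5X that for the FPGA, which lies inside the table's 1X to 20X range). Design labor, the "$Millions" of ASIC CAD tools, and re-spins are deliberately left out, so the result is illustrative only.

```python
def total_cost(nre, unit_cost, volume):
    """Up-front (non-recurring) cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume above which the standard-cell ASIC is cheaper overall."""
    if unit_fpga <= unit_asic:
        raise ValueError("FPGA unit cost must exceed ASIC unit cost")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


# Up-front costs from the table; unit costs are hypothetical.
NRE_ASIC, NRE_FPGA = 1_400_000, 2_000
UNIT_ASIC, UNIT_FPGA = 5.0, 25.0

volume = break_even_volume(NRE_ASIC, UNIT_ASIC, NRE_FPGA, UNIT_FPGA)
print(f"break-even at about {volume:,.0f} units")
```

Below the break-even volume the FPGA is the cheaper choice; a larger unit-cost gap or a smaller mask cost moves the crossover to lower volumes.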

Page 64: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Efficiency vs. Development Cost

[Figure: Power & System Cost* and Development Difficulty & Cost, each on a low-to-high scale, across Processor, DSP, FPGA, Struct. ASIC, Std. Cell, and Full Custom]

*For applications with significant parallelism

Source: Altera

Page 65: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Many Implementation Choices

[Figure: implementation choices ranging over microprocessors/controllers, ASIPs (DSP, graphics, network processors, crypto), FPGAs, and ASICs, compared on speed, power, and cost at high and low volume]

Page 66: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Embedded System Design

CAD tools take care of hardware fairly well
Although a productivity gap is emerging

But software is a different story…
HLLs such as C help, but can’t cope with complexity and performance constraints

Holy Grail for Tools People: H/W-like synthesis & verification from a behavior description of the whole system at a high level of abstraction using formal computation models

Source: Mani Srivastava

Page 67: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Productivity Gap in Hardware Design

A growing gap between design complexity and design productivity

Source: Alberto Sangiovanni-Vincentelli

Page 68: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Situation Worse in S/W

[Figure: DoD embedded system costs in billion $/year (axis 0 to 45), 1980 to 1994, hardware vs. software]

Source: Mani Srivastava

Page 69: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Embedded System Design from a Design Technology Perspective

Intertwined subtasks
Specification/modeling
H/W & S/W partitioning
Scheduling & resource allocations
H/W & S/W implementation
Verification & debugging

Crucial is the co-design and joint optimization of hardware and software

[Figure: system with processor, analog I/O, memory, ASIC, and DSP code]

Source: Mani Srivastava

Page 70: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

On-going Paradigm Shift in Embedded System Design

Change in business model due to SoCs
Currently many IC companies have a chance to sell devices for a single board
In future, a single vendor will create a System-on-Chip
But how will it have knowledge of all the domains?

Component-based design
Components encapsulate the intellectual property

Platforms
Integrated HW/SW/IP
Application focus
Rapid low-cost customization

Source: Mani Srivastava

Page 71: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Complexity and Heterogeneity

Heterogeneity within H/W & S/W parts as well
S/W: control oriented, DSP oriented
H/W: ASICs, COTS ICs

[Figure: heterogeneous system with a control panel, a controller running a real-time OS with controller processes and UI processes, an ASIC, two programmable DSPs running DSP assembly code, a dual-ported RAM, and a CODEC]

Source: Mani Srivastava

Page 72: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Handling Heterogeneity

Source: Edward Lee

Page 73: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

IP-based Design

Source: Mani Srivastava

Page 74: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Map from Behavior to Architecture

Source: Mani Srivastava

Page 75: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Behavior Vs. Architecture

[Figure: design flow with four numbered steps. Blocks: System Behavior (models of computation; behavior simulation), System Architecture (performance models: embedded SW, communication and computation resources), Mapping (HW/SW partitioning, scheduling; performance simulation), Communication Refinement, and Flow To Implementation (synthesis, SW estimation)]

Source: Alberto Sangiovanni-Vincentelli

Page 76: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Hardware vs. Software Modules

Hardware = functionality implemented via a custom architecture (e.g. datapath + FSM)

Software = functionality implemented in software on a programmable processor

Key differences:

Multiplexing
software modules are multiplexed with others on a processor, e.g. using an OS
hardware modules are typically mapped individually on dedicated hardware

Concurrency
processors usually have one “thread of control”
dedicated hardware often has concurrent datapaths

Source: Mani Srivastava

Page 77: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Hardware-Software Architecture

A significant part of the problem is deciding which parts should be in software on programmable processors, and which in specialized hardware

Today:
Ad hoc approaches based on earlier experience with similar products, & on manual design
HW-SW partitioning decided at the beginning, and then designs proceed separately

Source: Mani Srivastava

Page 78: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.
Page 79: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Extra Slides

Page 80: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Industrial Structure Shift (from Sony)

Source: Mani Srivastava

Page 81: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Where are the CPUs?

Estimated 98% of 8 Billion CPUs produced in 2000 used for embedded apps

Where Has CS Focused?
Interactive computers, servers, etc.: 200M per year

Where Are the Processors?
8.5B parts per year
Embedded computers: 80%
Vehicles: 12%
Robots: 6%
Direct: 2%

Look for the CPUs…the Opportunities Will Follow!

Source: DARPA/Intel (Tennenhouse)

Page 82: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

PIC Data Sheet

Page 83: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Example: Video Processor

Philips Nexperia: Flexible architecture for digital video applications

[Figure: Nexperia DVP system silicon. A MIPS CPU (PRxxxx) and a TriMedia CPU (TM-xxxx), each with I$ and D$, each sit on a PI bus with device I/P blocks; an MMI connects the DVP memory bus to SDRAM]

General Purpose RISC Processor (MIPS™)
• 50 to 300+ MHz
• 32-bit or 64-bit

VLIW Media Processor (TriMedia™)
• 100 to 300+ MHz
• 32-bit or 64-bit

Nexperia System Busses
• PI bus
• Memory bus
• 32-128 bit

Library of Device Blocks
• Image coprocessors
• DSPs
• UART
• 1394
• USB
• …and more

Page 84: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Increasingly on the Same Chip: System on a Chip (SOC)

Source: Mani Srivastava

Page 85: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Reconfigurable SoC

Triscend’s A7 CSoC

Other Examples
Atmel’s FPSLIC (AVR + FPGA)
Altera’s Nios (configurable RISC on a PLD)

Source: Mani Srivastava

Page 86: Embedded Computing Processors CSE 237D: Winter 2010 Topic #6 Ryan Kastner.

Reconfigurable Hardware

Main Entry: re-
Function: prefix
1 : again : anew <retell>
2 : back : backward <recall>

Main Entry: con·fig·ure
Pronunciation: kən-'fi-gyər
Function: transitive verb
: to set up for operation especially in a particular way

KEY ADVANTAGE: Performance of Hardware, Flexibility of Software

[Figure: FPGA floorplan showing CLBs, Block RAM, and IP cores (multipliers)]