Communication Protocols

• Layering

– Lower levels provide services to higher levels

– Easier to design

– Physical layer

• Lowest level in hierarchy

• Medium to carry data from one actor (device or node) to another

• Protocols: real-time or best effort

– Parallel

– Serial

– Wireless

Parallel communication

• Multiple data, control, and power wires

– One bit per wire

• High data throughput with short distances

• Typically used when connecting devices on same IC or same circuit board

– Bus must be kept short

• Long parallel wires have high capacitance, which requires more time to charge/discharge

• Data misalignment between wires increases as length increases

• Higher cost, bulky

Parallel Protocols: PCI Bus

• PCI Bus (Peripheral Component Interconnect)

– High-performance bus designed by Intel in the 1990s

– Interconnects CPUs, expansion boards, memory

– Data transfer rates up to 1 GB/s with the 64-bit variants

– Synchronous bus architecture

– Multiplexed data/address lines

• PCI express

– Serial, point-to-point protocol

Source: http://computer.howstuffworks.com

Parallel Protocols: ARM Bus

• ARM Bus

– Designed and used internally by ARM Corporation

– Interfaces with ARM line of processors

– Many IC design companies have own bus protocol

– Data transfer rate is a function of clock speed

– 32-bit addressing

Serial Communication

• Single data wire – transmit one bit at a time

• Higher data throughput with long distances

– Less average capacitance, so more bits per unit of time

• Complex protocol and interfacing logic

– Sender needs to decompose word into bits

– Receiver needs to recompose bits into word

– Control signals often on the same wire -> increasing protocol complexity

Serial Communication

Frame format: start bit, bit 0, bit 1, ..., bit n-1, stop bit

• Parameters:

– Baud (bit) rate.

– Number of bits per character.

– Parity/no parity.

– Even/odd parity.

– Length of stop bit (1, 1.5, 2 bits).
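The parameters above can be made concrete with a short sketch. It builds the bit sequence for one character (start bit, data bits sent least-significant bit first, optional parity, stop bits) and computes the time one character occupies the line. The function names `uart_frame` and `char_time` are illustrative, and the LSB-first ordering with a logic-0 start bit and logic-1 stop bits is the usual asynchronous convention rather than something stated on the slide.

```python
def uart_frame(byte, data_bits=8, parity="even", stop_bits=1):
    """Return the bit sequence for one asynchronous character.

    Only whole stop bits are supported here; a 1.5-bit stop period
    is a timing matter and cannot be expressed as a list of bits.
    """
    if not 0 <= byte < (1 << data_bits):
        raise ValueError("value does not fit in the data bits")
    # Data bits go out least-significant bit first.
    data = [(byte >> i) & 1 for i in range(data_bits)]
    frame = [0] + data  # start bit is a logic 0
    if parity in ("even", "odd"):
        ones = sum(data)
        # Even parity makes the count of 1s (data + parity) even; odd makes it odd.
        frame.append(ones % 2 if parity == "even" else (ones + 1) % 2)
    elif parity is not None:
        raise ValueError("parity must be 'even', 'odd', or None")
    frame += [1] * stop_bits  # stop bits are logic 1
    return frame


def char_time(baud, data_bits=8, parity="even", stop_bits=1):
    """Seconds needed to send one character; stop_bits may be 1, 1.5, or 2."""
    n_bits = 1 + data_bits + (1 if parity else 0) + stop_bits
    return n_bits / baud
```

For example, `uart_frame(0x41)` yields 11 bits, so at 9600 baud each character takes 11/9600 s and the effective payload rate is 8/11 of the baud rate.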

Serial Protocol: 8251 UART

• Universal asynchronous receiver transmitter

• Takes parallel data and transmits serially at up to 450 Kbps

• 8251 chip functions are integrated into standard PC interface chip.

[Figure: CPU connected to the 8251 (8-bit data, status); the 8251 drives the serial line]

Serial Protocols: I2C

• I2C (Inter-IC)

– Two-wire serial bus protocol developed by Philips Semiconductors in the early 1980s

– Enables peripheral ICs to communicate using simple communication hardware

• Appropriate for peripherals where simplicity and low manufacturing cost are more important than speed

– Standard mode: 100 Kbps with 7-bit addressing

– Fast mode: 400 Kbps; high-speed mode: 3.4 Mbps; 10-bit addressing is also available

– Common devices capable of interfacing to I2C bus:

• EPROMs, Flash, and some RAM memory, real-time clocks, watchdog timers, and microcontrollers

• Raspberry Pi

Serial Protocols: USB

• USB (Universal Serial Bus)

– Easier connection between PC and peripherals

– USB 1.1 has 2 data rates:

• 12 Mbps for increased bandwidth devices

• 1.5 Mbps for lower-speed devices (joysticks, game pads)

– USB 2.0 runs at 480 Mbps; USB 3.1 up to 10 Gbps

– Tiered star topology can be used

• One USB device (hub) connected to PC

• Up to 127 USB devices can be connected to hub

– USB host controller

• Manages and controls bandwidth and driver software required by each peripheral

• Dynamically allocates power downstream according to devices connected/disconnected

PCI Express (PCIe)

• Serial, point-to-point protocol

• Bandwidth is very scalable: x1 to x16 links

• Max 6.4 GB/s in either direction on x16

• Switches for connecting different devices

Source: http://computer.howstuffworks.com

Real-Time Communication & Protocol Examples

Class Overview

• What we've covered until now in SW:

– Real-time scheduling, RTOS, RTIO started

• Where we are going today:

– RTIO, HW/SW codesign

• Due today:

– Article on RTIO

• Upcoming:

– HW2 assigned

– Individual project part 2 deadline extended to end of the day Sunday, 2/19

Real-time Comm. Requirements

– Real-time behavior

– Efficient, economical (e.g. centralized power supply)

– Appropriate bandwidth and communication delay

– Robustness

– Fault tolerance

– Maintainability

– Diagnosability

– Security

– Safety

Real-time IO

• Field bus:

– A family of industrial computer network protocols used for real-time distributed control

• Carrier-sense multiple-access/collision-detection (CSMA/CD); used in Ethernet & CAN

• Alternatives:

– Token rings, token busses

– Carrier-sense multiple-access/collision-avoidance (CSMA/CA)

• Each partner gets an ID (priority).

• After each bus transfer, all partners try setting their ID on the bus; partners detecting a higher ID disconnect themselves from the bus.

• The highest-priority partner gets a guaranteed response time; the others communicate only if they are given a chance.

Event vs. time triggered

• Event Triggered (ET):

– Computation/communication triggered by an external event

– Events are primarily generated by changes in the environment

– Efficient — only do things when they need to be done; rest and save energy/cpu time/bandwidth

– High peak-load if multiple events happen at once

– Hard to analyze due to asynchronous nature of events

• Time Triggered (TT):

– Computation/communication triggered by the system clock

– Events happen according to a fixed schedule:

• Inefficient — does things periodically, whether needed or not

– Enhanced analyzability due to easily characterizable load, predictable interaction sequences, bus use, etc.

Time division multiple access

• Each node is assigned a fixed time slot:

http://www.ece.cmu.edu/~koopman/jtdma/jtdma.html#classical

Master sends sync

Some waiting time

Each slave transmits in its time slot

Variations (truncating unused slots, several slots per slave) exist
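The classical round described above (sync, waiting time, then one slot per slave) can be sketched as a small schedule calculation. The function name `tdma_schedule` and its parameters are illustrative; all times are in the same arbitrary unit, and every slave is assumed to get exactly one slot of equal length.

```python
def tdma_schedule(n_slaves, sync_time, wait_time, slot_time):
    """Return (slot_starts, cycle_length) for one classical TDMA round.

    slot_starts[i] is the offset, from the start of the master's sync,
    at which slave i may begin transmitting.
    """
    if n_slaves < 1:
        raise ValueError("need at least one slave")
    first = sync_time + wait_time
    starts = [first + i * slot_time for i in range(n_slaves)]
    cycle = first + n_slaves * slot_time
    return starts, cycle
```

Under these assumptions, a message that becomes ready just after its node's slot has begun waits at most one full cycle before it can be sent, which is what makes the timing of such a bus easy to bound.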

Advantages of TDMA busses over priority-driven schemes

– Can provide QoS guarantees

– TDMA resources support temporal composability, by separating resource access of different subsystems

– TDMA resources have a very deterministic timing behavior

– Can be made fault tolerant

– Support for error detection

– Support for error containment

• a faulty subsystem does not affect the correct behavior of the remaining system

[Ernesto Wandeler, Lothar Thiele: Optimal TDMA Time Slot and Cycle Length Allocation for Hard Real-Time Systems, ASP-DAC, 2006]

Field busses: Profibus

• Process Field Bus (Profibus):

• PROFIBUS DP (Decentralized Peripherals) is used to operate sensors and actuators via a centralized controller in factory automation apps; runs at 9.6 kbps – 12 Mbps; RS485 allows max 126 devices, but expansion is possible

• PROFIBUS PA (Process Automation) is used to monitor measuring equipment via a process control system in process automation apps; runs at 31.25 kbps; same message format as DP

– Focus on safety; 20% market share for field busses.

– Integration with Ethernet via Profinet.

[http://www.profibus.com/]

Profibus: Application & Data Layers

• Application layer:

– DP-V0 for cyclic exchange of data and diagnosis

– DP-V1 for acyclic data exchange and alarm handling

– DP-V2 for slave-to-slave communication and data exchange broadcast

• Data link:

– FDL (Field bus Data Link) combines token passing with a master-slave method for Profibus-DP

• Each byte uses even parity and is transferred asynchronously with a start and stop bit

• Master signals the start of a new telegram with a SYN pause of at least 33 bits

• Various messages possible:

– Token

– Variable data length

– Fixed data length

– No data

– Brief ack
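The framing rules above give a quick way to estimate how long a telegram occupies the bus. The sketch below assumes 8 data bits per byte, so each byte costs 11 bit times (start, 8 data, even parity, stop), and it charges the minimum 33-bit SYN pause once per telegram. The helper names are illustrative, and inter-telegram gaps and slave response delays are ignored.

```python
BITS_PER_BYTE = 11   # start + 8 data + even parity + stop (8 data bits assumed)
SYN_BITS = 33        # minimum SYN pause before a new telegram


def telegram_bits(n_bytes):
    """Bit times occupied by one telegram of n_bytes, including the SYN pause."""
    if n_bytes < 0:
        raise ValueError("n_bytes must be non-negative")
    return SYN_BITS + BITS_PER_BYTE * n_bytes


def telegram_time(n_bytes, bit_rate):
    """Seconds on the wire for one telegram at the given bit rate (bits/s)."""
    return telegram_bits(n_bytes) / bit_rate
```

A 10-byte telegram thus takes 143 bit times: roughly 12 µs at 12 Mbps, but about 15 ms at 9.6 kbps.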

OSI Layer | PROFIBUS
7 Application | DP-V0, DP-V1, DP-V2, Management
6 Presentation | --
5 Session | --
4 Transport | --
3 Network | --
2 Data Link | FDL
1 Physical | EIA-485, Optical, MBP

Controller area network (CAN)

– Designed by Bosch (with Intel) in the 1980s; development began in 1983 and the protocol was released in 1986

– Key concept:

• Every device can be connected by a single set of wires, and every device that is connected can freely exchange data with any other device

– Originally designed for cars; now used also for:

• elevator controllers, copiers, telescopes, production-line control systems, and medical instruments

– Binary countdown arbitration (CSMA/CD)

• Start from MSB, transmit each bit of priority

• Highest priority wins

– Throughput: 10 kbit/s - 1 Mbit/s

– Low and high-priority signals

• Maximum latency of 134 µs for high priority

www.can.bosch.com
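Binary countdown arbitration can be illustrated with a short simulation. In CAN a 0 bit is dominant and a 1 bit is recessive, so the bus behaves like a wired-AND and the numerically lowest identifier has the highest priority. The function name `arbitrate` is illustrative; the sketch assumes all contenders start transmitting at the same time and have unique identifiers.

```python
def arbitrate(ids, width=11):
    """Return the identifier that wins bitwise arbitration among `ids`."""
    contenders = set(ids)
    if not contenders:
        raise ValueError("no contenders")
    if any(not 0 <= i < (1 << width) for i in contenders):
        raise ValueError("identifier does not fit in the given width")
    for bit in range(width - 1, -1, -1):  # most significant bit first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())  # wired-AND: any dominant 0 pulls the bus to 0
        # A node that sent a recessive 1 but reads back a dominant 0 backs off.
        contenders = {i for i in contenders if sent[i] == bus}
    (winner,) = contenders  # unique identifiers leave exactly one node
    return winner
```

The losing nodes stop transmitting without corrupting the winner's frame, so no bandwidth is lost to the collision; they simply retry once the bus is idle again.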

Aircraft communication systems

– Information exchange

• information: many bytes of data, e.g. digital map, flight plan, etc.

• exchange: a response is expected, at minimum an acknowledgment

• higher speed data link needed

– Control platform: sampling and data transmission

• data: digital value of an analog parameter, e.g. speed, height, etc.

• No response is expected, but:

– Time, integrity and availability are the key drivers.

– The stability of the flight relies on this transmission

• Aeronautical response: ARINC 429 protocol

ARINC 429 overview

• Developed by Aeronautical Radio, Incorporated (ARINC)

• Commonly used standard for aircraft

• Electrical and data format standard for a 2-wire serial bus with one sender and many listeners.

• Each data item is individually identified (by a label) and sent

[Figure: protocol stack (Physical connection, DataLink/MAC, Network, Transport, Application) and the 32-bit word format: label, data, parity]
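The 32-bit word (label, data, parity) can be sketched as a simple pack-and-check routine. This is a simplification under stated assumptions: the label is placed in the low 8 bits, everything between label and parity is treated as one 23-bit payload, and the parity bit makes the total number of 1s odd. The actual standard subdivides the payload into further fields and defines a specific on-wire bit order for the label, both of which are ignored here. The names `a429_word` and `a429_parity_ok` are illustrative.

```python
def a429_word(label, data):
    """Pack an 8-bit label and a 23-bit payload into a 32-bit word with odd parity."""
    if not 0 <= label < (1 << 8):
        raise ValueError("label must fit in 8 bits")
    if not 0 <= data < (1 << 23):
        raise ValueError("data must fit in 23 bits")
    word = label | (data << 8)             # bits 0-7: label, bits 8-30: payload
    ones = bin(word).count("1")
    parity = 0 if ones % 2 == 1 else 1     # odd parity over all 32 bits
    return word | (parity << 31)


def a429_parity_ok(word):
    """True if the 32-bit word contains an odd number of 1 bits."""
    return bin(word & 0xFFFFFFFF).count("1") % 2 == 1
```

A listener can drop any word for which `a429_parity_ok` is false; because there is no acknowledgment on the bus, the sender simply transmits the next periodic sample.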

Information system requirements

• Ensure that the information is transmitted without any error.

– Data needs to be acknowledged

– Messages can be sent again in case of error

• Past aircraft used A429 with added acknowledgement.

[Figure: protocol stack (Physical connection, DataLink/MAC, Network, Transport, Application) with A429 Williamsburg]

ARINC 629

• Multi-transmitter protocol where many units share the same bus; originally designed for Boeing 777.

• Based on "waiting room" protocol:

– Each node is assigned a unique number of mini slots that must elapse with silence on the channel before the data transmission begins

• Three (groups of) time-out parameters:

– SG: synchronization gap controlling access to the waiting room

– TGi: terminal gap, the personal time-out of node i

– TI: transmit interval preventing monopolization of channel

– TI > SG > max{TGi}

TTP (Time-Triggered Protocol)

Sources: Dr. Insup Lee & H. Kopetz

TTP – more than just a protocol

– Network protocol

– Operating system scheduling philosophy

– Fault tolerance approach

Time-Triggered approach

– Simple to implement

– Stable time base

– Cyclic schedules

TTP versions

• TTP/A (Automotive Class A = soft real time)

– A scaled-down version of TTP

– A cheaper master/slave variant

• Distributed master slave is expensive

• TTP/C (Automotive Class C = hard real time)

– A full version of TTP

– A fault-tolerant distributed variant

Protocol Layer in TTP/A

TTP/A: Polling

• Operation

– Master polls the other nodes (slaves)

– Non-master nodes transmit messages when they are polled

– Inter-slave communication through the master

Polling Tradeoffs

• Advantages

– Simple protocol to implement

– Historically very popular

– Bounded latency for real-time applications

• Disadvantages

– Single point of failure from centralized master

– Polling consumes bandwidth

– Network size is fixed during installation

• Master can also discover nodes during reconfiguration

• TTP/C

– A time-triggered communication protocol for safety-critical (fault-tolerant) distributed real-time control systems

– Based on a TDMA media access strategy

• Clock synchronization: Each node measures the difference between the expected and the observed arrival time of a message to calculate the difference between the sender’s & receiver’s clocks

– Fail Silence

• A subsystem is fail-silent if it either produces correct results or no results at all, i.e., it is quiet in case it cannot deliver the correct service
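The clock synchronization step can be sketched as follows. Each node records, per received message, the difference between observed and expected arrival time. How those differences are combined is not stated above; the sketch assumes a fault-tolerant average that discards the `k` largest and `k` smallest values before averaging, which is a common way to tolerate a faulty sender. The function names are illustrative.

```python
def clock_offset(expected_arrival, observed_arrival):
    """Difference between the sender's and the receiver's clock, as seen by the receiver."""
    return observed_arrival - expected_arrival


def fault_tolerant_average(offsets, k=1):
    """Average the offsets after dropping the k smallest and k largest values.

    Assumption: up to k senders may be faulty, so their extreme
    measurements must not influence the correction term.
    """
    if len(offsets) <= 2 * k:
        raise ValueError("need more than 2*k measurements")
    trimmed = sorted(offsets)[k:len(offsets) - k]
    return sum(trimmed) / len(trimmed)
```

For instance, with measured offsets of 2, -1, 3 and 50 time units, the outlier 50 and the minimum -1 are discarded and the node corrects its clock by 2.5 units.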

[Figure: TTP/C protocol layers. Host Layer (application software in host), FTU CNI, FTU Layer (FTU membership), RM Layer (redundancy management), SRU Layer (SRU membership, clock synchronization, media access: TDMA), Basic CNI, Link/Physical. FTU = fault tolerance unit; CNI = communication network interface; SRU = smallest replaceable unit.]

TTP/C Protocol Layer

• FTU Layer: groups two or more nodes into FTUs

• RM Layer: provides the mechanisms for the cold start of a TTP/C cluster

• SRU Layer: stores the data fields of the received frames

• Data Link/Physical Layer: provides the means to exchange frames between the nodes

FTU Configuration Examples in TTP/C

(a) Two active nodes, two shadow nodes

(b) Triple modular redundancy: three active nodes with one shadow

(c) Two active nodes without a shadow node

• Controller to run protocol

• DPRAM (dual-ported RAM): used for memory-mapped network interface

• BG (Bus Guard): hardware watchdog to ensure “fail silent”

• HW must use highly accurate time sources; dual redundant crystal oscillators are used for Boeing 777

TTP/C Frame

• I-Frames used for initialization

• N-Frames used for normal messages

Cycle in TTP/C

• TDMA Cycle

– One FTU sends results twice

– Then next FTU sends results

– And so on, until back to the next message from the first FTU

• Cluster Cycle

– Cluster cycle involves scheduling all messages and tasks

TTP/A vs. TTP/C

Service | TTP/A | TTP/C
Clock Synchronization | Central Multimaster | Distributed, Fault-Tolerant
Mode Switches | yes | yes
Communication Error Detection | Parity | 16/24 bit CRC
Membership Service | simple | full
External Clock Synchronization | yes | yes
Time-Redundant Transmission | yes | yes
Duplex Nodes | no | yes
Duplex Channels | no | yes
Redundancy Management | no | yes
Shadow Node | no | yes

Pros and Cons of TTP

• Advantages

– Simple protocol to implement

– Deterministic response time

– No wasted time for master polling messages

• Disadvantages

– Wasted bandwidth when some nodes are idle

– Requires stable clocks

– Fixed network size during installation

FlexRay

• Robust, scalable, deterministic, and fault-tolerant digital serial bus system designed for use in automotive applications

• Developed by consortium: BMW, Ford, Bosch, Daimler-Chrysler…

– Specified in SDL; finalized in 2009

• Built as extension to TTP and Byteflight protocols

• Improved error tolerance and time-determinism

• Meets requirements with transfer rates >> CAN

– initially targeted for ~10 Mbit/s

– design allows much higher data rates

• TDMA (Time Division Multiple Access) protocol: fixed time slot with exclusive access to the bus

• Cycle subdivided into static and dynamic segments

TDMA in FlexRay

• Exclusive bus access enabled for short time in each case.

• Dynamic segment for transmission of variable-length information.

• Fixed priorities in dynamic segment: minislots for each potential sender.

• Bandwidth used only when it is actually needed.

Structure of FlexRay networks

• Bus Guardian (BG) protects the system against failing processors by gating access to Bus Driver (BD)

Comparison of real-time protocols

FIP = Flexible time triggered protocol; statically scheduled with centralized arbitration

LON = for building automation, uses TDMA with CSMA/CA and dynamically varies the number of slots per device for each schedule

Wireless communication

• Infrared (IR)

– Frequencies just below visible light spectrum

– Diode emits infrared light to generate signal

– Infrared transistor detects signal

– Cheap to build but need line of sight, limited range

– Data transfer rates between 9.6 kbps and 4 Mbps

• Radio frequency (RF)

– Electromagnetic wave frequencies in radio spectrum

– Analog circuitry and antenna needed on both sides

– Line of sight not needed, transmitter power determines range

RFID

• Use of EM field to transfer data, for identifying and tracking tags attached to objects; no need for line of sight

• Active vs. passive tags

– Active tags transmit their ID; they are low power (~10-100 µA) but higher cost ($10-$200/unit retail)

– Passive tags can be read by RF; no intrinsic power consumption (powered by EM induction) and cheaper ($0.20-0.40)

• Readers

– $100+ to $1000s, range from read and report to smart tracking, etc.

• Using RFID for real-time location systems (RTLS)

– Only active tags work, with range 100 m+ in line of sight, or 1-20 m obstructed

– Battery: up to years on a single charge at <1 Hz transmission rate

– Location accuracy as close as 30 cm with reader presence

Bluetooth, BLE, ZigBee

Bluetooth

• IEEE 802.15.1

• Developed and licensed by the Bluetooth Special Interest Group (SIG)

BLE

• Bluetooth Low Energy Technology

• Adopted into Bluetooth specification

ZigBee

• IEEE 802.15.4

• Maintained and published by the ZigBee Alliance

Side By Side Comparison

Property | Bluetooth | BLE | ZigBee
Band | 2.4 GHz | 2.4 GHz | 2.4 GHz, 868 MHz, 915 MHz
Antenna/HW | Shared | Shared | Independent
Power | 100 mW | ~10 mW | 30 mW
Battery Life | Days – months | 1-2 years | 6 months – 2 yrs
Range | 10-30 m | 10 m | 10-75 m
Data Rate | 1-3 Mbps | 1 Mbps | 25-250 Kbps
Network Topologies | Ad hoc, point to point, star | Ad hoc, point to point, star | Mesh, ad hoc, star
Time to Wake and Transmit | 3 s | 3 ms | 15 ms
Security | 128-bit encryption | 128-bit encryption | 128-bit encryption

Wireless Protocols: 802.11

• IEEE 802.11

– Standard for wireless LANs

– Specifies parameters for PHY and MAC layers of network

• PHY layer

– handles transmission of data between nodes

– data transfer rates up to 600 Mbit/s for 802.11n

– operates in 2.4 / 5 GHz frequency band (RF)

• MAC layer

– medium access control layer

– protocol responsible for maintaining order in shared medium

– collision avoidance/detection

Summary

• Interfacing: on & off chip

• Real-time IO

– Profibus

– CAN

– ARINC

– TTP/A & TTP/C

– FlexRay

• Wireless

– IR, BLE, ZigBee, RFID, 802.11

Hardware/Software Codesign

Tajana Simunic Rosing

Department of Computer Science and Engineering

University of California, San Diego.

[Figure: embedded system (ES) design flow; labels: ES Design, Verification and Validation, Hardware components]

System Architecture: Yesterday

[Figure: PCB design with discrete components: processor, cache/DRAM controller, DRAM, VRAM, graphics, audio, motion video, I/O, LAN, PCI bus, ISA/EISA external bus, add-in board]

A System Architecture: Today

[Figure: HW/SW codesign of a SoC: processor, cache/SRAM, memory, graphics, video, glue logic, PCI interface, EISA interface, I/O]

System Design Problem Areas

[Figure: system with processor, ASIC, memory, interface, and analog I/O]

1. Design environment, co-simulation, constraint analysis

2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis

3. Software synthesis, optimization, retargetable code generation, debugging and programming environment

4. Test issues

HW-centric view of a Platform

[Figure: platform layers. Application space; HW-SW kernel plus reference design (processor(s), RTOS(es) and SW architecture; scaleable bus, test, power, IO, clock, timing architectures); programmable hardware IP and reconfigurable hardware region (FPGA, LPGA, …); pre-qualified/verified foundation IP with foundry-specific HW qualification and SW architecture characterisation]

IP can be:

• HW or SW

• hard, soft or ‘firm’ (HW)

• source or object (SW)

Source: Grant Martin and Henry Chang, “Platform-Based Design:

A Tutorial,” ISQED 2002, 18 March 2002, San Jose, CA.

SW-Centric View of Platforms

[Figure: hardware platform (hardware, input devices, output devices, network) beneath the software platform (software, device drivers, platform API), with application software on top]

Source: Grant Martin and Henry Chang, “Platform-Based Design:

A Tutorial,” ISQED 2002, 18 March 2002, San Jose, CA.

HW/SW Codesign: Motivations

• Benefit from both HW and SW

– HW:

• Parallelism -> better performance, lower power

• Higher implementation cost

– SW:

• Sequential implementation -> great for some problems

• Lower implementation cost, but often slower and higher power

Software or hardware?

Decision based on hardware/software partitioning

[Figure: hardware/software codesign: a specification is mapped onto processor P1, processor P2, and hardware]

System Partitioning

– Good partitioning mechanism:

1) Minimize communication across bus

2) Allows parallelism -> both HW & CPU operating concurrently

3) Near peak processor utilization at all times

[Figure: codesign flow. A specification (e.g. process (a, b, c) with in port a, b; out port c; read(a); write(c)) is captured, modeled, partitioned between processor and HW, and synthesized together with the interface]

Determining Communication Level

– Easier to program at application level

• (send, receive, wait) but difficult to predict

– More difficult to specify at low level

• Difficult to extract from program but timing and resources easier to predict

[Figure: communication levels between the application program and custom application hardware. Application level: send, receive, wait. Operating system and I/O driver level: register reads/writes, interrupt service. I/O bus level: bus transactions, interrupts]

Partitioning Costs

• Software Resources

– Performance and power consumption

– Lines of code: development and testing cost

– Cost of components

• Hardware Resources

– Fixed number of gates, limited memory & I/O

– Difficult to estimate timing for custom hardware

– Recent design shift towards IP

• Well-defined resource and timing characteristics

[Figure: software analysis process. Labels: functional blocks, feature points, source lines of code (SLOC), language conversion, calibration, equivalent SLOC including reuse, software development effort, software maintenance effort, software schedule, software development and testing cost]

[Figure: hardware analysis process. Labels: chip hardware characteristics (gate count, I/O count, I/O format, S/G ratio, feature size, Rent's rule, interconnect length, productivity, reuse), core area, die area, die yield, number up, die cost, design cost, tooling cost, test development cost, wafer fabrication and sawing cost, single-chip-package cost]

Hardware/Software Partitioning

[Figure: processor, memory, and ASICs on a shared bus]

Simple architectural model: CPU + 1 or more ASICs on a bus

• Properties of classic partitioning algorithms

– Single rate; Single-thread: CPU waits for ASIC

– Type of CPU is known; ASIC is synthesized

HW/SW Partitioning Styles

• HW first approach

– start with all-ASIC solution which satisfies constraints

– migrate functions to software to reduce cost

• SW first approach

– start with all-software solution which does not satisfy constraints

– migrate functions to hardware to meet constraints
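The SW-first style can be sketched as a greedy loop: start with everything in software, then repeatedly move to hardware the function that buys the most time per unit of hardware cost until the deadline is met. This is one possible heuristic, not a specific published algorithm; the greedy ratio, the function name `sw_first_partition`, and the task data are all hypothetical. Execution is assumed single-threaded (the CPU waits for the ASIC), so total time is a plain sum.

```python
def sw_first_partition(tasks, deadline):
    """Greedy SW-first partitioning.

    tasks maps a name to (sw_time, hw_time, hw_cost) with hw_cost > 0.
    Returns (set of tasks moved to hardware, total time, total hardware cost).
    """
    in_hw = set()
    total = sum(sw for sw, _, _ in tasks.values())  # all-software starting point
    cost = 0
    while total > deadline:
        # Rank remaining tasks by time saved per unit of hardware cost.
        candidates = [
            ((sw - hw) / c, name)
            for name, (sw, hw, c) in tasks.items()
            if name not in in_hw and sw > hw
        ]
        if not candidates:
            raise ValueError("deadline cannot be met even with all useful tasks in hardware")
        _, best = max(candidates)
        sw, hw, c = tasks[best]
        in_hw.add(best)
        total -= sw - hw
        cost += c
    return in_hw, total, cost
```

With `{"fft": (100, 20, 40), "filter": (60, 30, 10), "ctrl": (20, 18, 50)}` and a deadline of 120, the loop moves `filter` first and then `fft`, ending at 70 time units for a hardware cost of 50. The HW-first style would run the same loop in reverse, starting from an all-ASIC solution and moving functions back to software while the constraint still holds.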

Codesign Verification

• Run SW on the CPU

• Simulate HW (Verilog)

[Figure: co-simulation setup. Software processes 1 and 2 communicate over Unix sockets with a Verilog simulator, which models the application-specific hardware processes and the bus interface through the Verilog PLI]

[Figure: Foresight co-design integrated toolset (SpecC model). Labels: system requirements capture (functional behavior block diagram, machines, library elements, defined reusables), resource specification (architecture block diagram, data flow, monitors, system characteristics), co-design process, system outputs, system performance metrics. Gate count, lines of code, and I/O count derived from Foresight feed the cost analysis (Ghost): number up, fab. cost, test cost, die size, SCP cost, dev. cost, dev. schedule, maintenance cost]

Industry Initiatives

• Seamless Co-Verification Environment (CVE)

• Proridium (Foresight)

– Customers: Boeing, Microsoft, Raytheon, Oracle etc.

• CoWare (now in Synopsys)

– Cosimulation and IP integration

– One of founding members of SystemC (language)

• New FPGA synthesis tools incorporate CPUs

• Platform-based design

– Platform: predesigned architecture that designers can use to build systems for a given range of applications

ILP for HW/SW Partitioning

Ingredients:

• Cost function

• Constraints

Both involve linear expressions of integer variables from a set $X = \{x_i\}$.

Cost function: $C = \sum_{x_i \in X} a_i x_i$, with $a_i \in \mathbb{R}$, $x_i \in \mathbb{N}$   (1)

Constraints: $\forall j \in J: \sum_{x_i \in X} b_{i,j} x_i \ge c_j$, with $b_{i,j}, c_j \in \mathbb{R}$   (2)

Def.: The problem of minimizing (1) subject to the constraints (2) is called an integer programming (IP) problem.

If all $x_i$ are constrained to be either 0 or 1, the IP problem is said to be a 0/1 integer programming problem.

FAQ on integer programming

Maximizing the cost function can be done by setting C' = -C

Integer programming is NP-complete.

Running times increase exponentially with problem size

Commercial solvers can solve for thousands of variables

IP models are a good starting point for modelling even if in the end heuristics have to be used to solve them.

IP model for HW/SW partitioning

Notation:

• Index set $I$ denotes task graph nodes.

• Index set $L$ denotes task graph node types, e.g. square root, DCT or FFT.

• Index set $KH$ denotes hardware component types, e.g. hardware components for the DCT or the FFT.

• Index set $J$ denotes hardware component instances.

• Index set $KP$ denotes processors. All processors are assumed to be of the same type.

• $T$ is a mapping from task graph nodes to their types: $T: I \to L$

Therefore:

• $X_{i,k} = 1$ if node $v_i$ is mapped to HW component type $k \in KH$

• $Y_{i,k} = 1$ if node $v_i$ is mapped to processor $k \in KP$

• $NY_{\ell,k} = 1$ if at least one node of type $\ell$ is mapped to processor $k \in KP$

Constraints

Operation assignment constraints:

$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$

All task graph nodes have to be mapped either in software or in hardware.

Variables are assumed to be integers.

Additional constraints to guarantee they are either 0 or 1:

$\forall k \in KH, \forall i \in I: X_{i,k} \le 1$

$\forall k \in KP, \forall i \in I: Y_{i,k} \le 1$

Operation assignment constraints

$\forall \ell \in L, \forall i: T(v_i) = c_\ell, \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$

•For all types ℓ of operations and for all nodes i of this type:

– if i is mapped to some processor k, then that processor must implement the functionality of ℓ.

•Decision variables must also be 0/1 variables:

$\forall \ell \in L, \forall k \in KP: NY_{\ell,k} \le 1$

Resource & design constraints

• $\forall k \in KH$: the cost for components of that type should not exceed its maximum.

• $\forall k \in KP$: the cost for associated data storage area should not exceed its maximum.

• $\forall k \in KP$: the cost for storing instructions should not exceed its maximum.

• The total cost of HW components ($k \in KH$) should not exceed its maximum

• The total cost of data memories ($k \in KP$) should not exceed its maximum

• The total cost of instruction memories ($k \in KP$) should not exceed its maximum

Scheduling

[Figure: task graph with nodes v1 to v10 (FIR1, FIR2) mapped onto processor p1 and ASIC h1 (e.g. FIR2 on h1), connected by communication channel c1]

Scheduling / precedence constraints

• For all nodes $v_{i1}$ and $v_{i2}$ that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable $b_{i1,i2}$ with $b_{i1,i2} = 1$ if $v_{i1}$ is executed before $v_{i2}$, and $b_{i1,i2} = 0$ otherwise.

• Define constraints of the type

(end time of $v_{i1}$) $\le$ (start time of $v_{i2}$) if $b_{i1,i2} = 1$, and

(end time of $v_{i2}$) $\le$ (start time of $v_{i1}$) if $b_{i1,i2} = 0$

• Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.

• Timing constraints need to be met
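The "if" conditions in the ordering constraints are not linear as written. One standard way to express them in an IP model is the big-M linearization, sketched below. The symbols $s_i$ and $e_i$ (start and end time of $v_i$) and the constant $M$ are introduced here for illustration and are not part of the notation above; $M$ is assumed to be larger than any feasible schedule length.

```latex
\begin{align}
  e_{i1} &\le s_{i2} + M\,(1 - b_{i1,i2}) \\
  e_{i2} &\le s_{i1} + M\,b_{i1,i2} \\
  b_{i1,i2} &\in \{0, 1\}
\end{align}
```

When $b_{i1,i2} = 1$ the first inequality enforces that $v_{i1}$ finishes before $v_{i2}$ starts, while the second is relaxed by $M$; when $b_{i1,i2} = 0$ the roles are swapped.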

Example

• HW types H1, H2 and H3 with costs of 20, 25, and 30.

• Processors of type P.

• Tasks T1 to T5.

• Execution times:

T | H1 | H2 | H3 | P
1 | 20 | - | - | 100
2 | - | 20 | - | 100
3 | - | - | 12 | 10
4 | - | - | 12 | 10
5 | 20 | - | - | 100

Operation assignment constraint


X1,1+Y1,1=1 (task 1 mapped to H1 or to P)

X2,2+Y2,1=1

X3,3+Y3,1=1

X4,3+Y4,1=1

X5,1+Y5,1=1

General form: $\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$

Operation assignment constraint

•Assume types of tasks are ℓ =1, 2, 3, 3, and 1.

$\forall \ell \in L, \forall i: T(v_i) = c_\ell, \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$

Functionality 3 is to be implemented on the processor if node 4 is mapped to it.

Other equations

• Time constraint: application-specific hardware is required for time constraints under 100 time units.


Cost function:

C=20 #(H1) + 25 #(H2) + 30 # (H3) + cost(processor) + cost(memory)

Result

• For a time constraint of 100 time units and cost(P) < cost(H3):


Solution: T1 → H1
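For an instance this small, the 0/1 program can be checked by exhaustive search instead of a solver. The sketch below enumerates all 32 assignments under simplifying assumptions that are not part of the example: tasks are treated as independent (no precedence edges), tasks sharing a resource run back to back, at most one instance of each hardware type is bought, memory cost is ignored, and the processor cost is set to a hypothetical 5 (the example only requires cost(P) < cost(H3)).

```python
from itertools import product

HW_COST = {"H1": 20, "H2": 25, "H3": 30}
P_COST = 5  # hypothetical value; the example only states cost(P) < cost(H3)

# task -> (hardware type it can run on, hardware time, software time on P)
TASKS = {
    1: ("H1", 20, 100),
    2: ("H2", 20, 100),
    3: ("H3", 12, 10),
    4: ("H3", 12, 10),
    5: ("H1", 20, 100),
}


def best_partition(deadline=100):
    """Return (cost, mapping) of the cheapest assignment meeting the deadline."""
    best = None
    names = sorted(TASKS)
    for choice in product((0, 1), repeat=len(names)):  # 1 = hardware, 0 = processor
        busy = {}  # resource -> accumulated busy time
        for task, in_hw in zip(names, choice):
            hw_type, hw_time, sw_time = TASKS[task]
            resource = hw_type if in_hw else "P"
            busy[resource] = busy.get(resource, 0) + (hw_time if in_hw else sw_time)
        if max(busy.values()) > deadline:
            continue  # some resource finishes too late
        cost = sum(P_COST if r == "P" else HW_COST[r] for r in busy)
        if best is None or cost < best[0]:
            mapping = {t: (TASKS[t][0] if h else "P") for t, h in zip(names, choice)}
            best = (cost, mapping)
    return best
```

Under these assumptions the search returns a cost of 50 with T1 and T5 on H1, T2 on H2, and T3 and T4 on the processor, which agrees with the T1 → H1 entry of the result above. Exhaustive search grows as 2^n, which is why larger instances need an IP solver or heuristics.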

Separation of scheduling and partitioning

• Combined scheduling/partitioning is very complex. Heuristic:

1. Compute estimated schedule

2. Perform partitioning for estimated schedule

3. Perform final scheduling

4. If final schedule does not meet time constraint, go to 1 using a reduced overall timing constraint.

[Figure: two iterations of the heuristic. In each iteration the approximate execution time and the actual execution time are compared against the specification; the 2nd iteration uses a new specification]

Summary

• HW/SW codesign is complicated and limited by performance estimates

• Algorithms are in research and development,

– much of the work is still done by expert designers

Sources and References

• Peter Marwedel, “Embedded Systems Design,” 2004.

• Giovanni De Micheli @ EPFL

• Vincent Mooney @ Gatech

• Nikil Dutt @ UCI

CMOS VLSI Trends

[Figure: CMOS VLSI trends. Yesterday (1980s): memory, gate arrays, processors. Today: memory, structured ASIC, processors, reconfigurable. Tomorrow: memory, processors, reconfigurable (no processor), platform SoC, custom SoC, structured ASIC (no processor), structured SoC]

Increasing Customization Cost

• Example: design with 80 M transistors in 100 nm technology

• Estimated cost: $85 M - $90 M, 12 – 18 months

• Top cost drivers:

– Verification (40%)

– Architecture design (26%)

– Embedded design: 1400 man months (SW), 1150 man months (HW)

– HW/SW integration

*Handel H. Jones, ”How to Slow the Design Cost Spiral,” Electronics Design Chain, September 2002, www.designchain.com

Responses to Increasing Cost

• General purpose ISA

– Universality -> high volumes and reuse

– Abstraction -> compilation technologies and high application/development productivity

• Custom silicon for embedded platforms in sufficiently high volumes

– Domain specific ISAs, e.g., DSPs

– Application Specific Standard Products

– Reconfigurable hardware

• HW/SW Codesign

HW/SW Codesign Issues

• Task level concurrency management: Which tasks in the final system?

• High level transformations: Transformation outside the scope of traditional compilers

• Hardware/software partitioning: Which operation mapped to hardware, which to software?

• Compilation: Hardware-aware compilation

• Scheduling: Performed several times, with varying precision

• Design space exploration: Set of possible designs, not just one.

Partitioning Analysis

• Result of compilation is synthesizable HDL and assembly code for the processor

• Compiler & profiler determine dependence and rough performance estimates

HW & SW Foundries

• HW1

– LSI Logic ASIC wafer foundry data

• 0.18 µm feature size

• 8 inch wafers

• 6 layers

– TSMC 018 wafer processing

• HW2

– Samsung Semiconductor ASIC wafer foundry data

• 0.35 µm feature size

• 6 inch wafers

• 4 layers

– TSMC 035 wafer processing

• SW1

– Nominal to high development effort

• SW2

– Low to nominal development effort

[Figure: total cost per chip (10,000 units) versus percent custom hardware (0 to 100) for the HW1/SW1, HW1/SW2, HW2/SW1, and HW2/SW2 combinations. For the mixed implementation using HW1 and SW1, cost is broken down into software development, design, tooling, fabrication, packaging, and testing, as a function of production quantity and level of reuse (reuse of gate-level IP and code)]

Co-simulation for HW & SW

• Transistor-level accurate

– post layout SPICE model

• Gate-level accurate

– precise HDL gate delay model

• Cycle accurate

– correct transitions at clock edges

– timing information between edges is thrown away

• Bus accurate

– cycle accurate bus model

– behavioral model of processor, hardware

• Instruction set accurate

– instruction set simulator used for processors

– used for early design space exploration
