Distributed Processors Allow Revolutionary Hardware & Software Partitioning

Version 1.1 – March 2002 – APD / J-L Brelet & P Hardy - All rights reserved - © XILINX 2002

8th Workshop on Electronics for LHC Experiments, 9 – 13 September 2002, Colmar (France)

Authors: Jean-Reynald Mace & Jean-Louis Brelet / Xilinx

Colmar Workshop XILINX, Sept. 02 p 2

Agenda

• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared to SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro

System Partitioning

• Definition:
  – “The mapping of a system-level architecture into specific HW and SW components based upon application requirements”
• Today's implementation in:
  – Fixed HW components: FPGA, ASIC, ASSP, …
  – SW components: code running on CPUs, DSP processors, microcontrollers, …

[Figure: Hardware Components and Embedded Software; labels: Application, Control, Management]

Example System Functions

• Hardware:

– Physical Layer

– Memory Interfaces

– Protocol Bridges

– Finite State Machine

– Signal Processing

– Encryption

• Software:

– Protocol Stack

– User Interface

– Diagnostics

– Control

– Signal Processing

– Encryption

Optimal Solutions Enabled by On-Demand Architectural Synthesis

• Hardware:

– Physical Layer

– Memory Interfaces

– Protocol Bridges

– FSM

– Signal Processing

– Encryption

• Software:

– Protocol Stack

– User Interface

– Diagnostics

– Control

– Signal Processing

– Encryption

[Figure label: Flexible Mapping]

Traditional System Design

• Fixed HW / SW partitioning
• Early and final architecture mapping
• Critical commitment made at concept level

[Figure: a SW manager with three SW developers and a HW manager with two HW engineers and a PCB engineer, separated by a Fixed Interface between Embedded Software and Hardware Components]

New System Partitioning

• Flexible HW / SW partitioning
  – Enables tradeoffs throughout the process
• Architecture redefinition possible
  – Tune for optimal performance and cost

[Figure: HW Teams and SW Teams working across a Flexible Interface between Hardware Components and Embedded Software]

Innovative Partitioning

• New System Approach:
  – Enables non-traditional system architecture
    • SW modules can be implemented in HW
    • HW modules can be moved to SW
  – Requires a scalable and flexible platform that enables optimal HW / SW integration
• Co-Design Methodology
  – Design attributes optimized during development (performance, resource usage, …)
  – SW developers and HW engineers create solutions at module level for optimal systems

Agenda

• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared to SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro

DES Overview

• DES Algorithm:
  – Message is split into fixed-length blocks
  – Each block is encoded with a fixed “key”
  – Block length = 64 bits (advanced 128-b), key length = 56 bits
• 3DES Is an Enhanced Version of Encryption / Decryption
  – If Key 1 = Key 2 = Key 3, then 3DES is fully compatible with DES

[Figure: Data → Encrypt (Key 1) → Decrypt (Key 2) → Encrypt (Key 3)]
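The Encrypt / Decrypt / Encrypt chain and its compatibility with single DES can be illustrated with a minimal sketch. The block cipher below is a toy, invertible stand-in (`toy_encrypt` / `toy_decrypt` are hypothetical names, not real DES and not secure); only the composition of the three stages follows the slide. The zero padding in `split_blocks` is likewise an assumption.

```python
MASK64 = (1 << 64) - 1  # 64-bit block size, as quoted on the slide


def _rotl(x, n):
    """Rotate a 64-bit value left by n bits."""
    return ((x << n) | (x >> (64 - n))) & MASK64


def _rotr(x, n):
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def toy_encrypt(block, key):
    """Toy stand-in for one DES encryption (NOT real DES)."""
    return (_rotl(block ^ key, 13) + key) & MASK64


def toy_decrypt(block, key):
    """Exact inverse of toy_encrypt for the same key."""
    return _rotr((block - key) & MASK64, 13) ^ key


def ede_encrypt(block, k1, k2, k3):
    """3DES-style chain: encrypt with k1, decrypt with k2, encrypt with k3."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def ede_decrypt(block, k1, k2, k3):
    """Undo ede_encrypt by reversing the three stages."""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)


def split_blocks(message, size=8):
    """Split bytes into 64-bit integers; the tail is zero-padded (assumption)."""
    padded = message + b"\x00" * (-len(message) % size)
    return [int.from_bytes(padded[i:i + size], "big")
            for i in range(0, len(padded), size)]
```

With three equal keys the middle decryption cancels the first encryption, so the chain reduces to a single encryption. The chain calls the encryption stage twice and the decryption stage once.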

System Integrator’s Dilemma

• DES Is a Simple Algorithm
• System Engineer Has To Evaluate:
  – SW coding compared to HW implementation
  – Need for a specific processor and performance level
  – Need for a dedicated solution
  – Cost-effectiveness of an ASSP solution
  – Level of customization required
  – Fixed or flexible implementation

Architectural Options

• Popular DES Algorithm Is Available As SW Code:
  – Public domain C or C++ code
  – Example encryption data rates for 128-b DES:
    • TMS320C62xx at 200 MHz delivers ~100 Mbps (*)
    • MIPS 64-b RISC at 250 MHz delivers ~400 Mbps (*)
    • Pentium III at 1 GHz delivers ~460 Mbps (*)
• HW Implementation Available At:
  – www.opencores.org
  – Over 1.5 Gbps data rate in Virtex-II at 130 MHz (*)
• 3DES 56-b Algorithm Achieves 10.7 Gbps Throughput
  – Xilinx record-breaking announcement in April 2002

(*) Source: Helion Technology Limited, Xilinx Design Consultant (Xilinx Xcell Journal, Issue 43, Summer 2002)
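One way to compare these figures is to normalize each data rate by its clock frequency. The short sketch below does that arithmetic with the numbers quoted on the slide; the helper name `bits_per_cycle` is illustrative, and the Virtex-II entry is a lower bound because the slide says "over 1.5 Gbps".

```python
# (name, clock in MHz, throughput in Mbps), taken from the slide.
IMPLEMENTATIONS = [
    ("TMS320C62xx (SW)", 200, 100),
    ("MIPS 64-b RISC (SW)", 250, 400),
    ("Pentium III (SW)", 1000, 460),
    ("Virtex-II (HW)", 130, 1500),  # lower bound: "over 1.5 Gbps"
]


def bits_per_cycle(clock_mhz, throughput_mbps):
    """Encrypted bits produced per clock cycle (Mbps divided by MHz)."""
    return throughput_mbps / clock_mhz


for name, clock, rate in IMPLEMENTATIONS:
    print(f"{name:22s} {bits_per_cycle(clock, rate):6.2f} bits/cycle")
```

Under these figures the hardware implementation processes roughly 11.5 bits per clock, against 1.6 bits per clock or fewer for the processors listed.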

Mixed HW / SW Solution

• Encryption / Decryption Data Path:
  – DES encryption module is called twice
  – Decryption requires more compute power

[Figure: Encrypt and Decrypt data flows; blocks: DES Encryption Algorithm, DES Decryption Algorithm, Processor, HW]

Full HW Implementation

• Full HW Implementation:
  – Shared Encryptor
• Full HW Pipelined Solution
  – Easy to add parallelism
  – Easy to couple to distributed processors

[Figure: two data flows. First: Encrypt and Decrypt in HW, with the Processor handling Other Tasks. Second: Encrypt and Decrypt in HW, labeled "Or No Processor?"]

Choices of HW / SW Partition

• Various Solutions To Fit Each Performance / Cost Requirement:
  – SW vs HW vs mixed HW / SW
• New Approach:
  – On-Demand Architecture Synthesis to modify the HW / SW trade-off dynamically
• Distributed Processors Offer Another Level Of Flexibility Through Parallel Implementations

Agenda

• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared to SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro

Networking Application: Wireless LAN

[Figure: wireless LAN with QoS; labels: Intra Forwarding Technique, Video transmission (MPEG2), File transfer (FTP)]

Wireless LAN: Access Point Architecture

[Figure: access point protocol stack; labels: Application Layer, Presentation Layer, Session Layer, Transport Layer, Network Layer, Data Link Layer, Medium Access Control, Channel Access Control, Physical Layer, Bus, HOST I/F]

Wireless LAN: QoS

• Wireless LAN example:
  – Intra forwarding technique
  – Complex network access algorithms with a few levels of prioritization in order to guarantee the QoS
• Select Most Urgent Frame
  – Choice is based on a few parameters:
    • Priority (P0 to Pn)
    • Lifetime (Normalized Residual Lifetime, …)

[Figure: table of 256 pointers of 64 bits each, organized by priority P0 to Pn; pointer fields: CAP, UP, RLDIS, NRL, DB; input: Ptr of the Received Frame; output: Ptr of the Selected Frame]
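The selection rule can be sketched in a few lines. The slide only names the two parameters, so the ordering used here is an assumption: priority 0 is treated as the highest, and among equal priorities the smallest normalized residual lifetime wins. `FramePtr` and `select_most_urgent` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FramePtr:
    frame_id: str
    priority: int  # assumption: 0 (P0) is the highest priority
    nrl: int       # normalized residual lifetime; assumption: smaller is more urgent


def select_most_urgent(table):
    """Return the entry with the highest priority, ties broken by shortest lifetime."""
    if not table:
        return None
    return min(table, key=lambda ptr: (ptr.priority, ptr.nrl))
```

For example, with entries `("F1", 1, 5)`, `("F3", 0, 9)` and `("F0", 0, 4)`, the sketch selects `F0`: it shares the top priority with `F3` but has less lifetime left.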

QoS: Full Hardware

• Design in FPGA:
  – FSM-like design with adder/subtractor (~1000 LUTs / 50 MHz)
  – One table of pointers implemented in FPGA Block RAM
    • 2 BRAMs used for 4 priorities
  – Pipelining used
  – Easy to manage the lifetime (update every 10 µs)
• Complex Function in HW:
  – Electing two frames from one table of pointers by scrolling and comparison techniques

[Figure: table of pointers of frames to be transmitted (F11, F1, F3, F0, F10), a Permutation step, and the elected pointer of the frame to transmit]
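A software model of the scroll-and-compare election may help to see what the FSM has to do on each table entry. This is only a behavioral sketch: the slide does not state the election criterion, so the smallest residual lifetime is assumed, and `elect_two` and `tick` are hypothetical names.

```python
def elect_two(lifetimes):
    """Scan the table once; return the indices of the two smallest lifetimes.

    Each entry costs one read and at most two comparisons, which is the kind
    of work an FSM with a comparator can do while scrolling through a BRAM.
    """
    best = second = None
    for idx, life in enumerate(lifetimes):
        if best is None or life < lifetimes[best]:
            best, second = idx, best
        elif second is None or life < lifetimes[second]:
            second = idx
    return best, second


def tick(lifetimes, step=1):
    """Age every entry by one update period, saturating at zero."""
    return [max(0, life - step) for life in lifetimes]
```

For the table `[7, 3, 9, 1, 4]` the scan elects entries 3 and 1. In hardware, `tick` would correspond to the subtractor pass that runs every 10 µs.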

QoS: Full Software

• Design in Firmware:
  – Simple: ~250 lines of C code
  – Microprocessor used: PPC 405
  – One table of pointers per priority in external memory (SDRAM)
  – Sort algorithm is well known and easy to implement
• Complex Function in SW:
  – System real-time requirement
  – Frame lifetime controlled by a set of timers
    • While a new frame is arriving, an existing frame may have to be moved from the upper priority table

[Figure: per-priority tables of frame pointers (F41, F52, F7, F22, …, F11, F31, F10, F21, F1, F3, F0, F11); the Highest Priority Table yields the elected pointer of the frame to transmit]
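The firmware approach can be sketched as sorted insertion into one table per priority plus a timer callback that ages the frames. The slide gives no details, so everything specific here is an assumption: four priorities with index 0 as the highest, expired frames being dropped, and frames that run short of lifetime being promoted by one table. All function names are hypothetical.

```python
import bisect

NUM_PRIORITIES = 4  # assumption: same number of priorities as in the hardware design


def make_tables():
    """One table of (lifetime, frame_id) pairs per priority, index 0 highest."""
    return [[] for _ in range(NUM_PRIORITIES)]


def insert_frame(tables, priority, lifetime, frame_id):
    """Sorted insertion: each table stays ordered by residual lifetime."""
    bisect.insort(tables[priority], (lifetime, frame_id))


def on_timer(tables, elapsed, promote_below):
    """Age all frames, drop expired ones, promote urgent ones by one table."""
    aged = [[(life - elapsed, fid) for life, fid in table] for table in tables]
    for table in tables:
        table.clear()
    for prio, entries in enumerate(aged):
        for life, fid in entries:
            if life <= 0:
                continue  # expired frame is dropped (assumption)
            urgent = prio > 0 and life < promote_below
            bisect.insort(tables[prio - 1 if urgent else prio], (life, fid))


def elect(tables):
    """Return the frame at the head of the highest non-empty priority table."""
    for table in tables:
        if table:
            return table[0][1]
    return None
```

The sketch also shows where the real-time difficulty comes from: `on_timer` touches every table, and it has to run on schedule while `insert_frame` is handling newly arriving frames.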

QoS: Mixed HW / SW

• Hardware Module:
  – Lifetime update and moving pointers between tables
  – Design:
    • FSM-like design with adder/subtractor (~200 LUTs / 50 MHz)
    • 4 tables of pointers, one per priority, in the FPGA Block RAM
    • Lifetime updated by scrolling
    • Semaphore
• Software/Hardware Interface:
  – Semaphore-based communication
• Software Module:
  – Insertion and sorting of the tables
  – Design:
    • Easy to write (~200 lines of C code)
    • Sort algorithm
    • Semaphore library

[Figure: two tables of frame pointers (F41, F52, F7, F22, …)]
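The semaphore-based hand-off between the two modules can be modeled as follows. This is a behavioral sketch only: in the real design one side is FPGA logic, whereas here both sides are Python functions sharing a `threading.Semaphore`. The names `SharedTable`, `sw_insert` and `hw_update_lifetime` are hypothetical.

```python
import bisect
import threading


class SharedTable:
    """One pointer table shared by the hardware and software modules."""

    def __init__(self):
        self.entries = []                  # sorted (lifetime, frame_id) pairs
        self.sem = threading.Semaphore(1)  # binary semaphore guarding entries


def sw_insert(table, lifetime, frame_id):
    """Software module: sorted insertion of a new frame pointer."""
    with table.sem:
        bisect.insort(table.entries, (lifetime, frame_id))


def hw_update_lifetime(table, step):
    """Model of the hardware module: scroll the table, age entries, drop expired ones."""
    with table.sem:
        table.entries = [(life - step, fid)
                         for life, fid in table.entries if life > step]
```

Because every entry ages by the same step, the hardware pass preserves the order set up by the software sort, so neither side has to redo the other's work.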

Design Solutions Comparison

• Full HW Solution
  – Full control of event timing and easy parallel design
  – Complex HDL coding of the FSM
    • State machine architecture requires advanced expertise
    • Significant validation time in the design cycle
• Full SW Solution
  – Easy coding in C (sort algorithm) and flexibility
  – Difficult to handle real-time constraints
    • Performance limited by the Von Neumann architecture (processor)
• Mixed HW / SW Solution: The Best of Both Worlds
  – Offers the advantages of both the HW and SW solutions with the right partitioning

Agenda

• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared to SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro

Platform FPGA Architecture

• A Solution that provides:
  – IP Immersion
    • The ability to integrate a wide variety of Hard & Soft IP
  – A single platform for multiple applications
  – Total customization
  – Full hardware and firmware upgradability

[Figure labels: Hard-IP, Soft-IP, System Connectivity, HW functions]

Virtex-II Pro Platform FPGA

• PowerPC 405 Core
  – 300+ MHz / 450+ DMIPS performance
  – Up to 4 per device
• 3.125 Gbps Multi-Gigabit Transceivers (MGTs)
  – Supports 10 Gbps standards
  – Up to 24 per device
• Fabric
  – IP-Immersion™ Fabric
  – ActiveInterconnect™
  – 18Kb Dual-Port RAM
  – Xtreme™ Multipliers
  – 16 Global Clock Domains

High-Bandwidth Communications

• Code (SW) and data are stored in BRAM, without any external resources
• On-Chip Memory (OCM) offers a unique data bandwidth between the FPGA fabric (HW) and the embedded PowerPC core (SW)
• High-bandwidth communications between distributed processors

[Figure: OCM™ Technology block diagram; blocks: BlockRAMs, I-Cache 16 KB, D-Cache 16 KB, MMU, Fetch & Decode, Timers and Debug Logic, Execution Unit (32x32b GPR, ALU, MAC), Acceleration Logic; four links labeled 6.4 Gb/sec]

Flexibility of Programmable Systems

• Nearly all systems are composed of:
  – Logic + Memory + Processor
• Virtex-II Pro enables optimum “system partitioning” between hardware and software
  – Performing SW tasks in HW is inefficient
  – Performing HW tasks in SW is slow
  – Virtex-II Pro provides the best of both worlds

Conclusion

• Distributed Processors Allow Flexible HW / SW Partitioning:
  – Optimal mapping at the module level
  – Designs can use the best of both worlds
• Virtex-II Pro: The First Programmable System To Enable True Architectural Synthesis:
  – Unique bandwidth between embedded processors and HW
  – Unique on-chip solution provides an application-specific mix of logic, memory, integrated processors, and high-bandwidth I/O