Distributed Processors Allow Revolutionary Hardware & Software Partitioning
Version 1.1 – March 2002 – APD / J-L Brelet & P Hardy – All rights reserved – © XILINX 2002
8th Workshop on Electronics for LHC Experiments, 9–13 September 2002, Colmar (France)
Authors: Jean-Reynald Mace & Jean-Louis Brelet / Xilinx
Colmar Workshop XILINX, Sept. 02 p 2
Agenda
• System Partitioning
  – Traditional techniques
  – Innovative approaches
• Example 1: DES Encryption Algorithm
  – HW solution compared with SW solution
• Example 2: Wireless LAN
  – HW / SW trade-off
• Enabling Technology: Virtex-II Pro
System Partitioning
• Definition:
  – "The mapping of a system-level architecture into specific HW and SW components based upon application requirements"
• Today, implemented in:
  – Fixed HW components:
    • FPGA, ASIC, ASSP, …
  – SW components:
    • Code running on CPU, DSP processors, microcontrollers, …
[Figure: Application Control Management spanning Hardware Components and Embedded Software]
Example System Functions
• Hardware:
  – Physical Layer
  – Memory Interfaces
  – Protocol Bridges
  – Finite State Machine
  – Signal Processing
  – Encryption
• Software:
  – Protocol Stack
  – User Interface
  – Diagnostics
  – Control
  – Signal Processing
  – Encryption
Optimal Solutions Enabled by On-Demand Architectural Synthesis
• Hardware:
  – Physical Layer
  – Memory Interfaces
  – Protocol Bridges
  – FSM
  – Signal Processing
  – Encryption
• Software:
  – Protocol Stack
  – User Interface
  – Diagnostics
  – Control
  – Signal Processing
  – Encryption
[Figure: Flexible Mapping of functions between the Hardware and Software lists]
Traditional System Design
• Fixed HW / SW partitioning
• Early and final architecture mapping
• Critical commitment made at concept level
[Figure: an SW manager with three SW developers and an HW manager with two HW engineers and a PCB engineer, working on Embedded Software and Hardware Components separated by a Fixed Interface]
New System Partitioning
• Flexible HW / SW partitioning
  – Enables trade-offs throughout the process
• Architecture redefinition possible
  – Tune for optimal performance and cost
[Figure: several HW Teams and SW Teams working on Hardware Components and Embedded Software across a Flexible Interface]
Innovative Partitioning
• New System Approach:
  – Enables non-traditional system architectures
    • SW modules can be implemented in HW
    • HW modules can be moved to SW
  – Requires a scalable and flexible platform that enables optimal HW / SW integration
• Co-Design Methodology
  – Design attributes (performance, resource usage, …) optimized during development
  – SW developers and HW engineers create solutions at module level for optimal systems
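Module-level co-design can be made concrete with a small cost model. The sketch below is a hypothetical greedy heuristic, not the authors' method. Every module starts in SW. The module that saves the most execution time per LUT then moves to HW, and this repeats until a time budget is met or the LUT budget runs out. The module names come from the function lists above. All timings and LUT counts are invented for illustration, and execution is assumed to be sequential, so module times simply add up.

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    sw_us: float   # execution time as code on the processor (hypothetical)
    hw_us: float   # execution time when mapped to FPGA logic (hypothetical)
    hw_luts: int   # logic cost of the HW version, assumed > 0

def partition(modules, deadline_us, lut_budget):
    """Greedy HW/SW split: start all-SW, then move modules with the best
    time saved per LUT into HW until the deadline is met or LUTs run out.
    Returns (names mapped to HW, resulting total time, LUTs used)."""
    in_hw = set()
    total = sum(m.sw_us for m in modules)
    luts = 0
    by_gain_per_lut = sorted(
        modules, key=lambda m: (m.sw_us - m.hw_us) / m.hw_luts, reverse=True)
    for m in by_gain_per_lut:
        if total <= deadline_us:
            break                      # requirement met, stop adding logic
        if m.sw_us <= m.hw_us or luts + m.hw_luts > lut_budget:
            continue                   # no gain, or does not fit the device
        in_hw.add(m.name)
        total -= m.sw_us - m.hw_us
        luts += m.hw_luts
    return in_hw, total, luts

if __name__ == "__main__":
    mods = [Module("Encryption", 40, 4, 1000),
            Module("QoS frame selection", 20, 2, 200),
            Module("Protocol Stack", 30, 25, 3000),
            Module("User Interface", 5, 5, 500)]
    print(partition(mods, deadline_us=50, lut_budget=2000))
```

If the returned time still exceeds the deadline, the greedy pass found no split within the LUT budget. Because it is greedy, it can miss splits that an exhaustive search would find.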
DES Overview
• DES Algorithm:
  – The message is split into fixed-length blocks
  – Each block is encoded with a fixed « key »
  – Block length = 64 bits (advanced: 128 bits), key length = 56 bits
• 3DES Is An Enhanced Version of Encryption / Decryption
  – If Key 1 = Key 2 = Key 3, then 3DES is fully compatible with DES
[Figure: Data → Encrypt (Key 1) → Decrypt (Key 2) → Encrypt (Key 3)]
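The compatibility claim follows from the Encrypt–Decrypt–Encrypt (EDE) chaining. With equal keys, the Decrypt stage undoes the first Encrypt, leaving a single encryption. The sketch below shows that structure under one major assumption. The inner cipher is a toy 64-bit Feistel network with an invented round function and key schedule, standing in for DES. It is not DES and not secure, and it is only meant to show the EDE composition.

```python
MASK32 = 0xFFFFFFFF

def _round_f(half, subkey):
    # Toy round function (NOT the DES f-function): key mix, rotate left 5, multiply.
    x = half ^ subkey
    x = ((x << 5) | (x >> 27)) & MASK32
    return (x * 0x9E3779B1) & MASK32

def _round_keys(key, rounds=16):
    # Toy key schedule (hypothetical): 16 round keys of 32 bits from a 56-bit key.
    return [((key >> (i % 25)) ^ (i * 0x01010101)) & MASK32 for i in range(rounds)]

def _feistel(block, round_keys):
    left, right = block >> 32, block & MASK32
    for k in round_keys:
        left, right = right, left ^ _round_f(right, k)
    # Final swap (as in DES): decryption is the same network with reversed keys.
    return (right << 32) | left

def toy_encrypt(block, key):
    return _feistel(block, _round_keys(key))

def toy_decrypt(block, key):
    return _feistel(block, _round_keys(key)[::-1])

def ede_encrypt(block, k1, k2, k3):
    # 3DES-style keying: Encrypt with Key 1, Decrypt with Key 2, Encrypt with Key 3.
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)

def ede_decrypt(block, k1, k2, k3):
    # Inverse chain: undo each stage in reverse order.
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)
```

Calling `ede_encrypt(data, k, k, k)` returns the same block as `toy_encrypt(data, k)`. This is the backward-compatibility property stated above.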
System Integrator’s Dilemma
• DES Is A Simple Algorithm
• The System Engineer Has To Evaluate:
  – SW coding compared with HW implementation
  – Need for a specific processor and its performance
  – Need for a dedicated solution
  – Cost-effectiveness of an ASSP solution
  – Level of customization required
  – Fixed or flexible implementation
Architectural Options
• Popular DES Algorithm Is Available As SW Code:
  – Public-domain C or C++ code
  – Example encryption data rates for 128-b DES:
    • TMS320C62xx at 200 MHz delivers ~100 Mbps (*)
    • MIPS 64-b RISC at 250 MHz delivers ~400 Mbps (*)
    • Pentium III at 1 GHz delivers ~460 Mbps (*)
• HW Implementation Available At:
  – www.opencores.org
  – Over 1.5 Gbps data rate in Virtex-II at 130 MHz (*)
• 3DES 56-b Algorithm Achieves 10.7 Gbps Throughput
  – Xilinx record-breaking announcement in April 2002
* Source: Helion Technology Limited, Xilinx Design Consultant (Xilinx Xcell Journal, Issue 43, Summer 2002)
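The quoted figures use very different clock rates, so they are easier to compare when normalized to bits processed per clock cycle. The sketch below only performs that arithmetic on the numbers above. "Over 1.5 Gbps" is taken as exactly 1.5 Gbps.

```python
# (data rate in bit/s, clock in Hz) as quoted above; "over 1.5 Gbps" taken as 1.5 Gbps.
IMPLEMENTATIONS = {
    "TMS320C62xx (SW)": (100e6, 200e6),
    "MIPS 64-b RISC (SW)": (400e6, 250e6),
    "Pentium III (SW)": (460e6, 1e9),
    "Virtex-II (HW)": (1.5e9, 130e6),
}

def bits_per_cycle(rate_bps, clock_hz):
    # Work done per clock tick, which factors out the clock frequency.
    return rate_bps / clock_hz

def speedup(rate_bps, reference_bps):
    # Raw throughput ratio against a reference implementation.
    return rate_bps / reference_bps

if __name__ == "__main__":
    ref = IMPLEMENTATIONS["Pentium III (SW)"][0]
    for name, (rate, clock) in IMPLEMENTATIONS.items():
        print(f"{name:20} {bits_per_cycle(rate, clock):6.2f} bit/cycle"
              f"  x{speedup(rate, ref):.2f} vs Pentium III")
```

The Virtex-II design is clocked about 7.7 times slower than the Pentium III. It still processes about 11.5 bits per cycle against 0.46, roughly 25 times more work per clock, because the DES data path is laid out spatially instead of being executed instruction by instruction.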
Mixed HW / SW Solution
• Encryption / Decryption Data Path:
  – DES encryption module is called twice
  – Decryption requires more compute power
[Figure: Data Flow through a DES Encryption Algorithm block and a DES Decryption Algorithm block, partitioned between the Processor and HW]
Full HW Implementation
• Full HW Implementation:
  – Shared Encryptor
[Figure: Data Flow through HW Encrypt and Decrypt blocks; the Processor handles Other Tasks]
• Full HW Pipelined Solution
  – Easy to add parallelism
  – Easy to couple to distributed processors
[Figure: pipelined Data Flow through HW Encrypt, Decrypt and Encrypt stages, with a Processor, or no processor?]
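The benefit of pipelining and parallelism can be estimated with a simple cycle-count model. The sketch assumes each of the three 3DES passes takes the same fixed number of cycles. For illustration, it uses 16 cycles per pass, assuming one DES round per clock; this value is not from the slides. It also assumes the pipeline has one stage per pass, and that extra lanes are full copies of that pipeline.

```python
import math

PASSES = 3  # Encrypt, Decrypt, Encrypt (3DES EDE)

def sequential_cycles(n_blocks, pass_cycles, passes=PASSES):
    # Non-pipelined: each block completes all passes before the next one starts.
    return n_blocks * passes * pass_cycles

def pipelined_cycles(n_blocks, pass_cycles, passes=PASSES, lanes=1):
    # One pipeline stage per pass. After a fill of `passes` stage times, each
    # lane completes one block per stage time; `lanes` pipelines share the blocks.
    if n_blocks == 0:
        return 0
    blocks_per_lane = math.ceil(n_blocks / lanes)
    return (passes + blocks_per_lane - 1) * pass_cycles

if __name__ == "__main__":
    for label, cycles in [("sequential", sequential_cycles(100, 16)),
                          ("1 pipeline", pipelined_cycles(100, 16)),
                          ("2 pipelines", pipelined_cycles(100, 16, lanes=2))]:
        print(f"{label:12} {cycles} cycles for 100 blocks")
```

For 100 blocks, the model gives 4800 cycles sequentially, 1632 with one pipeline and 832 with two. Throughput therefore scales with the amount of logic spent on replicated stages.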
Choices of HW / SW Partition
• Various Solutions To Fit Each Performance / Cost Requirement:
  – SW vs HW vs mixed HW / SW
• New Approach:
  – On-Demand Architectural Synthesis to modify the HW / SW trade-off dynamically
• Distributed Processors Offer Another Level Of Flexibility Through Parallel Implementations
Networking Application: Wireless LAN
Intra Forwarding Technique: Video Transmission
[Figure: MPEG2 video transmission and FTP file transfer over the wireless LAN, with QoS]
Wireless LAN: Access Point Architecture
[Figure: layer stack (Application, Presentation, Session, Transport, Network, Data Link, Physical), with Medium Access Control and Channel Access Control blocks and a Bus / HOST I/F]
Wireless LAN: QoS
• Wireless LAN example:
  – Intra forwarding technique
  – Complex network-access algorithms with a few priority levels to guarantee the QoS
• Select Most Urgent Frame
  – Choice is based on a few parameters:
    • Priority (P0 to Pn)
    • Lifetime (Normalized Residual Lifetime, …)
[Figure: pointer tables for priorities P0 … Pn, 256 pointers of 64 bits each; pointer fields CAP, UP, RLDIS, NRL, DB; inputs: Ptr of the Received Frame, output: Ptr of the Selected Frame]
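The "most urgent frame" rule combines the two parameters above: priority first, then how close the frame is to expiry. The sketch below is a software model of that selection, not the authors' design. Three points are assumptions: P0 is treated as the highest priority, the Normalized Residual Lifetime (NRL) is defined as the remaining lifetime divided by the total allowed lifetime, and the field and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FramePtr:
    ptr: int        # address of the frame buffer (hypothetical field)
    priority: int   # 0 = P0, treated as the most urgent level (assumption)
    nrl: float      # normalized residual lifetime in [0, 1]; smaller = closer to expiry

def normalized_residual_lifetime(deadline_us, now_us, lifetime_us):
    # Remaining lifetime as a fraction of the total allowed lifetime, clamped at 0.
    return max(0.0, (deadline_us - now_us) / lifetime_us)

def select_most_urgent(frames):
    """Return the pointer of the most urgent frame: best priority first,
    then the smallest normalized residual lifetime; None if nothing is queued."""
    if not frames:
        return None
    return min(frames, key=lambda f: (f.priority, f.nrl)).ptr
```

Normalizing the lifetime lets frames with different lifetime budgets be compared on the same scale. A frame with half of a short budget left ranks the same as one with half of a long budget left.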
QoS: Full Hardware
• Design in FPGA:
  – FSM-like design with adder/subtractor (~1000 LUTs at 50 MHz)
  – One table of pointers implemented in FPGA Block RAM
    • 2 BRAMs used for 4 priorities
  – Pipelining used
  – Easy to manage the lifetime (updated every 10 µs)
• Complex Function in HW:
  – Electing two frames from one table of pointers by scrolling and comparison techniques
[Figure: table of pointers of frames to be transmitted (F0, F1, F3, F10, …); a permutation yields the elected pointer of the frame to transmit (F11)]
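A quick budget check shows why the scrolling approach fits in HW. At 50 MHz, the 10 µs update period gives 500 clock cycles, and a 256-entry table needs 256 cycles if one entry is visited per clock. The sketch below is a behavioural model of such a pass, not the actual FSM. It assumes one entry per clock, a lifetime decrement on each visit, and that the two elected frames are the two with the smallest remaining lifetime.

```python
CLOCK_HZ = 50_000_000       # FSM clock quoted on the slide
UPDATE_PERIOD_NS = 10_000   # lifetime update every 10 us
TABLE_ENTRIES = 256         # pointers per table (QoS slide)

def cycle_budget():
    # Clock cycles available between two lifetime updates: 50 MHz * 10 us = 500.
    return CLOCK_HZ * UPDATE_PERIOD_NS // 1_000_000_000

def scan_and_elect(lifetimes, tick=1):
    """Model one scrolling pass: one entry per clock cycle, decrement its
    lifetime (the adder/subtractor) and keep the two most urgent entries
    by comparison. Mutates `lifetimes`; returns (best, second, cycles)."""
    best = second = None
    cycles = 0
    for i in range(len(lifetimes)):
        cycles += 1
        lifetimes[i] = max(0, lifetimes[i] - tick)
        if best is None or lifetimes[i] < lifetimes[best]:
            best, second = i, best
        elif second is None or lifetimes[i] < lifetimes[second]:
            second = i
    return best, second, cycles

if __name__ == "__main__":
    _, _, used = scan_and_elect([100] * TABLE_ENTRIES)
    print(f"{used} of {cycle_budget()} cycles used per update period")
```

Under these assumptions, about half of the update period remains free. That margin is where pipelining or a second comparison stage could fit.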
QoS: Full Software
• Design in Firmware:
  – Simple ~250 lines of C code
  – Microprocessor used: PPC 405
  – One table of pointers per priority in external memory (SDRAM)
  – Sort algorithm very well known and easy to implement
• Complex Function in SW:
  – System real-time requirement
  – Frame lifetime controlled by a set of timers
• While new frames keep arriving, existing frames must be moved between priority tables
[Figure: per-priority tables of frame pointers (F0, F1, F3, F7, F10, F11, F21, F22, F31, F41, F52, …); the elected pointer of the frame to transmit is taken from the Highest Priority Table]
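The firmware approach can be sketched as one sorted table per priority, a timer handler that ages frames, and an election step. This is a Python model of the idea rather than the ~250-line C implementation. It assumes that index 0 is the highest priority, that tables are sorted by absolute deadline, and that a frame moves up one level when less than a threshold of lifetime remains. The promotion policy and all names are hypothetical.

```python
import bisect

class PriorityTables:
    """One table of frame pointers per priority (index 0 = highest, by assumption),
    each kept sorted by absolute deadline in microseconds."""

    def __init__(self, levels):
        self.tables = [[] for _ in range(levels)]

    def insert(self, priority, deadline_us, ptr):
        # New frame: sorted insertion into its priority table.
        bisect.insort(self.tables[priority], (deadline_us, ptr))

    def on_timer(self, now_us, threshold_us):
        # Timer handler (hypothetical policy): frames with less than threshold_us
        # of lifetime left move up one priority level. Levels are processed in
        # order 1, 2, ..., so a promoted frame is not revisited in the same tick.
        for p in range(1, len(self.tables)):
            table = self.tables[p]
            while table and table[0][0] - now_us < threshold_us:
                bisect.insort(self.tables[p - 1], table.pop(0))

    def elect(self):
        # Transmit from the highest-priority non-empty table, earliest deadline first.
        for table in self.tables:
            if table:
                return table.pop(0)[1]
        return None
```

The sort itself is straightforward. The difficulty noted above is that `insert` and `on_timer` run concurrently under real-time constraints, so in firmware they contend for the same tables.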
QoS: Mixed HW / SW
• Hardware Module:
  – Lifetime management and moving pointers between tables
  – Design:
    • FSM-like design with adder/subtractor (~200 LUTs at 50 MHz)
    • 4 tables of pointers, one per priority, in FPGA Block RAM
    • Lifetime updated by scrolling
    • Semaphore
• Software/Hardware Interface:
  – Semaphore-based communication
• Software Module:
  – Insertion and sorting of the tables
  – Design:
    • Easy to write (~200 lines of C code)
    • Sort algorithm
    • Semaphore library
[Figure: pointer tables (F41, F52, F7, F22, …) shown on both the HW and SW sides]
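The semaphore-based interface can be modelled with two threads sharing the pointer tables. One thread stands in for the HW module (lifetime ageing and moves between tables), and the other for the SW module (insertion and sorting). This is a hypothetical software model only. A Python `threading.Semaphore(1)` plays the role of the HW/SW semaphore. Two further assumptions are that index 0 is the highest priority, and that expired frames move up one level per tick.

```python
import threading

class SharedPointerTables:
    """Pointer tables shared by a (simulated) HW lifetime engine and the SW
    insertion/sort code; a binary semaphore stands in for the HW/SW semaphore."""

    def __init__(self, levels):
        self.tables = [[] for _ in range(levels)]   # index 0 = highest priority (assumption)
        self.sem = threading.Semaphore(1)           # binary semaphore: one owner at a time

    def sw_insert(self, priority, lifetime, ptr):
        # SW module: insertion and sort, done while holding the semaphore.
        with self.sem:
            table = self.tables[priority]
            table.append([lifetime, ptr])
            table.sort()

    def hw_tick(self, step=1):
        # HW module (modelled in software): age every entry, then move expired
        # entries up one priority level, all while holding the semaphore.
        with self.sem:
            for table in self.tables:
                for entry in table:
                    entry[0] = max(0, entry[0] - step)
            for p in range(1, len(self.tables)):
                expired = [e for e in self.tables[p] if e[0] == 0]
                if expired:
                    self.tables[p] = [e for e in self.tables[p] if e[0] > 0]
                    self.tables[p - 1].extend(expired)
                    self.tables[p - 1].sort()
```

Each side holds the semaphore only for one short, bounded operation. This lets the fixed-rate HW ageing and the event-driven SW insertion share the tables without corrupting them.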
Design Solutions Comparison
• Full HW Solution
  – Full control of event timing and easy design of parallelism
  – Complex HDL coding of the FSM
    • State-machine architecture requires advanced expertise
    • Significant validation time in the design cycle
• Full SW Solution
  – Easy coding in C (sort algorithm) and flexibility
  – Difficult to handle real-time constraints
    • Performance limited by the processor's von Neumann architecture
• Mixed HW / SW Solution: The Best of Both Worlds
  – Offers the advantages of both HW and SW solutions, given the right partitioning
Platform FPGA Architecture
• A Solution That Provides:
  – IP Immersion
    • The ability to integrate a wide variety of Hard & Soft IP
  – A single platform for multiple applications
  – Total customization
  – Full hardware and firmware upgradability
[Figure: Hard-IP, Soft-IP, System Connectivity and HW functions integrated on one platform]
Virtex-II Pro Platform FPGA
• PowerPC 405 Core
  – 300+ MHz / 450+ DMIPS performance
  – Up to 4 per device
• 3.125 Gbps Multi-Gigabit Transceivers (MGTs)
  – Supports 10 Gbps standards
  – Up to 24 per device
• IP-Immersion™ Fabric
• ActiveInterconnect™
• 18Kb Dual-Port RAM
• Xtreme™ Multipliers
• 16 Global Clock Domains
[Figure: PowerPC 405 cores and MGTs embedded in the FPGA fabric]
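A single 3.125 Gbps transceiver is listed as supporting 10 Gbps standards. One common way this works, assumed here for illustration, is XAUI-style bonding of four lanes with 8b/10b line coding, which carries 8 data bits in every 10 line bits. The arithmetic is shown below.

```python
LINE_RATE_BPS = 3_125_000_000   # per-MGT line rate quoted on the slide
LANES = 4                       # XAUI-style lane bonding (assumption)

def payload_rate_bps(line_rate_bps, lanes, data_bits=8, code_bits=10):
    # 8b/10b coding: every 10 bits on the line carry 8 bits of data.
    return lanes * line_rate_bps * data_bits // code_bits

if __name__ == "__main__":
    print(payload_rate_bps(LINE_RATE_BPS, LANES))   # payload over four bonded lanes
```

Integer arithmetic is used so that the result is exact: 2.5 Gbps of payload per lane, 10 Gbps over four lanes.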
Colmar Workshop XILINX, Sept. 02 p 27
High-Bandwidth Communications
• Code (SW) and data can be stored in BRAM, without any external resources
• On-Chip Memory (OCM) offers unique data bandwidth between the FPGA fabric (HW) and the embedded PowerPC core (SW)
• High-bandwidth communications between distributed processors
[Figure: OCM™ Technology: PowerPC core (16KB I-Cache, 16KB D-Cache, MMU, Fetch & Decode, Execution Unit with 32x32b GPR, ALU and MAC, Timers and Debug Logic) connected to BlockRAMs and Acceleration Logic through four 6.4 Gb/s links]
Flexibility of Programmable Systems
• Nearly All Systems Are Composed Of:
  – Logic + Memory + Processor
• Virtex-II Pro enables optimum “system partitioning” between Hardware and Software
[Figure: performing SW tasks in HW is inefficient; performing HW tasks in SW is slow; Virtex-II Pro provides the best of both worlds]
Conclusion
• Distributed Processors Allow Flexible HW / SW Partitioning:
  – Optimal mapping at the module level
  – Design with the best solution from both worlds
• Virtex-II Pro, The First Programmable System To Enable True Architectural Synthesis:
  – Unique bandwidth between embedded processors and HW
  – Unique on-chip solution providing an application-specific mix of logic, memory, integrated processors, and high-bandwidth I/O