Communication Synthesis:
Buses and Network-on-Chip (NOC)
Dr. Eng. Amr T. Abdel-Hamid
ELECT 1002
Spring 2008
System-On-a-Chip Design
Dr. Amr Talaat
ELECT1002
SoC Design
The SoC nightmare
The architecture is tightly coupled
Source: Prof Jan Rabaey CS-252-2000 UC Berkeley
The “Board-on-a-Chip” Approach
[Figure: DMA, CPU, DSP, MemCtrl., Bridge, MPEG and I/O blocks connected by a System Bus and a Peripheral Bus]
Very long wires
[Figure: wire from A to B. Year 2005: 1 ns (1 GHz); Year 2010: 0.1 ns (10 GHz)]
Why NoC?
• Global wire delays increase exponentially, or linearly when repeaters are inserted
• Even after repeater insertion, the delay may exceed one clock cycle
• In ultra-deep submicron processes, 80% or more of the delay of critical paths will be due to interconnections
• Communication structures need to be designed first, followed by the functional blocks
Homogeneous SoC (MP-SoC)
[Figure: eight CPU blocks and eight MEM blocks attached to an interconnection network (BUS, XBAR)]
Why not bus?
The shared-medium arbitrated bus is the most frequently used on-chip interconnect architecture
Pros
• Simple, low area cost, and extensible
Cons
• The intrinsic parasitic resistance and capacitance can be quite high for a long bus line
• Every additional IP block adds to the parasitic capacitance and increases propagation delay
• The number of IP blocks that can be connected by the bus is limited
On-Chip Communication
Bus-based interconnect
• Low cost
• Easier to implement
• Flexible
Networks on Chip
• Layered approach
• Buses replaced with networked architectures
• Better electrical properties
• Higher bandwidth
• Energy efficiency
• Scalable
[Figure labels: Irregular architectures, Regular architectures, Bus-based architectures]
Network on Chip
Separation of concerns
[Figure: layer stack with Software, Transport, Network, Data Link Layer and Wiring]
Communication-based Design
• Orthogonalizes function and communication
• Builds on well-known models of computation and a correct-by-construction synthesis flow
• Parallels the layered approach exploited by the communications community
What is Network-on-Chip (NoC)?
• Leverages existing computer networking principles to improve inter-component, intra-chip communication for SoCs
• Each on-chip component is connected by a switch to particular communication wire(s)
• An improvement over standard bus-based interconnections for SoC architectures in terms of throughput
SoC Current Trend
• Explicitly parallel SoC architectures
• Integrating huge amounts of memory in chip designs
• Distributed shared memory environments
• Should allow an interconnection-centric design flow and better predictability
• Physical design closure
• Wire delay dominates gate delay
Design goal of NoC
• High throughput
• Low latency
• Less energy consumption
• Small area requirements
Network-on-Chip Basics:
• Architectures
• Routing strategies
• Evaluation
Figure 1: NoC Architecture [labels: IP Core, CNI, Router Logic, To/From Network]
Routing: Circuit/Packet Switching
Circuit Switching
• Dedicated path, or circuit, is established over which data packets will travel
• Naturally lends itself to time-sensitive guaranteed service due to resource allocation
• Reservation of bandwidth decreases overall throughput and increases average delays
Packet Switching
• Intermediate routers are responsible for routing each packet individually through the network, rather than all packets following a single pre-established path
• Provides for so-called best-effort services
Routing: Wormhole/Virtual Cut Through
Wormhole Switching
• Message is divided up into smaller, fixed-length flow units called flits
• Only first flit contains routing information, subsequent flits follow
• Buffer size is significantly reduced due to the limitation on the number of flits needed to be buffered at any given time
Virtual Cut Through Switching
• Much like Wormhole switching
• Header flit can travel ahead and undergo processing while remaining flits are still navigating the network
• Higher acceptance rates and lower latencies than Wormhole
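The flit idea above can be sketched in a few lines of Python. This is only an illustration, not the format used in the lecture: the `Flit` class, the `to_flits` helper, the 4-byte flit size and the zero padding are all assumptions, and the head flit here also carries payload for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    kind: str                     # "head", "body", or "tail" (hypothetical labels)
    payload: bytes
    route: Optional[str] = None   # only the head flit carries routing information


def to_flits(message: bytes, route: str, flit_bytes: int = 4) -> List[Flit]:
    """Split a message into fixed-length flits, zero-padding the last one."""
    if not message:
        raise ValueError("message must not be empty")
    chunks = [message[i:i + flit_bytes] for i in range(0, len(message), flit_bytes)]
    chunks[-1] = chunks[-1].ljust(flit_bytes, b"\x00")
    flits = []
    for i, chunk in enumerate(chunks):
        if i == 0:
            flits.append(Flit("head", chunk, route))   # routing info lives here only
        elif i == len(chunks) - 1:
            flits.append(Flit("tail", chunk))
        else:
            flits.append(Flit("body", chunk))
    return flits


flits = to_flits(b"0123456789", route="node-5")
```

Because only the head flit names the route, a router needs to buffer just a few flits per packet rather than the whole packet, which is the buffer saving mentioned above.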
Wormhole Switching
Routing: Contention
• Contention occurs when routers or IP blocks attempt to send data over the same link at the same time
• For circuit switching, contention is resolved at connection setup time
• For packet switching, contention resolution is handled at a much finer level, by the router buffering and scheduling individual packets of information
• Packet-switched networks achieve better overall performance, at the cost of service guarantees
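One common way a packet-switched router can schedule buffered packets that compete for the same output link is round-robin arbitration. The slides do not say which policy is used, so the sketch below is an assumption chosen for illustration; the `arbitrate` function and the port names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def arbitrate(queues: Dict[str, Deque[str]], order: List[str],
              cycles: int) -> List[Tuple[str, str]]:
    """Grant one shared output link to at most one input port per cycle."""
    grants = []
    start = 0                                  # port index searched first
    for _ in range(cycles):
        for offset in range(len(order)):
            idx = (start + offset) % len(order)
            port = order[idx]
            if queues[port]:                   # this port has a buffered packet
                grants.append((port, queues[port].popleft()))
                start = (idx + 1) % len(order) # next search starts after the winner
                break
    return grants


queues = {"north": deque(["n0", "n1"]), "west": deque(["w0"])}
grants = arbitrate(queues, order=["north", "west"], cycles=4)
```

Packets that lose a cycle simply stay in their input buffer, which is why this kind of scheme gives best-effort service rather than a guaranteed delay.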
Architectures: SPIN
• SPIN: Scalable, Programmable, Integrated Network
• Every level has the same number of switches
• Network grows like (N log N)/8
• Trades area overhead and decreased power efficiency for higher throughput
• Illustrative of the performance vs. power consumption trade-off
Architectures: CLICHE
• CLICHÉ: Chip-Level Integration of Communicating Heterogeneous Elements
• Two-dimensional mesh network layout for NoC design
• Every switch is connected to its four nearest neighboring switches and to its target resource block, except switches on the edge of the layout, which have fewer neighbors
• Each connection consists of two unidirectional links
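The neighbor rule for a mesh can be written down directly. The sketch below assumes switches are addressed by (x, y) coordinates, which the slides do not specify; `mesh_neighbors` is a hypothetical helper.

```python
from typing import List, Tuple


def mesh_neighbors(x: int, y: int, width: int, height: int) -> List[Tuple[int, int]]:
    """Return the switches directly linked to switch (x, y) in a 2D mesh."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    # Edge and corner switches lose the neighbors that fall outside the grid.
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]


interior = mesh_neighbors(1, 1, 4, 4)   # four neighbors
edge = mesh_neighbors(0, 1, 4, 4)       # three neighbors
corner = mesh_neighbors(0, 0, 4, 4)     # two neighbors
```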
Architectures: Torus
• Similar to mesh-based architectures
• Wires wrap around from the top component to the bottom and from the rightmost to the leftmost
• Smaller hop count
• Higher bandwidth
• Decreased Contention
• Increased chip space usage
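The smaller hop count can be checked numerically. The sketch below assumes minimal (shortest-path) routing and a 4x4 grid, neither of which is stated on the slide; the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Node = Tuple[int, int]


def mesh_hops(a: Node, b: Node) -> int:
    """Shortest-path hop count between two switches in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a: Node, b: Node, width: int, height: int) -> int:
    """Same, but each dimension may also use the wrap-around link."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


def average_hops(width: int, height: int, hops: Callable[[Node, Node], int]) -> float:
    """Average hop count over all ordered pairs of distinct switches."""
    nodes = list(product(range(width), range(height)))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


mesh_avg = average_hops(4, 4, mesh_hops)
torus_avg = average_hops(4, 4, lambda a, b: torus_hops(a, b, 4, 4))
```

For this 4x4 example the corner-to-corner distance drops from 6 hops in the mesh to 2 in the torus, at the price of the long wrap-around wires.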
Architectures: Folded Torus
• Similar to Torus
• In a torus, the long end-around connections can yield excessive delays
• This is avoided by folding the torus
Architectures: Octagon
• Standard model: 8 components, 12 interconnects
• Design complexity increases linearly with number of nodes
• Largest packet travel distance is two hops
• High throughput
• Shortest path routing easy to implement
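A minimal sketch of shortest-path routing on an octagon follows. It assumes the usual layout of 8 nodes numbered 0 to 7 around a ring (8 links) plus 4 links across the middle between opposite nodes, giving the 12 interconnects; the slide does not spell out the numbering, and the function names are hypothetical.

```python
from typing import List


def octagon_next_hop(current: int, dest: int) -> int:
    """Pick the next node using only the relative address (dest - current) mod 8."""
    rel = (dest - current) % 8
    if rel == 0:
        return current                 # already at the destination
    if rel in (1, 2):
        return (current + 1) % 8       # step clockwise around the ring
    if rel in (6, 7):
        return (current - 1) % 8       # step counter-clockwise
    return (current + 4) % 8           # rel in (3, 4, 5): take the cross link


def octagon_path(src: int, dest: int) -> List[int]:
    """List of nodes visited from src to dest, inclusive."""
    path = [src]
    while path[-1] != dest:
        path.append(octagon_next_hop(path[-1], dest))
    return path


worst = max(len(octagon_path(s, d)) - 1 for s in range(8) for d in range(8))
```

The decision needs only a 3-bit subtraction and a small lookup, which is one reading of why shortest-path routing is described as easy to implement here.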
Architectures: BFT
• BFT: Butterfly Fat Tree
• Each node in the tree model has coordinates (level, position), where level is the depth and position counts from left to right
• Leaves are component blocks
• Interior nodes are switches
• Four child ports and two parent ports per switch
• log N levels; the i-th level has N/2^(i+1) switches, where N is the number of leaves (blocks)
• Use traffic aggregation to reduce congestion
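The switch counts per level can be tabulated with a short script. It reads the switch-count expression as N / 2^(i+1) and takes the logarithm to base 4 (because each switch has four child ports); both readings are assumptions, as is the helper name `bft_switches_per_level`.

```python
from typing import List


def bft_switches_per_level(leaves: int) -> List[int]:
    """Switch count at levels 1..L of a butterfly fat tree with `leaves` blocks."""
    if leaves < 4:
        raise ValueError("need at least 4 leaves")
    levels, n = 0, leaves
    while n > 1:                       # count levels = log base 4 of leaves
        if n % 4:
            raise ValueError("leaves must be a power of 4")
        n //= 4
        levels += 1
    # Level 1 has leaves/4 switches; each level above has half as many.
    return [leaves // 2 ** (i + 1) for i in range(1, levels + 1)]


counts = bft_switches_per_level(256)
```

With 256 leaves the first level holds 256/4 = 64 switches, and each higher level halves that because every switch has four children but only two parents.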
Network interface
Open Core Protocol (OCP)
• An interface standard between IP cores and the interconnection fabric
Packet Format
• Type: Head, Data, Tail and Complete
• VCID: Virtual Channel Identifier
• Route: N-bit route field; the last 2 bits specify the route to be taken at the next controller
  00 - Left
  01 - Right
  10 - Straight
  11 - Extract
• Data: Actual data field
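The route field can be decoded hop by hop as sketched below. The 2-bit codes come from the slide; shifting the field right by two bits after each controller, so that the next controller again reads the last 2 bits, is an assumption, as are the function names.

```python
from typing import List, Tuple

# 2-bit direction codes as listed in the packet format.
DIRECTIONS = {0b00: "left", 0b01: "right", 0b10: "straight", 0b11: "extract"}


def next_direction(route: int) -> Tuple[str, int]:
    """Read the last 2 bits of the route field and return the remaining field."""
    return DIRECTIONS[route & 0b11], route >> 2   # shift is an assumed convention


def decode_route(route: int, max_hops: int = 16) -> List[str]:
    """Follow the route field until the packet is extracted (or max_hops is hit)."""
    hops = []
    for _ in range(max_hops):
        direction, route = next_direction(route)
        hops.append(direction)
        if direction == "extract":     # packet leaves the network here
            break
    return hops


hops = decode_route(0b11100100)
```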
Routing Example
Simulation
A simulator is used to investigate various metrics:
• Each system consists of 256 functional IP blocks
• Wormhole routing is used
• The user can choose uniform or localized traffic
• Both Poisson and self-similar message injection distributions are supported
• A flit is one word (36 bits, of which 4 bits are for packet framing)
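A minimal sketch of Poisson message injection with uniform traffic is shown below. It is not the simulator used in the lecture: the function name, the rate and horizon parameters, and the time unit (cycles) are all assumptions. Poisson arrivals are generated by drawing exponentially distributed gaps between messages.

```python
import random
from typing import List, Tuple

FLIT_BITS = 36
FRAMING_BITS = 4
PAYLOAD_BITS = FLIT_BITS - FRAMING_BITS   # bits left for data in each flit


def poisson_injections(rate: float, horizon: float, num_blocks: int = 256,
                       source: int = 0, seed: int = 0) -> List[Tuple[float, int]]:
    """Return (time, destination) pairs for one source block.

    rate    : mean messages per cycle (hypothetical parameter)
    horizon : simulated time span in cycles (hypothetical parameter)
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)          # exponential gap => Poisson process
        if t >= horizon:
            break
        dest = rng.randrange(num_blocks - 1)
        if dest >= source:                  # uniform over all blocks except source
            dest += 1
        events.append((t, dest))
    return events


events = poisson_injections(rate=0.1, horizon=1000.0)
```

Localized traffic could be modeled by biasing the destination choice toward nearby blocks instead of drawing it uniformly.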
Area comparison
SPIN and Octagon have a considerably higher silicon area overhead.
Projected performance