Network on Chip - Architectures and Design Methodology
Natt Thepayasuwan, Rohit Pai
“By the end of the decade, SOCs using 50-nm transistors operating below one volt, will grow to 4 billion transistors running at 10GHz”
-International Technology Roadmap for Semiconductors
Challenges in SOC design
• Synchronization with a single clock source and negligible skew will be extremely difficult, if not impossible.
• Globally asynchronous – Locally synchronous
• Distributed system on a single chip
• Global control of information traffic is unlikely to succeed.
• Autonomous data transfers by components
Challenges (cont)
• Electrical noise due to cross-talk, electromagnetic interference (EMI) and radiation-induced charge injection (soft errors) is likely to produce data upsets.
• The mere transmission of digital values on wires will be inherently unreliable.
Drawbacks of Bus
• Every unit attached adds parasitic capacitance
• Bus timing is difficult in deep submicron processes
• Bus testability is problematic
• Bus arbiter delay grows with number of masters
• Bandwidth is limited and shared
The Network-on-Chip
The seven-layer OSI stack for communication
Physical
Data link
Network
Transport
System
Application
Micro-Network Stack
Physical Layer
• Lowest level
• signal voltage, timing, bus widths and pulse shape
• Power consumption difficult to compute at this stage
• signal synchronization is a concern
Data Link
• Reliable transfer of data
• Error detection and correction
• Arbitration of physical medium
• MAC protocols – token ring and TDMA
• Arbitration scheme affects delay, throughput, power consumption
Network Layer
• Provides topology independent view of end to end communication to upper layers
• Data routes can be persistent, or each transaction can be dynamically routed
• Congestion control may be required if dynamically routed
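A persistent route can be pictured with a minimal sketch. It assumes a 2D mesh and deterministic dimension-order (XY) routing, neither of which is specified on this slide; the function name `xy_route` and the (x, y) coordinates are hypothetical.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited from src to dst.

    Assumption: dimension-order routing on a 2D mesh, X first, then Y.
    The same source/destination pair always yields the same path.
    """
    x, y = src
    path = [(x, y)]
    # Travel along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because the path is fixed per source/destination pair, upper layers see a stable connection. A dynamically routed network would instead choose each next hop at run time, which is where congestion control comes in.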
Transport Layer
• End-to-end connection
• Flow control, packet reassembly, re-ordering
• Abstraction of network
• Formal method of communication
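Packet reassembly and re-ordering can be sketched as follows. The sequence-number scheme (starting at 0, one number per packet) and the name `reassemble` are assumptions for illustration, not taken from the slides.

```python
def reassemble(packets):
    """Rebuild a message from (seq, payload) pairs that may arrive out of order.

    Assumption: sequence numbers start at 0 and increase by 1 per packet.
    """
    buffered = {}        # out-of-order packets waiting for their turn
    next_seq = 0         # next sequence number to deliver
    message = bytearray()
    for seq, payload in packets:
        buffered[seq] = payload
        # Deliver every packet that is now in order.
        while next_seq in buffered:
            message += buffered.pop(next_seq)
            next_seq += 1
    if buffered:
        raise ValueError(f"missing packet {next_seq}")
    return bytes(message)


print(reassemble([(1, b"lo"), (0, b"hel"), (2, b"!")]))
```

The upper layers only ever see the in-order byte stream, which is the abstraction of the network this layer provides.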
System - Session & Presentation
Session
• Adds state to end-to-end connections.
• Synchronous messaging requires the sending and receiving components to rendezvous as the message is passed
• State is maintained by a semaphore used as an indicator
• System components are CPU, DSP core, memory, …
Presentation
• Byte ordering format conversion
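A minimal sketch of byte-order conversion for a 32-bit word, using Python's standard `struct` module. The 32-bit word size and the helper name `swap32` are assumptions for illustration.

```python
import struct


def swap32(word):
    """Convert a 32-bit word between big-endian and little-endian layouts."""
    # Pack as big-endian bytes, then reinterpret those bytes as little-endian.
    return struct.unpack("<I", struct.pack(">I", word))[0]


print(hex(swap32(0x11223344)))
```

Applying `swap32` twice returns the original word, so the same routine serves both directions of the conversion.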
Application Layer
• Highest layer of abstraction
• e.g. an embedded system that performs video processing
• Separation of computation and communication
• Builds upon functionality of lower level
NOC Architectures
Platform based design
Same architecture for different applications – speeds up the design process and reduces verification time
Issues – Generality / Performance
CLICHÉ Architecture
Chip-Level Integration of Communicating Heterogeneous Elements
Regions & Wrappers
• Region: an area insulated from the network, with a different internal topology / communication scheme
• Allows resources larger than one atomic mesh slot
• Connected to the NOC by wrappers, which route packets so the region is insulated from external traffic
• Wrappers convert messages to the appropriate formats
Backbone-Platform-System
• Encapsulate design into reusable platforms
• Backbone (Region Type)
• Topological & communication issues
• channels, switches & network interface
• Performance evaluation of topologies
• Customized (wire-length, timing, physical) topology enables a NOC where QoS is optimized from the beginning
BPS (cont)
• Platform (Region scaling) – Requires understanding of the functionality (System level control)
• Complexity and performance requirements
• Metrics – utilization, performance, capacity, temporal and spatial effects
[Figure labels: Communication Structure, Processor, Hardware, Code, Configuration]
BPS (cont)
• Application Development – (Resource level)
• control of network
• functionality of network
Switching Networks
• Circuit Switching
• Space switching- S (crossbar)
• Time switching – T : buffer to swap order of time-slices on TDMA links
• Adv: Formal guarantee of bandwidth
• Disadv: Lack of reactivity to changing communication patterns
• e.g. not suited for random traffic between CPU and slaves
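The time-switching stage can be sketched as a time-slot interchange. It assumes one sample per slot and a fixed, pre-configured slot mapping; the names `time_slot_interchange` and `slot_map` are hypothetical.

```python
def time_slot_interchange(frame, slot_map):
    """Reorder the time-slices of one TDMA frame.

    frame:    samples in input-slot order.
    slot_map: slot_map[out_slot] is the input slot sent in that output slot.
    """
    buffer = list(frame)  # the whole frame is written into the buffer first
    # Read the buffer back in the configured order.
    return [buffer[in_slot] for in_slot in slot_map]


print(time_slot_interchange(["a", "b", "c", "d"], [2, 0, 3, 1]))
```

Because `slot_map` is fixed when the circuit is set up, each connection keeps its slot in every frame. That gives the bandwidth guarantee, and it is also why the scheme reacts poorly to changing traffic.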
PROPHID (TST)
[Figure: T, S and T switching stages]
Packet Switching
• Routers as switching elements
• Header + Payload = Packet
• Routing decisions dynamic and distributed
• Very reactive
• What about latency?
Wormhole
• Extensive use in high-performance parallel computing
• Router does not wait for the trailer before forwarding
[Figure: packet with Head and Tail]
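A rough latency comparison shows why not waiting for the trailer matters. This is a simplified model: it assumes no contention, one flit per link per cycle, and one cycle of router delay per hop. The function names are hypothetical.

```python
def store_and_forward_cycles(hops, flits):
    """Each router receives the whole packet before forwarding it."""
    return hops * flits


def wormhole_cycles(hops, flits):
    """The head advances one hop per cycle; the rest of the packet follows
    in a pipeline, so the tail arrives flits - 1 cycles after the head."""
    return hops + flits - 1


print(store_and_forward_cycles(4, 8), wormhole_cycles(4, 8))
```

Under these assumptions, store-and-forward latency grows with the product of hop count and packet length, while wormhole latency grows with their sum.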
SPIN
• Scalable, Programmable, Integrated Network
• 32-bit packets – header byte for destination address
• Up to 256 terminals can be addressed
• Trailer has checksum for error detection
• Payload should be large
• Deterministic routing
• Latency independent
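The packet layout above can be sketched as a list of 32-bit words. Two details are assumptions, not taken from the slides: the destination byte is placed in the top byte of the header word, and the checksum is an XOR of the payload words. The function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF


def build_packet(dest, payload_words):
    """Return [header, payload..., trailer] as 32-bit words."""
    if not 0 <= dest < 256:
        raise ValueError("destination must fit in one byte (256 terminals)")
    header = dest << 24                      # assumed position of the address byte
    payload = [w & WORD_MASK for w in payload_words]
    checksum = 0
    for word in payload:
        checksum ^= word                     # assumed checksum: XOR of payload
    return [header, *payload, checksum]


def check_packet(words):
    """Return (destination, checksum_ok) for a received packet."""
    header, *payload, trailer = words
    checksum = 0
    for word in payload:
        checksum ^= word
    return header >> 24, trailer == checksum


packet = build_packet(0x2A, [1, 2, 4])
print(check_packet(packet))
```

The header and trailer are a fixed cost per packet, which is one way to read the advice that the payload should be large.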
Fat-Tree
Router Design
• Area optimization (on-chip)
• Queuing packets in a FIFO at the input leads to maximum contention
• Additional output buffers required
• More contention in the child links than in the father links
• Output buffers reduce cascaded contention
Beyond NOC
Currently used communication architectures on SoC
• Static Priority Based Shared Bus
• Time Division Multiple Access (TDMA) Shared Bus
Static Priority based shared Bus
TDMA
Problems with both
• Static Priority Based Shared Bus: lack of control over the allocation of communication bandwidth to different system components or data flows
• TDMA Based Shared Bus: significant latencies resulting from variations in the time-profile of the communication requests
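Both problems can be seen in a minimal sketch of the two arbiters. The master names, the slot table, and the function names are hypothetical. The TDMA version is simplified: an unused slot simply stays idle.

```python
def static_priority_grant(requests, priority):
    """Grant the bus to the highest-priority requesting master, or None."""
    for master in priority:          # priority list, highest first
        if master in requests:
            return master
    return None


def tdma_grant(requests, slot_owner, cycle):
    """Grant the bus only to the owner of the current time slot, or None."""
    owner = slot_owner[cycle % len(slot_owner)]
    return owner if owner in requests else None


# Static priority: "dma" never gets the bus while "cpu" keeps requesting.
print(static_priority_grant({"cpu", "dma"}, ["cpu", "dma"]))

# TDMA: "dma" requests at cycle 0 but must wait for its slot at cycle 2.
slots = ["cpu", "cpu", "dma"]
print([tdma_grant({"dma"}, slots, cycle) for cycle in range(3)])
```

With static priority, the bandwidth a low-priority master receives depends entirely on what higher-priority masters do. With TDMA, the wait depends on when a request arrives relative to the owner's slot.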
Lottery Bus
Operation
The probability that the bus is granted to component Ci
The probability that a task with t tickets can access the bus after n lottery drawings:
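Assuming the usual lottery formulation, component Ci holds ti tickets and only components with pending requests take part in a draw. The grant probability is then ti divided by the sum of tickets over the pending components. A task holding t of T tickets gets the bus within n drawings with probability 1 - (1 - t/T)^n. The sketch below is a software model under those assumptions, not the hardware on the following slides; all names and ticket counts are hypothetical.

```python
import random


def grant_probability(tickets, pending, master):
    """P(master wins one draw) = its tickets / tickets of all pending masters."""
    if master not in pending:
        return 0.0
    return tickets[master] / sum(tickets[m] for m in pending)


def access_within(t, total, n):
    """P(a holder of t out of total tickets wins at least once in n draws)."""
    return 1 - (1 - t / total) ** n


def lottery_grant(tickets, pending, rng=random):
    """Draw one winning ticket among the pending masters (positive tickets)."""
    masters = sorted(pending)
    if not masters:
        return None
    draw = rng.randrange(sum(tickets[m] for m in masters))
    for master in masters:
        if draw < tickets[master]:   # the draw falls in this master's range
            return master
        draw -= tickets[master]


tickets = {"c1": 1, "c2": 2, "c3": 3, "c4": 4}
print(grant_probability(tickets, {"c1", "c3", "c4"}, "c3"))
print(access_within(1, 4, 2))
print(lottery_grant(tickets, {"c1", "c3", "c4"}, random.Random(0)))
```

Ticket counts set each component's long-run share of the bandwidth, and every pending component has a nonzero chance in each draw, so none is starved outright.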
Hardware (static)
Hardware (dynamic)
Comparing BUS with NOC
Bus says: Bus latency is zero once the arbiter has granted control
NOC says: Internal network causes small latency
Bus says: Silicon cost of a bus is near zero
NOC says: Significant silicon area
Bus says: Compatible with most IPs
NOC says: IPs need smart wrappers
NOC says: What do I do now?
Comparing NOC with BUS (cont)