MINIMISING DYNAMIC POWER CONSUMPTION IN ON-CHIP NETWORKS
Robert Mullins
Computer Architecture Group
Computer Laboratory
University of Cambridge, UK
2/19
Communication-Centric Architectures
• Future performance gains will primarily come from increasing the number of IP cores in a system, not their complexity or operating frequency
• Many reasons:
  – Diminishing returns from simply scaling what we have
  – Energy efficiency
  – Complexity
  – Fault tolerance
  – Economics
3/19
On-Chip Networks
• An efficient general-purpose chip-wide communication infrastructure is becoming essential
• One flexible networking option is to use packet-switched networks with support for virtual channels
4/19
The Lochside Router
• Router Architecture
  – Highly parameterised implementation
  – Packet-switched network with virtual-channel flow-control
  – Best case latency is one cycle per network hop
• Results presented here are from post P&R simulations targeting a 90nm technology
[Figure: Lochside Chip (2004/05), 180nm technology. Labels: TILE, R, Traffic Generator, Debug & Test]
7/19
Aims of this work
• Apply existing power saving techniques to an on-chip network design
  – e.g. clock and signal gating, gate-level optimisations etc.
  – Importance of applying such techniques before making comparisons
• Measure power consumption and provide an accurate breakdown of where the remaining power is dissipated
• Where is the best place to look for future power savings?
8/19
Measuring and Optimizing Dynamic Power
• Our Test Case
  – 8mm x 8mm die
  – 4x4 mesh network
  – Low-latency routers, best case latency is one cycle per hop (incl. interconnect)
  – 1.2V, 90nm technology
  – 4 input-buffers / VC
  – 4 VC / input port
  – 48 x 80-bit network links
  – 800MHz @ WC PVT
    • ~32 FO4 clock period
  – Results reported at 250MHz
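The test-case figures can be cross-checked with a little arithmetic. The sketch below assumes that the 48 links are the one-way router-to-router channels of the 4x4 mesh, that tiles are equal squares, and that each link moves one 80-bit word per cycle. None of these assumptions is stated on the slide, and the function names are illustrative.

```python
def mesh_unidirectional_links(cols: int, rows: int) -> int:
    """Count one-way router-to-router links in a cols x rows mesh."""
    horizontal = (cols - 1) * rows      # neighbouring pairs along each row
    vertical = cols * (rows - 1)        # neighbouring pairs along each column
    return 2 * (horizontal + vertical)  # one link in each direction per pair


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


links = mesh_unidirectional_links(4, 4)
period = clock_period_ps(800)            # worst-case PVT clock from the slide
fo4_ps = period / 32                     # implied FO4 delay if the period is ~32 FO4
tile_mm = 8 / 4                          # tile pitch, assuming equal square tiles
peak_gbit_s = links * 80 * 250e6 / 1e9   # assumes one 80-bit word per link per cycle at 250MHz

print(links, period, round(fo4_ps, 1), tile_mm, peak_gbit_s)
```

Under these assumptions the link count matches the 48 quoted on the slide, which suggests the figure counts each direction between neighbouring routers separately.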
9/19
Interconnect Delay/Energy Trade-offs
• Power dissipated in network links depends on how links are spaced and buffered
• At least a factor of 3 difference in energy consumption over the range of potential interconnect options
• Could move to low-swing differential schemes for even greater energy savings
For the results we assume minimum-spaced wires and an optimal energy x delay product
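As a reminder of why wire capacitance and signal swing matter, dynamic switching power is commonly modelled as activity x capacitance x supply voltage x swing x frequency. The sketch below uses that textbook model. The per-wire capacitance, activity factor and reduced swing are hypothetical placeholders, not figures from this study, and the model ignores the second wire that a differential scheme needs.

```python
from typing import Optional


def switching_power_w(activity: float, cap_f: float, vdd: float,
                      freq_hz: float, swing: Optional[float] = None) -> float:
    """Textbook dynamic power: activity * C * Vdd * Vswing * f.

    With full-swing signalling Vswing equals Vdd, giving the usual
    activity * C * Vdd^2 * f. A reduced swing driven from the same
    supply scales the result by Vswing / Vdd.
    """
    v_swing = vdd if swing is None else swing
    return activity * cap_f * vdd * v_swing * freq_hz


# Hypothetical 80-bit link: 0.4 pF per wire, 25% of wires charged per cycle.
wires, cap, alpha = 80, 0.4e-12, 0.25
full = wires * switching_power_w(alpha, cap, 1.2, 250e6)
low = wires * switching_power_w(alpha, cap, 1.2, 250e6, swing=0.3)
print(f"full swing: {full * 1e3:.2f} mW, low swing: {low * 1e3:.2f} mW")
```

Wider spacing lowers the capacitance term, while extra repeaters add capacitance in exchange for speed, which is the trade-off this slide refers to.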
10/19
Clock Gating
• Clock gating optimisations applied at two levels:
  – Local Clock Gating
    • Automated clock gating within router
    • Some tuning of RTL involved to maximise opportunities for synthesis tool
  – Router Level Clock Gating
    • Exploit opportunities to gate clock as it enters the router
    • Isolates router’s clock completely, only static power consumption remains
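The effect of gating can be illustrated with a small behavioural model. This is not the Lochside RTL. It only counts how many clock edges reach a group of registers when the clock is qualified by an enable, using a made-up activity trace.

```python
from typing import Sequence


def clock_edges_delivered(enables: Sequence[bool], gated: bool) -> int:
    """Count clock edges seen by the registers over len(enables) cycles."""
    if not gated:
        return len(enables)                 # free-running clock: an edge every cycle
    return sum(1 for en in enables if en)   # gated clock: edges only while enabled


# Hypothetical trace: the router has work to do in 3 of 10 cycles.
trace = [True, False, False, True, False, False, False, True, False, False]
ungated_edges = clock_edges_delivered(trace, gated=False)
gated_edges = clock_edges_delivered(trace, gated=True)
print(ungated_edges, gated_edges)
```

The same counting applies at both levels: local gating uses per-register-group enables, while router-level gating uses a single enable for the whole router.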
11/19
Router-Level Clock Gating
• Clock gating exposes clock tree insertion delay
• Need to know early if router will be required
• Generate ‘early valid’ signals in neighbouring routers
  – Early-valid signals are slightly pessimistic
  – Based on what is requested not granted
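The pessimism of an early-valid signal can be shown with a toy model of one upstream router. The sketch assumes a fixed-priority arbiter and a `can_send` flag standing in for whatever may block a grant. Both are illustrative choices, as are the port names, and none of them comes from the slides.

```python
from typing import List, Optional


def early_valid(requests: List[Optional[str]], port: str) -> bool:
    """True if any input of the upstream router is requesting `port`."""
    return any(req == port for req in requests)


def grant(requests: List[Optional[str]], port: str, can_send: bool) -> Optional[int]:
    """Index of the input that wins `port`, or None if nothing is sent.

    Fixed-priority arbitration and the `can_send` flag are assumptions
    made for this sketch.
    """
    if not can_send:
        return None
    for index, req in enumerate(requests):
        if req == port:
            return index
    return None


# Each entry is the output port requested by one input, or None if idle.
requests = ["east", None, "east", "north"]

wake_east = early_valid(requests, "east")             # eastern neighbour is woken
sent_east = grant(requests, "east", can_send=False)   # yet nothing is actually granted
wake_west = early_valid(requests, "west")             # no request, so no wake-up
print(wake_east, sent_east, wake_west)
```

Because the wake-up is derived from requests, it can fire when no data follows, but it never misses a transfer, and it is available before arbitration completes.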
12/19
Gate-Level Optimizations and Signal Gating
• Automated signal gating and gate-level power optimisations had minimal impact
• Inserting signal gating logic manually did reduce input FIFO power requirements significantly
• The reported results could be further improved (by 12%) by enabling logic optimisation across module boundaries
  – This was disabled so that we could determine accurately where power is dissipated
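One way to picture signal gating is as a stage that stops idle data from toggling downstream logic. The sketch below assumes a hold-last-value style of gating. The slides do not say which style was used, and the data trace is invented.

```python
from typing import Sequence


def bus_toggles(words: Sequence[int], valids: Sequence[bool],
                gated: bool, width: int = 80) -> int:
    """Count bit toggles seen downstream of an optional gating stage."""
    mask = (1 << width) - 1
    seen = 0        # value currently presented to the downstream logic
    toggles = 0
    for word, valid in zip(words, valids):
        # Gated: hold the previous value whenever the word is not valid.
        nxt = (word & mask) if (valid or not gated) else seen
        toggles += bin(seen ^ nxt).count("1")
        seen = nxt
    return toggles


# Hypothetical 8-bit trace: only the first and last words are valid.
words = [0xFF, 0x00, 0xFF, 0x0F]
valids = [True, False, False, True]
ungated_toggles = bus_toggles(words, valids, gated=False, width=8)
gated_toggles = bus_toggles(words, valids, gated=True, width=8)
print(ungated_toggles, gated_toggles)
```

Fewer toggles at the FIFO inputs means less switched capacitance, which is the mechanism behind the saving reported for manual signal gating.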
13/19
Analysis of Power Consumption
• Simple power optimisations can cut power requirements to a quarter, and many more opportunities to save power remain
• Network is ~5% of core area
• Perhaps 10% of system power at present
• Don’t make comparisons without optimizing power!
[Figure: Power consumption of a single router and its links]
14/19
Analysis of Power Consumption
• 22% Static power, 11% Inter-Router Links
• ~1% Global Clock tree
• 65% Dynamic Power
  – Power Breakdown
    • ~50% of dynamic power is consumed in local clock tree and input FIFOs
    • ~30% on router datapath
    • ~20% on scheduling and arbitration
      – Scheduling is probably more complex than typical implementations due to speculation
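The two levels of the breakdown can be combined to express each dynamic component as a share of total power. The sketch assumes that the ~50/30/20 split applies to the 65% dynamic figure, which is how the slide reads. The dictionary keys are labels chosen for illustration.

```python
# Top-level shares of total power, in percent (from the slide).
total_share = {
    "static": 22,
    "inter-router links": 11,
    "global clock tree": 1,
    "router dynamic": 65,
}

# Approximate split of the router dynamic power, in percent (from the slide).
dynamic_split = {
    "local clock tree + input FIFOs": 50,
    "router datapath": 30,
    "scheduling + arbitration": 20,
}

# Each dynamic component as a percentage of total power.
share_of_total = {
    name: total_share["router dynamic"] * pct / 100
    for name, pct in dynamic_split.items()
}

accounted = sum(total_share.values())
for name, pct in share_of_total.items():
    print(f"{name}: {pct}% of total")
print(f"top-level shares sum to {accounted}%")
```

The top-level shares sum to 99%, which is consistent with the rounded "~1%" clock tree entry.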
15/19
Low-Power On-Chip Networks
• Interconnect and static power set to increase
  – Many low-power link technologies
    • Low-swing differential techniques
  – Power gating and other leakage reduction techniques
• Potential power savings begin to require lots of different techniques – no one silver bullet?
16/19
Low-Power On-Chip Networks
• Topology
  – Don’t want to sacrifice general or at least multi-purpose nature of our networked SoC
  – Results suggest higher radix routers and longer interconnects could reduce power
    • Probably not a long term solution
    • Reduces path diversity, bad for fault-tolerance
• Architecture
  – Scope for minimising memory required to store precomputed router schedule (particular to our router)
  – Simpler routers
  – Single cycle routers reduce power? Speculation for low-power?
17/19
Supporting Best-Effort (BE) and Guaranteed Services (GS) Efficiently
• Current timing of the datapath and link suggests additional GS data could be routed in the same clock cycle
  – Allocate datapath/link to GS traffic for first ½ of clock cycle
    • Double capacity of network
  – Exploit simpler GS circuit-switched routing when possible
  – Reduce power
• Very little additional overhead
18/19
Clocking On-Chip Networks
• Network system timing issues are interesting
  – naturally event-driven not synchronous
• Work is investigating placing local data-driven clock generators in each network router
  – Clock is stretched when no data to be routed
  – Clock matches rate of incoming data streams
  – Robust synchronisation solution (true GALS)
  – Also investigating incorporating power gating support
• See also Distributed Clock Generator, DCG (Fairbanks/Moore)
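The idea of a clock that is stretched when idle and otherwise tracks the incoming data rate can be sketched as a simple timing model. This is a toy illustration, not the DCG circuit. The arrival times and the 4ns minimum period are hypothetical.

```python
from typing import List, Sequence


def data_driven_edges(arrivals_ns: Sequence[int], min_period_ns: int) -> List[int]:
    """Edge times for a clock that fires once per arriving item.

    The clock never runs faster than min_period_ns, and it is simply
    stretched (no edges) while there is nothing to route.
    """
    edges: List[int] = []
    for t in sorted(arrivals_ns):
        # Fire at arrival, or as soon as the minimum period has elapsed.
        edge = t if not edges else max(t, edges[-1] + min_period_ns)
        edges.append(edge)
    return edges


arrivals = [0, 1, 2, 40, 41]   # hypothetical flit arrival times in ns
min_period = 4                 # hypothetical minimum clock period in ns
edges = data_driven_edges(arrivals, min_period)
free_running = edges[-1] // min_period + 1   # edges a fixed clock would produce over the same span
print(edges, len(edges), free_running)
```

In this trace the data-driven clock produces 5 edges where a fixed 4ns clock would produce 12 over the same interval, with no edges at all during the idle gap.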
19/19
Challenges and Future Work
• These are early results from a much more rigorous study on the power requirements of networked on-chip communication
  – Much more soon!
• Exploiting a general-purpose on-chip network
  – Exploiting execution diversity to improve energy-efficiency
  – Multi-use platforms and Virtual-IP
  – Fault tolerance
  – Networks of processing elements or networks that process?
• Scope for removing unnecessary interfaces and boundaries
• Impact of networking on IP and processor core design