
Slide 1 French 207 MAPLD 2005

Integrated Tool Suite for Post Synthesis FPGA Power Consumption Analysis

Matthew French, Li Wang
University of Southern California, Information Sciences Institute

Tyler Anderson, Michael Wirthlin
Brigham Young University

Slide 2

FPGA Power Trends & Needs

[Figure: Internal Power Consumption. X axis: Xilinx Family (Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Virtex 4 LX). Logarithmic Y axis from 1 to 1,000,000. Series: Number of F-F’s, Power (mW), Clocking Frequency (MHz), Voltage (V).]

Power calculated assuming 80% device utilization, 80% peak clock frequency, 12.5% toggling rate. Internal logic only, no I/O.

• Number of logic blocks & maximum operating frequency track Moore’s Law
• Voltage reduction is slower
• Resulting power increase is exponential
• Power needs to be a first class design constraint
• Limited power tools available
  – Spreadsheets
    • Manual entry
    • Prone to guess-timation
  – XPower (post-routing)
    • At end of design cycle
    • Profiled after timing simulation
      – Time intensive
      – Unwieldy file sizes
    • Limited Reporting
      – Only total power consumed
      – No ability to capture power transients
    • Limited design path if specifications not met
    • Routing tools optimize only throughput

Slide 3

Power Tools: Goals

• Push power analysis, visualization, and optimization to the front of the tool chain:

– Analyze power consumption at logic simulation with two levels of accuracy

• Pre-place-and-route, using heuristic estimates based on fanout

• Back-annotated with precise post-place-and-route RC data

– Visualize by providing intuitive views to help the designer rapidly find and correct inefficient circuits, operating modes, data patterns, etc.

– Optimize systems by automatically identifying problem paths and suggesting improvements

• Benefits
  – Closer to logical level and design entry
  – Power profiling during functional simulation
  – Early estimation before place and route
  – Automatic specific resource utilization power details
  – Facilitates high level design alternative exploration

[Figure: FPGA Tool Flow, marking the Proposed Power Tool Entry Point and the Current Power Tool Entry Point]

Slide 4

Tool Backbone: JHDL & EDIF Parser

• Leverage the JHDL simulation environment with EDIF Parser circuit manipulation
• JHDL
  – Java-based structural design tool for FPGAs
  – Circuits described by creating Java classes
  – Design libraries provided for several FPGA families
  – http://www.jhdl.org
• JHDL design aids
  – Logic simulator & waveform viewer
  – Circuit schematic & hierarchy browser
  – Module generators
• Circuit designer does not need to know Java!
• EDIF Parser
  – Supports multiple EDIF files
  – Virtex2 libraries and memory initialization
  – Support for “black boxes”
  – No JHDL wrapper required
  – http://splish.ee.byu.edu/reliability/edif/
  – Verified: Synplicity, Synplicity Pro, Coregen, System Generator, Chipscope

[Figure: block diagram of the JHDL Environment and the EDIF Parser. Labels: EDIF Netlist, EDIF Parser, EDIF Data Structure, JHDL Data Structure, Manipulation Tools, 3rd Party Tools.]

Slide 5

Power Tool Flow: Timing-Level

[Figure: timing-level power tool flow. Labels: Source Code (VHDL, Verilog, JHDL), Synthesis, EDIF. Xilinx Tool Flow: Map, Place & Route, Bitgen (To Target), Xpower, with .ncd and .pwr files and a Routed Circuit Model. Power Tools: EDIF Parser, JHDL Power Analysis & Visualization.]

• Event Model Restructured
  – Tool Interoperability
  – Cross-probing Enabled
• Support dynamic insertion of 3rd party (Power) tools
  – Circuit APIs in place
  – Graphical User Interfaces (GUI) support

Slide 6

Power Visualization Tool

• Two views:
  – Instantaneous vs. cumulative power consumption over time
  – Sorted tree view of “worst offenders”

• Integrated “cross-probing” with existing JHDL tools

– Unified Environment

– Allows Experimentation

– Smart Re-use of CPU Memory

• Help rapidly identify inefficient circuits and operating modes

• Per-cell / per-bit granularity

• Simulation trigger on power specification

[Figure: Cross Probing]
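The views and the power trigger listed on this slide can be illustrated with a short sketch. Python is used here only for illustration (the actual tools are built on JHDL), and everything in it is an assumption: the function names, the input format (one power sample in mW per simulated cycle, and a map from cell name to power), and the interpretation of "cumulative" as a running total.

```python
from itertools import accumulate


def cumulative_power(samples_mw):
    """Running total of per-cycle power samples (the 'cumulative' view)."""
    return list(accumulate(samples_mw))


def first_violation(samples_mw, spec_mw):
    """Return the first cycle whose instantaneous power exceeds the spec.

    Models a 'simulation trigger on power specification'; returns None
    when no sample exceeds spec_mw.
    """
    for cycle, power in enumerate(samples_mw):
        if power > spec_mw:
            return cycle
    return None


def worst_offenders(per_cell_mw, top=3):
    """Cells sorted by descending power (the 'worst offenders' view)."""
    ranked = sorted(per_cell_mw.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]
```

A flat ranking is used here for brevity; the slide describes a sorted tree, which would apply the same ordering at each level of the design hierarchy.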

Slide 7

Post Synthesis Level Power Modeling

• Power Modeling
  – Quiescent power based on total circuit size
  – Dynamic Power
    • Toggle Rates (Data Dependent)
    • Components Used
    • Routing Interconnect
  – Actual quiescent and dynamic power not known until circuit is placed and routed
• Leverage existing JHDL tool environment
  – Toggling rates derived from simulator
    • Will lose glitching information
  – Components known from EDIF or JHDL primitives
    • Component capacitance imported from Xpower
  – How to model routing interconnect?
    • Do not have exact routing information at synthesis
    • Routing tools can pick different route each iteration
      – Interconnect length and combinations vary

$$\text{Power} = (\%\,\text{toggle}) \cdot (\text{Freq}_{\text{Clock}}) \cdot (\text{Cap}_{\text{Component}} + \text{Cap}_{\text{Wire}})$$

Xpower Component Capacitance

  Component   Cap (pF)      Component   Cap (pF)
  FF          1.21          LUT         1.0
  SRL         3.0           LD          1.0
  INV         1.0           AND         1.0
  RAM         1.0           MULT        17.2
  DLL         40.0          IBUF        1.0
  BUFG        6.0           BRAM        59.0

Xpower Interconnect Capacitance

  Interconnect     Cap (pF)
  Long Line        11.8
  Hex Line         0.59
  Double Line      0.44
  Direct Connect   0.29
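As a rough illustration of how the power expression and the Xpower capacitance values on this slide combine, the sketch below evaluates the model for a single component. The capacitances are copied from the slide. The function name, the supply-voltage scaling (the standard C·V²·f relation, which the slide's expression does not show explicitly), and the 1.5 V default are assumptions made for this example, not part of the original tool.

```python
# Capacitance values in pF, copied from the Xpower tables on this slide.
COMPONENT_CAP_PF = {
    "FF": 1.21, "LUT": 1.0, "SRL": 3.0, "LD": 1.0, "INV": 1.0, "AND": 1.0,
    "RAM": 1.0, "MULT": 17.2, "DLL": 40.0, "IBUF": 1.0, "BUFG": 6.0,
    "BRAM": 59.0,
}
INTERCONNECT_CAP_PF = {
    "Long Line": 11.8, "Hex Line": 0.59, "Double Line": 0.44,
    "Direct Connect": 0.29,
}


def dynamic_power_uw(toggle_rate, clock_mhz, component, wire_cap_pf, vdd=1.5):
    """Estimate dynamic power of one component in microwatts.

    toggle_rate is a fraction (0.125 for 12.5%). The vdd**2 factor and the
    1.5 V default are assumptions added for this sketch.
    """
    cap_pf = COMPONENT_CAP_PF[component] + wire_cap_pf
    # pF * MHz * V^2 works out to microwatts (1e-12 * 1e6 = 1e-6).
    return toggle_rate * clock_mhz * cap_pf * vdd ** 2


# Example: a flip-flop driving one direct connect, 12.5% toggling, 100 MHz.
example_uw = dynamic_power_uw(0.125, 100.0, "FF",
                              INTERCONNECT_CAP_PF["Direct Connect"])
```

Before place and route the wire capacitance is unknown, so `wire_cap_pf` would come from an estimate rather than from the interconnect table.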

Slide 8

Wire Power Model Analysis

• Developed power tools to analyze relationships

• Can plot capacitance vs.
  – Fanout
  – Programmable Interconnect Points
  – Wire Length
  – Total Number of Nets
  – Total Number of Components

• Which relationships maintain correlation from synthesis to place and route?
  – Optimizer removes components, nets

• Can also use tools to judge routing quality
  – Identify Outliers
  – Information Available to do Power Weighted Placement and Routing
    • Use Placement Macros in JHDL
    • Use UCF placement and/or timing constraints

[Figure: Optimization Candidates]

Slide 9

Low Fanout Capacitance Variance

• Not all routes are created equal

• Up to 60% variance on “same” route length

• East-West vs North-South Bias

• Switches sometimes use Doubles instead of Direct Connects

[Figure: Direct vs Double, switch logic. Legend: Direct Connect, Double Wire.]

  2.45 pF (#2727)   YQ -> F2 (omux-B3)
  2.37 pF (#4791)   YQ -> G4 (omux-B4)
  1.46 pF (#2768)   YQ -> F4 (omux-A2)
  0.75 pF (#131)    YQ -> F2 (omux-A7)

Slide 10

Capacitance vs Fanout

• Fanout model well correlated

• Secondary fit line corresponds to Macros

• High variance at low fanout

• Achieving 4.3% average error, 16% variance

• Explored device utilization models as well

[Figure: Capacitance vs Fanout plot, with the Placement Macros points labeled]
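A fanout-based wire model of the kind described on this slide can be sketched as a least-squares line through (fanout, capacitance) points taken from routed designs. The slide does not state the functional form of the fit or how the average error was computed, so the linear model, the function names, and the error metric (mean absolute relative error) below are all assumptions.

```python
def fit_line(fanouts, caps_pf):
    """Ordinary least-squares fit of capacitance = slope * fanout + intercept."""
    n = len(fanouts)
    mean_x = sum(fanouts) / n
    mean_y = sum(caps_pf) / n
    sxx = sum((x - mean_x) ** 2 for x in fanouts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fanouts, caps_pf))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def average_error_pct(fanouts, caps_pf, slope, intercept):
    """Mean absolute relative error of the fitted line, in percent."""
    errors = [abs((slope * x + intercept) - y) / y
              for x, y in zip(fanouts, caps_pf)]
    return 100.0 * sum(errors) / len(errors)
```

Nets that belong to placement macros could be fitted separately with the same function, which would match the secondary fit line mentioned above.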

Slide 11

Resulting Power Tool Flow

[Figure: resulting power tool flow. Labels: Source Code (VHDL, Verilog, JHDL), Synthesis, EDIF. Xilinx Tool Flow: Map, Place & Route, Bitgen (To Target), Xpower, with .ncd and .pwr files and a Routed Circuit Model. Power Tools: EDIF Parser, JHDL Power Analysis & Visualization, Virtex II Power Model.]

Slide 12

Power Optimization Approach

• Influence Xilinx Place&Route tools for power efficiency

– Minimize clock/wire lengths of high power nets

• Use power analysis tools to identify hot-spots and generate constraints

  – Timing constraints on non-clock signals
  – Location constraints on sink flip-flops of clock signals

• Verify power optimization approaches
  – Use final circuit timing model to verify power savings

[Figure: optimization and verification flow. Power Tools: EDIF, EDIF Parser, Optimization, producing Timing Constraint (ns) and Placement Constraint (X,Y) entries in a .ucf file. Xilinx Tool Flow: Ngdbuild & Map, Place & Route, bitgen, with .ncd files. Tool Verification: Xpower, ModelSim, vcd, vhd.]

Slide 13

Timing Constraint Power Optimization

• Wire power is optimized by reducing wire length
  – The MAXDELAY constraint in the UCF file sets the maximum delay allowed on a wire

• Power tools contain a Wire Table database
  – Sortable by: Average power, Toggling rate, Fanout, Load
  – Apply constraints

Default Constraints: Constraint Freq: 50 MHz; Operating Freq: 50 MHz; Poor Power Efficiency

Power Timing Constraints: Constraint Freq: 100 MHz; Operating Freq: 50 MHz; Better Power Efficiency

[Figure: Wire Table]
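The step from a sorted wire table to UCF timing constraints might look like the sketch below. It ranks nets by average power and emits a `NET ... MAXDELAY` line for the top fraction. The function name, the input format, the net names, and the 10 ns default (chosen here as the period of the 100 MHz constraint frequency in the example above) are assumptions for illustration; the slides do not describe the tool's selection logic.

```python
def maxdelay_constraints(wire_table, fraction=0.25, max_delay_ns=10.0):
    """Build UCF MAXDELAY lines for the highest-power nets.

    wire_table is a list of (net_name, average_power_mw) pairs.
    At least one net is constrained when the table is not empty.
    """
    ranked = sorted(wire_table, key=lambda row: row[1], reverse=True)
    count = max(1, int(len(ranked) * fraction)) if ranked else 0
    return ['NET "%s" MAXDELAY = %g ns;' % (name, max_delay_ns)
            for name, _power in ranked[:count]]


# Hypothetical wire table: net names and powers are made up for the example.
example_lines = maxdelay_constraints(
    [("data<0>", 0.4), ("clk_en", 2.1), ("addr<3>", 0.9), ("rst", 0.1)]
)
```

The resulting lines would be appended to the design's .ucf file before rerunning place and route.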

Slide 14

Timing Constraint Power Optimization: Preliminary Results

- Power reduction ranges from -1.4% (a slight increase) to 11.8%

- More constraints are not necessarily better

- Can also vary amount of timing that nets are constrained by

- Circuits still meet original timing specification requirements

  Configuration              % of total nets constrained   Clock (mW)   Signal (mW)   Total Power (mW), Clock + Signal
  Baseline, no constraints   N/A                           442.5        19.9          462.4
  All nets constrained       12.5%                         439.3        29.4          468.7 (-1.4%)
  Fanout < 10 constrained    11.1%                         394.2        23.7          417.9 (9.6%)
  Fanout < 4 constrained     10.6%                         400.6        23.1          423.7 (8.4%)
  Top 25% constrained        4.1%                          384.5        23.4          407.9 (11.8%)
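The percentages in parentheses are savings relative to the unconstrained baseline total. A short check, using only the totals reported in the table (the function and variable names are illustrative), reproduces them:

```python
BASELINE_TOTAL_MW = 462.4

# Total power (clock + signal) in mW, as reported in the table.
TOTALS_MW = {
    "All nets constrained": 468.7,
    "Fanout < 10 constrained": 417.9,
    "Fanout < 4 constrained": 423.7,
    "Top 25% constrained": 407.9,
}


def savings_pct(baseline_mw, total_mw):
    """Percent power saved relative to the baseline (negative = increase)."""
    return 100.0 * (baseline_mw - total_mw) / baseline_mw


savings = {name: round(savings_pct(BASELINE_TOTAL_MW, total), 1)
           for name, total in TOTALS_MW.items()}
```

For example, constraining the top 25% gives (462.4 - 407.9) / 462.4, or about 11.8%, while constraining all nets raises total power by about 1.4%.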

Slide 15

Location Constraint Power Optimization

• Power Optimization Guidelines

– Minimize clock zone utilization

– Group flip-flops as tightly as possible

– Group flip-flops closer to clock trunks

[Figure: two placements, labeled Less Power Efficient and More Power Efficient]

Reduce clock paths by putting constraints on flip-flop locations, thus reducing the clock capacitance and power.

Slide 16

Location Constraint Power Optimization Interface

• Clock table can be sorted by power, number of flip-flops etc.

• Users can select locations of flip-flops
  – Users can select how tightly flip-flops are placed
  – Users can define the area where flip-flops are placed. The tool checks the validity of constraint areas.
  – Users can select which flip-flop groups receive the constraints

[Figure: Clock Table]
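One way such location constraints could be written out is as a UCF area group that confines the flip-flops of a clock net to a rectangle of slices. The sketch below is hypothetical: the function name, the instance names, and the validity rule (the rectangle must hold all flip-flops, assuming two flip-flops per slice) are assumptions, and the actual tool may generate different constraint forms.

```python
def area_group_constraints(group, flip_flops, x0, y0, x1, y1, ffs_per_slice=2):
    """Build UCF lines placing flip_flops inside SLICE_X{x0}Y{y0}:SLICE_X{x1}Y{y1}.

    Raises ValueError when the rectangle is malformed or too small, a
    simplified stand-in for the tool's constraint-area validity check.
    """
    if x1 < x0 or y1 < y0:
        raise ValueError("area corners are out of order")
    slices = (x1 - x0 + 1) * (y1 - y0 + 1)
    if slices * ffs_per_slice < len(flip_flops):
        raise ValueError("area too small for %d flip-flops" % len(flip_flops))
    # Assign every flip-flop to the group, then bound the group's area.
    lines = ['INST "%s" AREA_GROUP = "%s";' % (ff, group) for ff in flip_flops]
    lines.append('AREA_GROUP "%s" RANGE = SLICE_X%dY%d:SLICE_X%dY%d;'
                 % (group, x0, y0, x1, y1))
    return lines


# Hypothetical example: two flip-flops on one clock net, one slice.
example_ucf = area_group_constraints("clk_a_ffs", ["ff0", "ff1"], 0, 0, 0, 0)
```

A smaller rectangle corresponds to selecting a tighter placement; the size check rejects areas that cannot hold the selected group.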

Slide 17

Location Constraint Power Optimization: Preliminary Results

  Configuration                       Clock (mW)        Signal (mW)     Logic (mW)      Total Power (mW), Clock + Signal + Logic
  Baseline, no constraints            442.5             19.9            285.8           748.2
  All FFs Placed                      293.7 (33.6%)     27.6 (-38.8%)   255.4 (10.6%)   576.7 (22.9%)
  IOs in IOBs, all other FFs placed   356.251 (19.5%)   21.909 (-10%)   285.787 (0%)    663.947 (11.3%)

- Individual clock net improvement ranged from -4% to 57%

- Achieve up to 22.9% total power improvement

- Circuits still meet timing requirement if IO buffer flip-flops are left in IOBs

- Power could be further reduced if IO buffer flip-flops are not constrained to be within IOBs

[Figure: Unconstrained vs. Constrained placement]

Slide 18

Conclusions

• Post-synthesis level power modeling is feasible
  – Some accuracy trade-offs inevitable
  – Quicker power results enable
    • Capability to determine power specifications early in the design flow
    • Feedback on design-level circuit power ramifications
    • Tighter feedback loop to designer for more design iterations

• Optimization
  – Preliminary results encouraging
  – Tools do not alter original circuit functionality & use COTS inputs
  – Developing optimization algorithms & routines

• Tools are open source: http://rhino.east.isi.edu
• This research made possible by a grant from the NASA Earth-Sun System Technology Office