DAG-Aware AIG Rewriting A Fresh Look at Technology-Independent Combinational Logic Synthesis
Alan Mishchenko Satrajit Chatterjee Robert Brayton
UC Berkeley
Overview
  Motivation
  Previous work
  Example
  Experiments
  Conclusions
Motivation for Improved Synthesis
  Traditional technology-independent combinational synthesis is
    suboptimal
    complicated
    hard to implement
    slow
  We propose to replace it with synthesis that is still suboptimal, but
    simple
    easier to implement
    fast
Previous Work
  Per Bjesse and Arne Boralv, "DAG-aware circuit compression for formal verification", ICCAD 2004
  Pre-compute all two-level subgraphs and group them into equivalence classes by functionality
  For each node, in topological order:
    Rewrite the subgraph rooted at the node, as long as area (the number of nodes) does not increase
    Account for logic sharing (DAG-aware)
  (Optional) Iterate until no improvement
Illustration
  Pre-computing subgraphs: consider function f = abc
    Subgraph 1: f = (ab)(ac)
    Subgraph 2: f = a(bc)
    Subgraph 3: f = (ac)b
  Rewriting subgraphs
    Rewriting node A: Subgraph 1 is replaced by Subgraph 2
    Rewriting node B: Subgraph 2 is replaced by Subgraph 1
  [Figure: AIG drawings of the three subgraphs and of the two rewriting steps]
  In both cases 1 node is saved
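The effect of logic sharing on the node count can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: inverters are omitted, the helper names (`AND`, `count_and_nodes`, `equivalent`) are hypothetical, and the surrounding logic (which of `ab`, `ac`, `bc` is already used elsewhere) is an assumption chosen so that each rewrite saves one node, as in the illustration.

```python
from itertools import product

def AND(x, y):
    """Build an AND node as a tuple; equal structures compare equal,
    which plays the role of structural hashing in this sketch."""
    return ("and",) + tuple(sorted((x, y), key=repr))

def count_and_nodes(outputs):
    """Count distinct AND nodes reachable from the outputs (shared nodes count once)."""
    seen = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if isinstance(node, tuple) and node not in seen:
            seen.add(node)
            stack.extend(node[1:])
    return len(seen)

def evaluate(node, values):
    """Evaluate a node under an assignment of Booleans to the input names."""
    if isinstance(node, str):
        return values[node]
    return evaluate(node[1], values) and evaluate(node[2], values)

def equivalent(f, g, names=("a", "b", "c")):
    """Exhaustively compare two small graphs over all input assignments."""
    return all(
        evaluate(f, dict(zip(names, bits))) == evaluate(g, dict(zip(names, bits)))
        for bits in product([False, True], repeat=len(names))
    )

ab, ac, bc = AND("a", "b"), AND("a", "c"), AND("b", "c")
sub1 = AND(ab, ac)    # Subgraph 1: (ab)(ac)
sub2 = AND("a", bc)   # Subgraph 2: a(bc)
sub3 = AND(ac, "b")   # Subgraph 3: (ac)b

# Node A (assumption: ab and bc are also used elsewhere in the network).
before_a = count_and_nodes([sub1, ab, bc])
after_a = count_and_nodes([sub2, ab, bc])

# Node B (assumption: ab and ac are also used elsewhere in the network).
before_b = count_and_nodes([sub2, ab, ac])
after_b = count_and_nodes([sub1, ab, ac])

print(before_a, after_a, before_b, after_b)
```

Which subgraph is cheapest depends on what the rest of the network already contains, which is why the node count has to be taken over the whole DAG rather than over the subgraph as a tree.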
Proposed Approach
  First, we introduce two concepts
    NPN-classes of Boolean functions
    k-feasible cuts in the network
NPN-Classes of Boolean Functions
  Definition. Two functions belong to the same NPN-class (are NPN-equivalent) if one of them can be derived from the other by permuting inputs, complementing inputs, and complementing the output
  Example 1: F = ab + c, G = ac + b (NPN-equivalent)
  Example 2: F = ab + c, G = ab (not NPN-equivalent)
  Example 3: F = ab + c, H = a(b+c) = (a' + b'c')', where x' is the complement of x (NPN-equivalent)
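The three examples can be checked by brute force. The sketch below is an illustration only: it enumerates every input permutation, input complementation, and output complementation of a truth table and uses the smallest resulting table as a class representative. The function names are hypothetical, and exhaustive enumeration is only practical for very small numbers of inputs.

```python
from itertools import permutations

def truth_table(func, n):
    """Pack func's outputs into an integer; bit m holds the value on minterm m
    (bit i of m is the value of input i)."""
    tt = 0
    for m in range(1 << n):
        bits = [(m >> i) & 1 for i in range(n)]
        if func(*bits):
            tt |= 1 << m
    return tt

def npn_orbit(tt, n):
    """All truth tables reachable from tt by permuting inputs,
    complementing inputs, and complementing the output."""
    size = 1 << n
    full = (1 << size) - 1
    orbit = set()
    for perm in permutations(range(n)):
        for flips in range(1 << n):        # bit i of flips: complement input i
            new_tt = 0
            for m in range(size):
                src = m ^ flips
                orig = 0
                for i in range(n):         # route input i to position perm[i]
                    if (src >> i) & 1:
                        orig |= 1 << perm[i]
                if (tt >> orig) & 1:
                    new_tt |= 1 << m
            orbit.add(new_tt)
            orbit.add(new_tt ^ full)       # complemented output
    return orbit

def npn_canonical(tt, n):
    """Representative of the NPN class: the numerically smallest member."""
    return min(npn_orbit(tt, n))

F = truth_table(lambda a, b, c: (a & b) | c, 3)    # ab + c
G = truth_table(lambda a, b, c: (a & c) | b, 3)    # ac + b
AB = truth_table(lambda a, b, c: a & b, 3)         # ab
H = truth_table(lambda a, b, c: a & (b | c), 3)    # a(b+c)

print(npn_canonical(F, 3) == npn_canonical(G, 3))
print(npn_canonical(F, 3) == npn_canonical(AB, 3))
print(npn_canonical(F, 3) == npn_canonical(H, 3))
```

Two functions are NPN-equivalent exactly when their representatives coincide, so the representative can serve as a key for looking up pre-computed structures.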
k-Feasible Cuts
  Definition. A set of nodes C is a k-feasible cut for a node n if (1) all the paths from the primary inputs (PIs) to node n pass through at least one node in C, and (2) the number of nodes in C does not exceed k
  k    Average number of cuts per node
  4    6
  5    20
  6    80
  7    150
  [Figure: example network with root n, internal nodes p, k, and s, and PIs a, b, c]
  3-feasible cuts of n:
    C1 = { p, k }
    C2 = { a, b, s }
  Not 3-feasible cuts of n:
    C3 = { p, b, c } (not a cut: condition 1 fails)
    C4 = { a, b, s, c } (four nodes: condition 2 fails for k = 3)
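A common way to enumerate k-feasible cuts is to process nodes in topological order and merge the cuts of the two fanins, discarding merged sets with more than k nodes. The sketch below follows that scheme under stated assumptions: the connectivity of the example network is not recoverable from this transcript, so the graph used here (p = AND(a, b), s = AND(b, c), k = AND(a, s), n = AND(p, k)) is hypothetical, chosen only to be consistent with the cuts C1 to C4 listed above. The function name `enumerate_cuts` is also hypothetical, and no filtering of dominated cuts is done.

```python
def enumerate_cuts(fanins, order, k):
    """Return {node: set of cuts}, each cut a frozenset of node names.

    fanins maps each internal node to its two fanins; nodes absent from
    fanins are primary inputs. order must be topological (fanins first).
    """
    cuts = {}
    for node in order:
        node_cuts = {frozenset([node])}        # the trivial cut {node}
        if node in fanins:
            left, right = fanins[node]
            for c0 in cuts[left]:
                for c1 in cuts[right]:
                    merged = c0 | c1
                    if len(merged) <= k:       # keep only k-feasible cuts
                        node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts

# Hypothetical network consistent with the cuts listed on the slide.
fanins = {"p": ("a", "b"), "s": ("b", "c"), "k": ("a", "s"), "n": ("p", "k")}
order = ["a", "b", "c", "p", "s", "k", "n"]

cuts = enumerate_cuts(fanins, order, 3)
for cut in sorted(cuts["n"], key=lambda c: (len(c), sorted(c))):
    print(sorted(cut))
```

In this assumed network, { p, b, c } never appears because the path from a through k reaches n without touching p, b, or c, and { a, b, s, c } is rejected by the size bound.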
Proposed Approach
  Pre-compute AIGs with non-redundant structure for all NPN-classes of 4-variable functions
    The total number of classes: 222
    Classes that appear in benchmark circuits: ~100
    Classes that account for 99% of the area improvement in rewriting: ~40
  For each node, in topological order:
    Compute all 4-input cuts (on average, 6 cuts per node)
    Match each cut to its NPN class (a hash table lookup)
    Try all structural implementations of the class and choose the best
    If the area (and delay!) does not increase, accept the change
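Matching a cut to its class requires the Boolean function of the node in terms of the cut leaves. One way to obtain it, sketched below, is bit-parallel simulation: each leaf gets an elementary truth table, and AND nodes with optionally complemented edges are evaluated on whole truth tables at once. The data layout (a dict from node name to two `(fanin, complemented)` pairs) and the function name `cut_function` are assumptions made for this illustration, not the representation used in ABC.

```python
def cut_function(fanins, root, leaves):
    """Truth table (as an int) of root over the ordered cut leaves.

    fanins[node] = ((fanin0, complemented0), (fanin1, complemented1)).
    Bit m of the result is the value of root on minterm m, where bit i
    of m is the value of leaves[i]. The leaves must form a cut of root.
    """
    n = len(leaves)
    size = 1 << n
    mask = (1 << size) - 1
    values = {}
    for i, leaf in enumerate(leaves):
        # Elementary truth table of leaf i: set on minterms where bit i is 1.
        values[leaf] = sum(1 << m for m in range(size) if (m >> i) & 1)

    def simulate(node):
        if node not in values:
            (x, cx), (y, cy) = fanins[node]
            vx = simulate(x) ^ (mask if cx else 0)
            vy = simulate(y) ^ (mask if cy else 0)
            values[node] = vx & vy
        return values[node]

    return simulate(root)

# Hypothetical AIG: t = a AND b, u = (NOT t) AND (NOT c), so u = NOT(ab + c).
fanins = {
    "t": (("a", False), ("b", False)),
    "u": (("t", True), ("c", True)),
}
tt_u = cut_function(fanins, "u", ["a", "b", "c"])
print(hex(tt_u))
```

For a 4-input cut the result fits in 16 bits, so it can be used directly as the key of a table that maps truth tables to pre-computed structures.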
Example of Rewriting (s27.blif)
  The number of AIG nodes is reduced. The number of AIG levels is the same.
  [Figure: the subgraph before rewriting (3 levels, 4 nodes) and after rewriting (3 levels, 3 nodes)]
Experimental Results
  Cost functions for technology-independent synthesis
    The number of factored-form literals (the previous work)
    The number of AIG nodes and levels (the proposed work)
  These cost functions are “apples” and “oranges”
  For fairness, the comparison is done after mapping
    Using MCNC benchmarks (vs. SIS and MVSIS)
    Using IWLS 2005 benchmarks (vs. MVSIS)
  The ABC mapper for standard cells and FPGAs is used
Experimental Setup
  ABC: script “resyn2”
    “b; rw; rf; b; rw; rwz; b; rfz; rwz; b”
    (performs 10 rewriting passes over the network)
    b (balance) tries to reduce delay without increasing area
    rw/rf (rewrite/refactor) try to reduce area without increasing delay
    rwz/rfz are the same as above, but allow zero-cost replacements
  MVSIS: “mvsis.rugged”
  SIS: “script.delay” and “script.rugged + speed_up”
  Runtimes are measured on a 1.6GHz CPU
Experimental Results (MCNC)
  Logic synthesis flow            Standard cells          FPGAs (k=5)             Runtime
  used for optimization           Area ratio  Delay ratio Area ratio  Delay ratio ratio
  No optimization                 1.00        1.00        1.00        1.00        0.00
  ABC (AIG rewriting)             0.87        0.96        0.93        0.98        1.00
  MVSIS (mvsis.rugged)            0.91        1.10        0.93        1.03        7.12
  SIS (script.delay)              0.94        0.99        0.98        0.97        ~100.00
  SIS (script.rugged + speed_up)  0.94        0.90        0.98        0.94        ~1000.00
IWLS 2005: Benchmark statistics
  IWLS                                          Runtime of optimization and mapping
  benchmarks    PI    PO    Latch  AND2    Lev  MVSIS    ABC    Map
  des_perf      234   64    8808   76716   17   1010.70  33.23  8.87
  ethernet      98    115   2235   19654   27   50.39    3.81   0.99
  mem_ctrl      115   152   1083   15191   28   17.20    1.95   0.72
  pci_bridge32  162   207   3359   22742   22   51.57    3.14   1.46
  systemcaes    260   129   670    12279   44   28.17    1.87   0.87
  tv80          14    32    359    9503    42   50.07    1.75   0.72
  usb_funct     128   121   1746   15670   23   66.59    2.29   0.96
  vga_lcd       89    109   17079  126687  19   1998.27  32.55  8.51
  wb_conmax     1130  1416  770    47535   18   2323.01  12.38  3.31
  wb_dma        217   215   563    4044    22   12.67    1.09   0.30
  …
  Ratio                                         24.53    1.00   0.40
IWLS 2005: Standard-Cell Mapping
  Results of mapping into mcnc.genlib
  IWLS          Original circuit  Optim. by MVSIS   Optim. by ABC
  benchmarks    Area     Delay    Area     Delay    Area     Delay
  des_perf      162228   14.10    155708   17.50    145133   14.80
  ethernet      33949    22.40    29180    24.40    23142    21.30
  mem_ctrl      25521    23.30    23537    26.50    15865    21.10
  pci_bridge32  40322    18.60    35254    20.60    34860    17.70
  systemcaes    21715    28.70    16483    34.60    16533    28.10
  tv80          17396    33.70    13939    37.40    13568    33.90
  usb_funct     27617    17.80    24386    28.30    23637    19.70
  vga_lcd       240071   15.70    169276   16.50    201141   15.50
  wb_conmax     82353    15.90    87082    17.60    66124    15.90
  wb_dma        7805     18.90    6913     21.40    6675     18.60
  …
  Ratio         1.00     1.00     0.88     1.22     0.83     0.97
IWLS 2005: FPGA Mapping (k=5)
  Results of mapping into 5-input LUTs
  IWLS          Original circuit  Optim. by MVSIS   Optim. by ABC
  benchmarks    Area     Delay    Area     Delay    Area     Delay
  des_perf      19177    5        23406    5        19163    5
  ethernet      4665     9        5170     9        4297     8
  mem_ctrl      4854     9        4551     10       3191     9
  pci_bridge32  6150     8        5888     9        5908     7
  systemcaes    2547     9        2770     13       2329     10
  tv80          2651     14       2594     14       2467     14
  usb_funct     4530     7        4475     8        4030     7
  vga_lcd       28458    8        28866    8        29562    7
  wb_conmax     16073    7        17165    8        13370    7
  wb_dma        1316     8        1283     9        1247     8
  …
  Ratio         1.00     1.00     1.01     1.17     0.94     0.97
Conclusions
  Introduced ABC
  Presented DAG-aware AIG rewriting
  Showed promising experimental results
  Future work
    AIG rewriting with larger cut size
    Sequential AIG rewriting