1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College....

33
1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Har ris Harvey Mudd Coll ege. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.
  • date post

    22-Dec-2015
  • Category

    Documents

  • view

    214
  • download

    1

Transcript of 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College....

Page 1: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

1

An Interconnect-Centric Approach to Cyclic Shifter Design

David M. HarrisHarvey Mudd College.

Haikun Zhu, Yi ZhuC.-K. Cheng

Harvey Mudd College.

Page 2: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

2

Outline

• Motivation• Previous Work• Approaches

• Fanout-Splitting• Cell order optimization by ILP

• Conclusions

Page 3: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

3

Motivation

• Interconnect dominates gate in present process technology• Delay, power, reliability, process variation, etc.

• Conventional datapath design focuses on logic depth minimization

Source: ITRS roadmap 2005

Page 4: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

4

Technology Trends

• Device (ITRS roadmap 2005, Table 40a)

• Updated Berkeley Predictive Interconnect Model

Gate length (nm) 90 65 % decreasing

Inter-layer dielectric constant

2.8 2.6 -

Capacitance of local interconnect (fF/um)

0.186 0.173 7.0%

Gate length (nm) 90 65 % decreasing

Vdd (V) 1.1 1.1 -

Vth (V) 0.195 0.165 -

NMOS gate Cap (fF/m) 0.573 0.469 18.2%

NMOS intrinsic delay (ps)

0.870 0.640 26.4%

Page 5: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

5

Shifter Taxonomy

• Functionality• Logical Shift: MSBs stuffed with 0’s• Arithmetic Shift: Extend original MSB• Cyclic Shift (rotation)• Bidirectional Shift

• Circuit Topology• Barrel Shifter• Logarithmic Shifter

Page 6: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

6

Barrel Shifter

Sh3Sh2Sh1Sh0

Sh3

Sh2

Sh1

A3

A2

A1

A0

B3

B2

B1

B0

: Control Wire

: Data Wire

BufferSh3Sh2Sh1Sh0

A3

A2

A1

A0

• Pros• Every data signal pass only one transmission gate

• Cons• Input capacitance is • # transistors =• Requires additional decoder for control signals

Schematic

layout

Page 7: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

7

Logarithmic Shifter

Sh1 Sh1 Sh2 Sh2 Sh4 Sh4

A3

A2

A1

A0

B1

B0

B2

B3

Schematic

layout

• Pros• # transistors =

• Cons• Long inter-stage wires, especially for cyclic shiftersTarget of Optimization

Page 8: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

8

Cyclic Shifter -- Applications• Finite Field Arithmetic

• In normal basis, squaring is done by cyclic shifting.• Encryption

• ShiftRows operation in Rijndael algorithm.• DCT processing unit

• Address generator• Bidirectional shifting

• Can be implemented as a cyclic shifter with additional masking logic

• CORDIC algorithm• etc …

Page 9: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

9

Previous Work

• Bit interleave• Two dimensional folding strategy• Gate duplicating• Ternary shifting• Comparison between barrel shifter and

log shifter

Page 10: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

10

Cyclic Shifter – Traditional Design

• MUX-based

S2

S1

S0

D7 D6 D5 D4 D3 D2 D1 D0

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

• Shifting & non-shifting paths are intertwined together.

• Wire load on the critical path is

Timing

Power

• When configured to pass through, the non-shifting paths have to be switched as well.

Page 11: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

11

Fanout Splitting Shifter

• Use DEMUXes instead of MUXes

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

D7 D6 D5 D4 D3 D2 D1 D0

S2

S1

S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Power

• When shifting, non-shifting paths are at rest.

• Shifting & non-shifting paths are now decoupled.

• Wire load on the longest path is

Timing

Page 12: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

12

Example

D7 D6 D5 D4 D3 D2 D1 D0

S2=1

S1=0

S0=1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

D0 D7 D6 D5 D4 D3 D2 D1

D0 D7 D6 D5 D4 D3 D2 D1

D0 D7 D6 D5D4 D3 D2 D1

Right rotate 5 bits

Red lines are signal linesGreen lines are quiet lines

Page 13: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

13

Dynamic Power Consumption

• Dynamic Power• Switching Probability

S2

S1

S00 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

MUX based design

DEMUX based design

SP= 1/4

SP= 3/16

SP = Switching Probability

S2

S1

S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)

(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)

(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)

(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)

Page 14: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

14

Gate Complexity

• Re-factoring design• No extra complexity at gate level,

both are

MUX-based DEMUX-based

Page 15: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

15

DualityNAND gates network NOR gates network

• Duality provides flexibility for low level implementation• NAND gates are good for static CMOS.• NOR gates are good for dynamic circuits.

Page 16: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

16

Cell Permutation

• Datapath usually assumes bit-slice structure• The cell order of the input/output stages must be

fixed• However, the cells in the intermediate stages are

free to permute.7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

fixed

fixed

free

Page 17: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

17

Problem Statement

• Given• A N-bit rotator• Fixed linear order of the input/output stages

• Find• An optimal permutation scheme of the inter

mediate stages such that the longest path is minimized (or, the total wire length s.t. delay constraint).

Page 18: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

18

ILP Formulation

• Introduce a set of binary decision variables • if and only if logic cell is at

physical location on level

• The solution space is fully defined by constraints

Page 19: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

19

ILP Formulation (cont’)

• Minimum delay formulation

• Minimum power formulation

Which can be expanded into

objective

Page 20: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

20

ILP Formulation (cont’)

• Represent the length of a single wire segment

• Formulating absolute operation

Psuedo-linear constraints discarded because we’re trying to minimize

Page 21: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

21

Complexity

• Minimum total wire length formulation• The case of one level of free cells is a minimum

weight bipartite matching problem

• For the case of two or more levels of free cells, optimal polynomial algorithm is unknown

• Hardness of minimum delay formulation un-established.

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

Logic index

Physical location

. . .

. . .

. . .

7

6

5

4

3

2

1

0

7

6

5

4

3

2

1

0

Page 22: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

22

ILP Complexity• The ILP formulation does not scale well

• Both #integer variables and #constraints are • CPLEX uses on branch & bound – exponential growth

• Sliding window scheme• Only cells in the window are allowed to permute• Consists of multiple passes; terminate when there is no

improvement between passes

WW = 8 columnsWH = 3 rowsHS = 4 columnsVS = 1 column

Page 23: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

23

Power & Delay Evaluation

• Overall Flow

Page 24: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

24

Optimal solution for 8-bit case

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

0 0 00 0 0 0 0

0 0 00 0 0 0 0 11111117

11111117

7 7 1 7 1 9 1 3 1 3 1 3 1 3 1 3

7 7 9 3 3 3 3 3

7 7 7 7 9 7 3 7 3 11 3 11 3 13 3 7

7 7 9 7 11 11 13 7

7 6 5 4 3 2 1 0

6 5 4 3 7 2 1 0

3 4 2 6 7 5 1 0

7 6 5 4 3 2 1 0

>> 1-bit

>> 2-bit

>> 4-bit

0 0 00 0 0 0 0

1 1 01 1 4 0 0 11130000

11141111

4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3

4 2 4 5 4 5 4 3

8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8

8 7 8 7 8 7 6 8

A global optimal solution

Page 25: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

25

16-bit and 32-bit cases16-bit, global optimal solution

32-bit, suboptimal solution by sliding window method

Page 26: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

26

Results

Page 27: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

27

Power-Delay Tradeoff

8-bit 16-bit

32-bit 64-bit

• Given Tmax constraint, optimize Ttotal

8-bit & 16-bit are global optimum by cplex

32-bit & 64-bit are suboptimal result by sliding window scheme

Page 28: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

28

Outline

• Motivation• Previous Work• Approaches

• Fanout-Splitting• Cell order optimization by ILP

• Conclusions and future work

Page 29: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

29

Conclusions & Future Work• We have proposed

• Fanout-splitting design• ILP based layout optimization

• Future directions• Extend the fanout splitting idea and ILP formulation to ternary shifter• Try alternative hierarchical approach to tackle the ILP complexity issu

e31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Page 30: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

30

Thank you!

The End

Page 31: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

31

Derivation of switching probability

Gate level implementation ofMUX and DEMUX

Truth table

For DEMUX

For MUX

Page 32: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

32

Evaluating interconnect effect

• Based on logical effort model• Technology independent• Easy to incorporate interconnect

effectGate delay

Logical effort, only depends on gate typeElectrical effort, depends on load cap

Parasitic delay, only depends on gate typeWire load is integrated into h

Electrical effort contributed by wire per column spanned Wire length normalized to cell width

Page 33: 1 An Interconnect-Centric Approach to Cyclic Shifter Design David M. Harris Harvey Mudd College. Haikun Zhu, Yi Zhu C.-K. Cheng Harvey Mudd College.

33

Deciding

• Evaluate for a set of technology nodes

• It is safe to assume hw=1