An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient...

46
Asynchronous Research Center Asynchronous Research Center AN ASYNCHRONOUS DIVIDER IMPLEMENTATION Navaneeth Jamadagni and Jo Ebergen

Transcript of An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient...

Page 1: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center Asynchronous Research Center

AN ASYNCHRONOUS DIVIDER IMPLEMENTATION

Navaneeth Jamadagni

and

Jo Ebergen

Page 2: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Acknowledgements

•Oracle Labs

•DARPA

•Asynchronous Research Center

•Reviewers

2

Page 3: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Take Away

• Division Algorithm •Usually retires 1 quotient digit per iteration

• Sometimes retires 2 quotient digits per iteration

•An Asynchronous Design •Exploits the time disparity between

addition and shift operations

•Result • Improvement in speed

3

Page 4: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

4

Page 5: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

5

Page 6: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Division

6

Numerator, N = ((Quotient, Q) * (Denominator, D))

+ Remainder, R

Page 7: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Long Division, Example

7

375 15

2

30

75

5

– 75

0

Page 8: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Long Division

•Given a Numerator and a Denominator

1. Guess the quotient digit

a. Ten choices, from the set {0 … 9}

2. Multiply the denominator by your guess

3. Subtract the product from the remainder to get

a new remainder

4. Retire that quotient digit

5. Repeat steps 1 to 4 until the remainder is 0 or

you run out of time

8 It

erat

ion

Page 9: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Two Division Methods

• Multiplicative Methods • Many choices (e.g., {0 …. 28})

• Expensive multiplication

• Retires many quotient bits per iteration

• Few iterations

• Digit-Recurrence Methods • Very few choices (e.g., {0,1} or {-2, -1, 0, 1, 2})

• Inexpensive multiplication

• Retires one or two quotient bits per iteration

• Many iterations

9

Page 10: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

SRT Division

•Most common implementation in microprocessors

•Carry-Save Additions or Subtractions

•Guess by Carry Propagate Addition

•Guess by Table Lookup

•After Sweeney, Robertson and Tocher

10

Page 11: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

1. Guess : quotient digit from the set {–1, 0, 1}

2. Multiply : –1×D, 0×D, 1×D

SRT Division, Iteration

11

Page 12: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

1. Guess : quotient digit from the set {–1, 0, 1}

2. Multiply : –1×D, 0×D, 1×D

3. Subtract or Add:

SRT Division, Iteration

12

CSA

CSA = Carry Save Addition

Page 13: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

1. Guess : quotient digit from the set {–1, 0, 1}

2. Multiply : –1×D, 0×D, 1×D

3. Subtract or Add:

SRT Division, Iteration

13

CSA Table

Lookup

CPA

or

CPA = Carry Propagate Addition

Page 14: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

1. Guess : quotient digit from the set {–1, 0, 1}

2. Multiply : –1×D, 0×D, 1×D

3. Subtract or Add :

4. Retire : One quotient digit always

SRT Division, Iteration

14

CSA Basis

for the Guess CPA = Carry Propagate Addition

CSA = Carry Save Addition

Page 15: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

15

Page 16: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division

“Divine” = Discover by guess work or Intuition

•Carry-Save Addition or Subtraction only

•Two versions in the paper: E and H

•Developed at Sun Labs, by

Jo Ebergen, Ivan Sutherland

and Danny Cohen

16

Page 17: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division, Iteration

1. Guess : quotient digit from the set {-2, -1, 0, 1, 2}

2. Multiply : –2×D, -1×D, 0×D, 1×D, 2×D

17

Page 18: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division, Iteration

1. Guess : quotient digit from the set {-2, -1, 0, 1, 2}

2. Multiply : –2×D, -1×D, 0×D, 1×D, 2×D

3. Subtract or Add :

18

CSA

Page 19: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division, Iteration

1. Guess : quotient digit from the set {-2, -1, 0, 1, 2}

2. Multiply : –2×D, -1×D, 0×D, 1×D, 2×D

3. Subtract or Add :

19

CSA 2 MSBs

Page 20: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division, Iteration

1. Guess : quotient digit from the set {-2, -1, 0, 1, 2}

2. Multiply : –2×D, -1×D, 0×D, 1×D, 2×D

3. Subtract or Add :

20

Parity Bits

Majority Bits CSA

Page 21: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Divine Division, Iteration

1. Guess : quotient digit from the set {-2, -1, 0, 1, 2}

2. Multiply : –2×D, -1×D, 0×D, 1×D, 2×D

3. Subtract or Add :

1. Retire : One quotient digit usually, Two quotient digits sometimes

2. One more iteration than SRT for equal accuracy

21

Parity Bits

Majority Bits CSA

Page 22: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 22

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choices

Page 23: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 23

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choices

Page 24: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 24

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: 4X*

Quotient = 0 and 0

Remainder = Left shift by 2

and Invert MSBs

Page 25: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 25

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: 2X*

Quotient = 0

Remainder = Left shift by 1

and Invert MSBs

Page 26: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 26

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: 2X

Quotient = 0

Remainder = Left shift by 1

Page 27: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 27

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: SUB1 & 2X*

Quotient = 1

Remainder = SUB 1×D

& Left shift by 1

and Invert MSBs

Page 28: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 28

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: SUB2 & 2X*

Quotient = 2

Remainder = SUB 2×D

& Left shift by 1

and Invert MSBs

Page 29: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 29

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: ADD1 & 2X*

Quotient = -1

Remainder =

ADD 1×D &

Left shift by 1 and

Invert MSBs

Page 30: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center 30

SUB2 &

2X* SUB1

& 2X*

SUB1 &

2X*

2X 2X*

4X* 4X*

2X*

2X*

4X*

2X

ADD1 &

2X* ADD2

& 2X*

ADD1 &

2X*

2X*

4X*

Va

lue

of

the

Re

ma

ind

er

= P

ari

ty +

Ma

jori

ty

4

3

2

1

0

-1

-2

-3

-4

Divine Division Choice: ADD2 & 2X*

Quotient = -2

Remainder =

ADD 2×D &

Left shift by 1 and

Invert MSBs

Page 31: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Number of Iterations per Division

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Pro

ba

bil

ity

Number of Iterations per division (25-bit operands)

Divine Division SRT Division

31

Average = 22.6

• One million pairs of uniform-random 25-bit input operands

Page 32: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

32

Page 33: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Asynchronous Divine Divider

•Control Path • Uses GasP modules

• Generates the control signals for the registers

• Delay matched to data path

• Shift steps are faster than addition steps

• Asynchronous loop counter

•Data Path • Registers and computational blocks (e.g., CSA)

• Single rail bundled data

33

Page 34: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Data Path

34

1. Guess the first Quotient

Digit

Page 35: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Data Path

35

2. Carry Save Add 3. Retires one digit 4. Guess the next

quotient digit

Page 36: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Data Path

36

2. Left shift by 1 3. Retires 1 quotient

digit, namely 0 4. Guess the next

quotient digit

Page 37: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Data Path

37

2. Left shift by 2 3. Retires 2

quotient digits, 0 and 0

4. Guess the next quotient digit

Page 38: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

38

Page 39: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Simulation

• SPICE Simulation in TSMC 90nm

• Partial layout with estimated wire lengths

• 50 pairs of random test vectors

• Iteration statistics similar to 106 random test vectors

• Delay measurements are normalized to FO4 delays

• Comparisons 1FO4 ≈ 25ps in 90nm process

• Shift Path = 6 FO4, Add Path = 8.5 FO4

• Energy data estimated for Data and Control Paths

• Comparison normalized to 1V Vdd

39

Page 40: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Average Delay

•From SPICE • Shift Path = 6 FO4

• Add Path = 8.5 FO4

•Asynchronous Design • Sometimes retires two bits and Shift steps are quicker

• Average delay per bit is 6.3 FO4

• Synchronous Design • Sometimes retires two bits

• Average delay per bit is 7.4 FO4

40

Page 41: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

06

16

16

10

07

07

00 04 08 12 16

Delay per Quotient bit Normalized

41

Jamadagni and Ebergen, 2012 Divine Division, CMOS

Williams and Horowitz, 1991, Radix-2 SRT, Domino, Async

Renaudin et al, 1996 Radix-2 SRT, LDCVSL, Async

Harris et al, 1996 Radix-4 SRT, CMOS, Sync

Async

Sync Delay in

FO4

Liu and Nannarelli, 2008, Radix-4, CMOS, Sync

Page 42: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

118

92

112

275

0 100 200 300

Normalized by Vdd2 to 1V process

Energy per Division

42

Energy in pico Joules

Jamadagni and Ebergen, 2012, 25-bit Division, TSMC 90nm, Asynchronous

Liu and Nannarelli, 2008, 24-bit Division, STM 90nm, Synchronous

Liu and Nannarelli, 2008, 24-bit Division, STM 90nm, Synchronous-Low Power

Renaudin et al, 1996, 32-bit Division, 0.5um, Asynchronous – Low Power

Page 43: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Outline

•Introduction

•Divine Division

•Hardware Design

•Results

•Conclusion

43

Page 44: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Summary and Conclusion

• An Asynchronous design

• Exploits the average case behavior of the Divine Division algorithm

• Exploits the disparity in data path delays

• Future Work

• Add computation in the feedback path • Insert another data path

• Mitigate the effect of sequencing overhead

• Reduce power consumption • Controlling the data inputs to the adder

44

Page 45: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Questions?

45

Page 46: An Asynchronous Implementation of a Divider · 2012-06-19 · •Sometimes retires 2 quotient digits per iteration •An Asynchronous Design ... Two quotient digits sometimes 2. One

Asynchronous Research Center

Thank You

46