Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Page 1: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Digital Arithmetic

CSE 237D: Spring 2008 Topic #8

Professor Ryan Kastner

Page 2: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Data Representation

Floating point representation
  Large dynamic range and high precision
  Costly

Fixed point representation
  Requires fewer resources
  Comparable performance
  Bitwidth analysis for trading off estimation accuracy and the number of fixed-point bits
  8 bits is sufficient

Page 3: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Moorea Modem Receiver Specification

Note: 112 samples/symbol + 112 samples for channel clearing.

[Figure: receiver block diagram. Several Matching Pursuit Core blocks feed an arg min over i stage, the generalized multiple hypothesis test (GMHT).]

Page 4: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Walsh/m-Sequence Waveforms

Chip rate – 5 kcps, approx. 5 kHz bandwidth. Uses 25 kHz carrier.

Use 7 chip m-sequence c per Walsh symbol, 8 bits per Walsh symbol bi. Composite symbol duration is thus T = 11.2 msec. (Longer than maximum multipath spread.)

Symbol rate is 266 bps, or 133 bps using 11.2 msec. time guard band for channel clearing.


Page 5: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Transmitted Signal

[Figure: transmitted chip sequence, shown in groups of 7 chips]
1 1 -1 1 -1 -1 -1 | 1 1 -1 1 -1 -1 -1 | -1 -1 1 -1 1 1 1

Page 6: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Walsh/m-sequence Signal Parameters

[Figure: chip sequence annotated with the signal parameters, shown in groups of 7 chips]
1 1 -1 1 -1 -1 -1 | 1 1 -1 1 -1 -1 -1 | -1 -1 1 -1 1 1 1

Page 7: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

8 Walsh Symbols

Page 8: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Moorea Modem Receiver Specification

Note: 112 samples/symbol + 112 samples for channel clearing.

[Figure: receiver block diagram. Several Matching Pursuit Core blocks feed an arg min over i stage, the generalized multiple hypothesis test (GMHT).]

Channel Estimation

Page 9: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Goal: Map matching pursuits to reconfigurable device
  Parameterizable: number of samples, data representation
  Tradeoffs: provides designs with various area, latency, energy, …

Matching Pursuits Algorithm

Matching Pursuits Core

Reconfigurable System

MP(r, S, A, a)
 1  for i = 1, 2, …, N_s
      // compute matched filter (MF) outputs
 2    V_i^0 ← S_i^T r
 3    f_i ← 0
 4    g_i ← 0
 5  end for
 6  q_0 ← 0
    // do successive interference cancellation
 7  for j = 1, 2, …, N_f
      // update MF outputs
 8    V^j ← V^(j-1) - f_(q_(j-1)) A_(q_(j-1))
 9    for k = 0, 1, …, N_s - 1
10      g_k ← V_k^j / a_k
11      Q_k ← (V_k^j)* g_k
12    end for
13    q_j ← arg max over k ≠ q_1, …, q_(j-1) of { Q_k }
14    f_(q_j) ← g_(q_j)
15  end for
16  return (f)
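The pseudocode can be sketched in a few lines of Python. This is only an illustration under stated assumptions: real-valued signals (so the conjugate in step 11 is a no-op), `S` given as a list of signature vectors, `A` taken to be the Gram matrix S^T S and `a` its diagonal. The function name and argument layout are hypothetical, not the course's implementation.

```python
def matching_pursuit(r, S, n_f):
    """Sketch of MP for real-valued data (assumption).

    r   : received vector (list of floats)
    S   : list of N_s signature vectors, each the same length as r
    n_f : number of coefficients to estimate (N_f)
    """
    n_s = len(S)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))

    # Assumed definitions: A = S^T S, a = diag(A)
    A = [[dot(S[i], S[k]) for k in range(n_s)] for i in range(n_s)]
    a = [A[k][k] for k in range(n_s)]

    V = [dot(S[i], r) for i in range(n_s)]  # matched filter outputs
    f = [0.0] * n_s
    chosen = []

    for _ in range(n_f):
        if chosen:  # cancel the contribution of the last selected path
            q = chosen[-1]
            V = [V[k] - f[q] * A[k][q] for k in range(n_s)]
        g = [V[k] / a[k] for k in range(n_s)]
        Q = [V[k] * g[k] for k in range(n_s)]
        q = max((k for k in range(n_s) if k not in chosen), key=lambda k: Q[k])
        f[q] = g[q]
        chosen.append(q)
    return f


# With orthonormal signatures the estimate is just the projection of r.
print(matching_pursuit([0, 3, 0.5], [[1, 0, 0], [0, 1, 0], [0, 0, 1]], 2))
```

In hardware the inner loop over k is what the core parallelizes; the sketch keeps it sequential for readability.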

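[Figure: Matching Pursuits Core datapath mapped onto CLBs, Block RAMs and IP core multipliers. Signals r, S_i, A_k, a_k, V_i^0, V_k^j, g_k, Q_k and f_(q_(j-1)) pass through multiply, add and subtract units under a control block.]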

System Design Tools

Page 10: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

In Depth: Data Representation

Page 11: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

History of Number Systems

Oldest Number System?
  Fingers, but only 10
  Toes, but only 20
  Base 10, "digit"al
  Roman schools taught finger counting: multiplication/division on hands/toes

"Counting in binary is just like counting in decimal if you are all thumbs." ~ Glaser and Way

Sand Tables
  Stones in the sand
  Three grooves with up to ten stones per groove
  "Calculate" is said to be derived from the Latin word "calcis" because limestone was used in the first sand tables.

"Base eight is just like base ten really, if you're missing two fingers." ~ Tom Lehrer

Page 12: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Key Idea: Formal Notation

Notches on bones – 8500 BC in Africa, Europe

Count in multiples of some basic number
  5 or 10, based on fingers
  Mayans used 360
  Babylonians 60

Greeks, Romans extended this – fundamentally still the same

Positional notation key – same symbol in different spots has different meaning

Page 13: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Numbers

Any number system requires:
  A set of digits
  A set of possible values for the digits
  Rules for interpreting the digits and values as a number

Example: Roman Numerals
  Symbols used to represent a value

  1 = I      100 = C
  5 = V      500 = D
  10 = X     1000 = M
  50 = L

For example: 2008 = MMVIII

Page 14: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Unsigned Number Systems

Unsigned integer decimal systems
  Set of digits represented by a digit vector $X = (X_{n-1}, X_{n-2}, \ldots, X_1, X_0)$
  Set of values for the digits: $S_i = \{0, 1, 2, \ldots, 9\}$
  Rule for determining the number: $X = \sum_{i=0}^{n-1} X_i \cdot 10^i$

Unsigned binary systems
  Set of digits represented by a digit vector $X = (X_{n-1}, X_{n-2}, \ldots, X_1, X_0)$
  Set of values for the digits: $S_i = \{0, 1\}$
  Rule for determining the number: $X = \sum_{i=0}^{n-1} X_i \cdot 2^i$

Page 15: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Other Useful Encodings

Some 4-bit number representation formats

[Figure annotations]
  Base-2 logarithm
  Exponent in {-2, -1, 0, 1}
  Significand in {0, 1, 2, 3}

Page 16: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Encoding Numbers in 4 Bits

[Figure: number line from -16 to 16 marking the values representable in 4 bits under each number format]

Number format
  Unsigned integers
  Signed-magnitude
  3 + 1 fixed-point, xxx.x
  Signed fraction, ±.xxx
  2’s-compl. fraction, x.xxx
  2 + 2 floating-point, s × 2^e, e in [-2, 1], s in [0, 3]
  2 + 2 logarithmic (log x = xx.xx)

Page 17: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Sign and Magnitude Representation

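[Figure: 4-bit sign-and-magnitude number wheel, with "Increment" marked on the positive half and "Decrement" on the negative half]

Bit pattern (representation) -> signed value (signed magnitude)
  0000 = +0   0001 = +1   0010 = +2   0011 = +3   0100 = +4   0101 = +5   0110 = +6   0111 = +7
  1000 = -0   1001 = -1   1010 = -2   1011 = -3   1100 = -4   1101 = -5   1110 = -6   1111 = -7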

Page 18: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Sign and Magnitude Adder

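[Figure: sign-and-magnitude adder. A control block takes Sign x, Sign y and Add/Sub and produces Compl x, Compl s and Sign s. Operand x passes through a selective complement unit before the adder (c_in, c_out), and the adder output passes through a second selective complement unit to give s.]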

Page 19: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Biased Representations

[Figure: 4-bit biased number wheel, with "Increment" marked on both halves]

Bit pattern (representation) -> signed value (biased by 8)
  0000 = -8   0001 = -7   0010 = -6   0011 = -5   0100 = -4   0101 = -3   0110 = -2   0111 = -1
  1000 = 0    1001 = +1   1010 = +2   1011 = +3   1100 = +4   1101 = +5   1110 = +6   1111 = +7

Page 20: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Arithmetic with Biased Numbers

Addition/subtraction of biased numbers
  x + y + bias = (x + bias) + (y + bias) - bias
  x - y + bias = (x + bias) - (y + bias) + bias

A power-of-2 (or $2^a - 1$) bias simplifies addition/subtraction

Comparison of biased numbers:
  Compare like ordinary unsigned numbers
  Find true difference by ordinary subtraction

We seldom perform arbitrary arithmetic on biased numbers
  Main application: exponent field of floating-point numbers
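The two identities above translate directly into code. The sketch below assumes the 4-bit, bias-8 format from the number wheel; the helper names are illustrative only and no overflow checking is done.

```python
BIAS = 8  # 4-bit code, values -8 .. +7 (assumed format)

def encode(x):
    """Signed value -> biased code."""
    return x + BIAS

def decode(code):
    """Biased code -> signed value."""
    return code - BIAS

def biased_add(cx, cy):
    # (x + bias) + (y + bias) - bias = x + y + bias
    return cx + cy - BIAS

def biased_sub(cx, cy):
    # (x + bias) - (y + bias) + bias = x - y + bias
    return cx - cy + BIAS

print(decode(biased_add(encode(3), encode(-5))))  # 3 + (-5)
print(decode(biased_sub(encode(3), encode(5))))   # 3 - 5
print(encode(-3) < encode(2))                     # ordering is preserved
```

Because the codes are ordinary unsigned numbers, comparison needs no special hardware, which is why the format suits floating-point exponents.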

Page 21: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

One’s Complement Number Representation

One’s complement = digit complement (diminished radix complement) system for r = 2
  $M = 2^k - ulp$
  $(2^k - ulp) - x = x^{compl}$
  Range of representable numbers with k whole bits: from $-2^{k-1} + ulp$ to $2^{k-1} - ulp$

[Figure: 4-bit one’s complement number wheel]

Unsigned representation -> signed value (1’s complement)
  0000 = +0   0001 = +1   0010 = +2   0011 = +3   0100 = +4   0101 = +5   0110 = +6   0111 = +7
  1000 = -7   1001 = -6   1010 = -5   1011 = -4   1100 = -3   1101 = -2   1110 = -1   1111 = -0

Page 22: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Two’s Complement Number Representation

Two’s complement = radix complement system for r = 2
  $M = 2^k$
  $2^k - x = [(2^k - ulp) - x] + ulp = x^{compl} + ulp$
  Range of representable numbers with k whole bits: from $-2^{k-1}$ to $2^{k-1} - ulp$

[Figure: 4-bit two’s complement number wheel]

Unsigned representation -> signed value (2’s complement)
  0000 = +0   0001 = +1   0010 = +2   0011 = +3   0100 = +4   0101 = +5   0110 = +6   0111 = +7
  1000 = -8   1001 = -7   1010 = -6   1011 = -5   1100 = -4   1101 = -3   1110 = -2   1111 = -1

Page 23: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Two’s Complement Adder/Subtractor

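[Figure: two’s-complement adder/subtractor. A mux selects y or its complement under the add/sub control (0 for addition, 1 for subtraction); the same control signal drives c_in of the adder, which produces s = x ± y and c_out.]

Controlled complementation: can replace this mux with k XOR gates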
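A bit-level sketch of this datapath, assuming a k = 4 bit word: the `sub` control both complements y (the XOR gates) and supplies the carry-in of 1 that completes the two's complement. Function names are illustrative.

```python
def add_sub(x, y, sub, k=4):
    """Two's-complement add (sub=0) or subtract (sub=1) on k-bit words.

    Returns (k-bit result pattern, carry out).
    """
    mask = (1 << k) - 1
    y_sel = (y ^ mask) if sub else y      # k XOR gates controlled by add/sub
    total = (x & mask) + (y_sel & mask) + sub  # sub doubles as c_in
    return total & mask, total >> k

def to_signed(pattern, k=4):
    """Interpret a k-bit pattern as a two's-complement value."""
    return pattern - (1 << k) if pattern & (1 << (k - 1)) else pattern

s, c_out = add_sub(3, 5, sub=1)
print(s, c_out, to_signed(s))  # bit pattern, carry out, signed value of 3 - 5
```

Overflow detection is left out of the sketch; only the controlled complementation is modeled.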

Page 24: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Sign and Magnitude vs Two’s Complement

[Figure: the two’s-complement adder/subtractor (mux, controlled complementation, adder) shown next to the sign-and-magnitude adder (control block, two selective complement units, adder) for comparison]

Signed-magnitude adder/subtractor is significantly more complex than a simple adder

Two’s-complement adder/subtractor needs very little hardware other than a simple adder

Page 25: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Fixed Point Representations

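Allows us to use rational numbers: a/b

Numbers represented in the form:
  $X = X_{a-1} X_{a-2} \cdots X_1 X_0 \,.\, X_{-1} X_{-2} \cdots X_{-b}$
  $X = \sum_{i=-b}^{a-1} X_i \cdot 2^i$

Unsigned mapping (bits of the n-bit word indexed from 0):
  $X = \frac{1}{2^b} \sum_{i=0}^{n-1} 2^i \cdot X_i$

Two’s complement mapping:
  $X = \frac{1}{2^b} \left[ -2^{n-1} \cdot X_{n-1} + \sum_{i=0}^{n-2} 2^i \cdot X_i \right]$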

Page 26: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Fixed Point Properties

Resolution: smallest non-zero magnitude
  Directly related to the number of fractional bits (b)
  Unsigned binary fixed point: resolution = $1/2^b$

Range: difference between the most positive and most negative number
  Unsigned binary fixed point: range = $2^a - 2^{-b}$
  Largely dependent on the number of integer bits

Accuracy: magnitude of the maximum difference between a real value and its representation
  Unsigned binary fixed point: accuracy = $1/2^{b+1}$
  Accuracy(x) = resolution(x)/2
  With one fractional bit, the worst possible number is ¼ (it is ¼ from both 0 and ½, which are representable with 1 fractional bit)

Page 27: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Example

Denote unsigned fixed point systems as U(a,b)
Given the fixed point number system U(6,2):
  What number does $8A_{16}$ represent?
  What is the range of U(6,2)?
  What is the resolution?
  What is the accuracy?
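One way to check answers to these questions is to apply the definitions from the properties slide directly. The helpers below are a hypothetical sketch, not part of the course material.

```python
def u_decode(raw: int, a: int, b: int) -> float:
    """Interpret an (a + b)-bit unsigned word as a U(a,b) value."""
    assert 0 <= raw < (1 << (a + b)), "word does not fit in a + b bits"
    return raw / (1 << b)

def u_properties(a: int, b: int) -> dict:
    resolution = 1 / (1 << b)                # 2^-b
    return {
        "max": (1 << a) - resolution,        # 2^a - 2^-b (range, since min is 0)
        "resolution": resolution,
        "accuracy": resolution / 2,          # 2^-(b+1)
    }

print(u_decode(0x8A, 6, 2))
print(u_properties(6, 2))
```

Under these definitions, 8A (hex) is 138 with the binary point two places from the right, so dividing by 4 gives the represented value.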

Page 28: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Rules of Fixed Point Arithmetic

Unsigned wordlength U(a,b): a + b bits
Signed wordlength S(a,b): a + b + 1 bits
Unsigned range U(a,b): $0 \le x \le 2^a - 2^{-b}$
Signed range S(a,b): $-2^a \le x \le 2^a - 2^{-b}$

Addition: Z(a+1,b) = X(a1,b1) + Y(a2,b2)
  X and Y must be scaled, i.e. a1 = a2 = a and b1 = b2 = b

Unsigned multiplication: U(a1,b1) x U(a2,b2) = U(a1 + a2, b1 + b2)

Signed multiplication: S(a1,b1) x S(a2,b2) = S(a1 + a2 + 1, b1 + b2)
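The unsigned multiplication rule can be illustrated by multiplying the raw integer words and tracking the format separately. This is a minimal sketch with hypothetical helper names; it covers only the unsigned case.

```python
def u_encode(value: float, a: int, b: int) -> int:
    """Quantize a non-negative value to a U(a,b) raw word."""
    raw = round(value * (1 << b))
    assert 0 <= raw < (1 << (a + b)), "value out of range for U(a,b)"
    return raw

def u_mul(x_raw, fmt_x, y_raw, fmt_y):
    """U(a1,b1) x U(a2,b2) = U(a1 + a2, b1 + b2): multiply raw words, add field widths."""
    (a1, b1), (a2, b2) = fmt_x, fmt_y
    return x_raw * y_raw, (a1 + a2, b1 + b2)

x = u_encode(1.5, 2, 2)   # U(2,2)
y = u_encode(2.5, 3, 1)   # U(3,1)
p_raw, (a, b) = u_mul(x, (2, 2), y, (3, 1))
print(p_raw / (1 << b), (a, b))
```

The product always fits in the a + b bits of the result format, which is why no rounding or saturation step appears here.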

Page 29: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

In Depth: Arithmetic Operations

Page 30: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

1 Bit Addition

Half Adder (HA): a (2 : 2) counter
  Inputs A, B; outputs S, Cout

Full Adder (FA): a (3 : 2) counter
  Inputs A, B, Cin; outputs S, Cout
  [Figure: full adder built from two half adders]
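As a small illustration, the two cells can be modeled on single bits, with the full adder composed of two half adders as in the figure. The function names are illustrative.

```python
def half_adder(a, b):
    """(2:2) counter: returns (sum, carry) for two input bits."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """(3:2) counter built from two half adders plus an OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```

The "counter" name reflects that the outputs, read as a 2-bit number, count how many inputs are 1.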

Page 31: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Half Adder Implementations

[Figure: three half-adder circuits with inputs x, y and outputs s, c]
  (a) AND/XOR half-adder.
  (b) NOR-gate half-adder.
  (c) NAND-gate half-adder with complemented carry.

Source: Parhami

Page 32: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Full Adder Implementations

[Figure: three full-adder circuits with inputs x, y, c_in and outputs s, c_out]
  (a) Built of half-adders.
  (b) Built as an AND-OR circuit.
  (c) Suitable for CMOS realization (mux-based).

Source: Parhami

Page 33: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Full Adder Implementations

[Figure: three full-adder circuits with inputs x, y, c_in and outputs s, c_out]
  (a) FA built of two HAs
  (b) CMOS mux-based FA
  (c) Two-level AND-OR FA

Source: Parhami

Page 34: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Bit Serial Addition

Perform addition one bit at a time
  $X_i + Y_i + C_i$, where $C_i$ is the carry out of bit positions 0 to i-1
Result stored in a register that is right shifted
Slow but small area
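A behavioral sketch of the idea, assuming n-bit unsigned operands processed LSB first: one full adder, one carry flip-flop, and a result register that shifts right as each sum bit enters at the top. The function name is hypothetical.

```python
def bit_serial_add(x, y, n):
    """Add two n-bit unsigned numbers one bit per cycle.

    Returns (n-bit sum, final carry).
    """
    carry = 0    # carry flip-flop
    result = 0   # n-bit shift register
    for i in range(n):                    # one clock cycle per bit
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        s = xi ^ yi ^ carry
        carry = (xi & yi) | (carry & (xi ^ yi))
        result = (result >> 1) | (s << (n - 1))  # shift right, insert at MSB
    return result, carry

print(bit_serial_add(11, 6, 4))  # 11 + 6 on 4 bits
```

After n cycles the first sum bit has been shifted down to position 0, so the register holds the result in normal bit order.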

Page 35: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Ripple Carry Adder

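[Figure: n-bit ripple carry adder, a chain of full adders; stage i takes A_i, B_i and the carry from stage i-1 and produces S_i; Cin enters stage 0 and Cout leaves stage n-1]

[Figure: block symbol of an n-bit two-operand adder with n-bit inputs A and B, n-bit output S, Cin and Cout]

“Bit parallel adder”
Area, delay?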

Page 36: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Another View of Ripple Carry Adder

[Figure: 4-bit adder drawn as per-bit generate/propagate logic (A_i, B_i -> G_i, P_i for i = 0 to 3) feeding a carry network between C0 and C4]

Page 37: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Faster Addition

We need to break the carry chain
The carry recurrence: $c_{i+1} = g_i + p_i c_i$
Observation: carry only propagates in certain situations

Bit positions  15 14 13 12   11 10  9  8    7  6  5  4    3  2  1  0
               -----------   -----------   -----------   -----------
                1  0  1  1    0  1  1  0    0  1  1  0    1  1  1  0
        cout    0  1  0  1    1  0  0  1    1  1  0  0    0  0  1  1   cin

Carry chains and their lengths (left to right): 4, 6, 3, 2

[Figure: carry network with inputs g_i, p_i for i = 0 to k-1 and carry-in c_0, producing c_1, c_2, ..., c_{k-1}, c_k]

Page 38: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Manchester Adder

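[Figure: Manchester adder. For each bit, a Kill/Generate/Propagate (KGP) block computes K_i, G_i, P_i from A_i and B_i and controls a Switched Carry Chain (SCC) cell between C_i and C_{i+1}. The SCC cells are chained from Cin through C1, C2, ..., C_{n-1} to Cout.]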

Page 39: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Look Ahead

A  B  C-out
0  0  0       “kill”
0  1  C-in    “propagate”
1  0  C-in    “propagate”
1  1  1       “generate”

G = A and B
P = A xor B

[Figure: 4-bit adder; each bit i computes G_i and P_i from A_i and B_i and its sum bit S; the lookahead logic supplies the carries and produces group G and P outputs]

C0 = Cin
C1 = G0 + C0 P0
C2 = G1 + G0 P1 + C0 P0 P1
C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
C4 = . . .
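The carry equations can be checked with a short model of a 4-bit lookahead adder. In this sketch the c4 term simply continues the same pattern as C1 to C3; the function name is illustrative.

```python
def cla4(a, b, c0=0):
    """4-bit carry lookahead addition; returns (4-bit sum, carry out c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate: A and B
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate: A xor B

    # Every carry is a two-level function of g, p and c0 (no rippling).
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # S_i = P_i xor C_i
    return s, c4

print(cla4(0b1011, 0b0110))  # 11 + 6
```

The widening AND terms show why full lookahead gets expensive beyond a few bits, which motivates the multi-level schemes on the following slides.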

Page 40: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Plumbing as Carry Lookahead Analogy

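[Figure: plumbing analogy for carry lookahead. Water reaches c1 if g0 is open, or if c0 flows through p0; c2 adds the g1 and p1 stages; c4 is fed through stages g0/p0 to g3/p3.]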

Page 41: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

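2 Bit Carry Lookahead Adder

[Figure: 2-bit CLA block. Bit-level $G^0_0, P^0_0$ (from A0, B0) and $G^0_1, P^0_1$ (from A1, B1), together with C0, give C1, C2 and the sums S0, S1, plus block-level generate and propagate outputs.]

Superscript = lookahead level, subscript = index within the level:
  $P^1 = P^0_0 P^0_1$
  $G^1 = G^0_0 P^0_1 + G^0_1$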

Page 42: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

4 Bit Carry Look Ahead

Complexity reduced by deriving the carry-out indirectly, but increases critical path

[Figure: 4-bit lookahead carry network with inputs g_0 to g_3, p_0 to p_3 and c_0, and outputs c_1 to c_4]

Full carry lookahead is quite practical for a 4-bit adder

$c_1 = g_0 + c_0 p_0$
$c_2 = g_1 + g_0 p_1 + c_0 p_0 p_1$
$c_3 = g_2 + g_1 p_2 + g_0 p_1 p_2 + c_0 p_0 p_1 p_2$
$c_4 = g_3 + g_2 p_3 + g_1 p_2 p_3 + g_0 p_1 p_2 p_3 + c_0 p_0 p_1 p_2 p_3$

Page 43: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Look Ahead, multiple levels

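[Figure: two-level carry lookahead. A 4-bit block computes c_0..c_3 from g_0..g_3, p_0..p_3 (inputs A0..A3, B0..B3, C0, output C4). Four such blocks covering bits 0 to 15 pass group signals G0..G3, P0..P3 to a second-level lookahead unit, which returns C1..C3 and produces C16.]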

Page 44: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Cascaded Carry Look-ahead (16-bit): Abstraction

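[Figure: 16-bit adder built from 4-bit adder blocks; each block passes its group G, P (e.g. G0, P0) to a shared CLA unit, which takes C0 and returns the carry into each block]

C1 = G0 + C0 P0
C2 = G1 + G0 P1 + C0 P0 P1
C3 = G2 + G1 P2 + G0 P1 P2 + C0 P0 P1 P2
C4 = . . .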

Page 45: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Lookahead Generator Plumbing Analogy

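[Figure: plumbing analogy for the carry lookahead generator. Stages g0/p0 to g3/p3 combine into a group generate G0 (some g_i open with p_{i+1}..p3 open downstream) and a group propagate P0 (p0, p1, p2 and p3 all open).]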

Page 46: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

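4 Bit Hierarchical CLA

[Figure: two 2-bit CLA blocks (bits 0-1 and bits 2-3) feed their block signals $G^1_0, P^1_0$ and $G^1_1, P^1_1$ to a 2-bit carry lookahead generator (CLG), which takes C0 and produces C2 and C4]

$P^2 = P^1_0 P^1_1$
$G^2 = G^1_0 P^1_1 + G^1_1$
$C_4 = C_0 P^1_0 P^1_1 + G^1_0 P^1_1 + G^1_1$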

Page 47: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

8 Bit Hierarchical CLA

[Figure: four 2-bit CLA blocks (bit pairs 0-1, 2-3, 4-5, 6-7) produce $G^1_j, P^1_j$ for j = 0 to 3; two 2-bit CLGs combine them into $G^2_0, P^2_0$ and $G^2_1, P^2_1$; a top-level 2-bit CLG combines those. Carries shown: C0, C2, C4, C6, C8.]

Page 48: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Design Trick: Guess (or “Precompute”)

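[Figure: two n-bit adders in series, the carry out of one feeding the next]
CP(2n) = 2*CP(n)

[Figure: one n-bit adder computes the lower half; two more n-bit adders compute the upper half for carry-in 1 and carry-in 0, and a mux driven by the lower Cout selects between them]
CP(2n) = CP(n) + CP(mux)

Carry-select adder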
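A behavioral sketch of the carry-select idea, assuming an even word width n split into two halves: the upper half is computed for both possible carry-ins, and the lower half's carry-out acts as the mux select. The function name is hypothetical.

```python
def carry_select_add(a, b, n, cin=0):
    """Add two n-bit numbers (n even); returns (n-bit sum, carry out)."""
    half = n // 2
    mask = (1 << half) - 1

    # Lower half: an ordinary adder.
    lo = (a & mask) + (b & mask) + cin
    lo_sum, lo_carry = lo & mask, lo >> half

    # Upper half: "guess" both carry-in values; in hardware these run
    # in parallel with the lower-half adder.
    a_hi, b_hi = (a >> half) & mask, (b >> half) & mask
    hi_if_0 = a_hi + b_hi
    hi_if_1 = a_hi + b_hi + 1

    hi = hi_if_1 if lo_carry else hi_if_0   # the mux
    return ((hi & mask) << half) | lo_sum, hi >> half

print(carry_select_add(200, 100, 8))
```

The cost is roughly 1.5x the adder area for the doubled width, traded for a critical path of one half-width adder plus a mux.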

Page 49: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

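Pipelined Ripple Carry Adder

[Figure: ripple carry adder (full adders for bits 0 to n-1, Cin to Cout) with pipeline registers between stages; input and output bits are delayed by chains of flip-flops (n-1 FF, n-2 FF, n-3 FF, ...) so that all bits of one addition stay aligned]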

Page 50: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Multiple Operand Addition

Many applications require summation of many operands

What is best way to compute this?

[Figure: dot-notation diagrams. Multiplication: multiplicand a times multiplier x gives four shifted partial products x_0 a 2^0, x_1 a 2^1, x_2 a 2^2, x_3 a 2^3, which are summed to form p. Inner product: seven products p(0) to p(6) are summed to form s.]

Page 51: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Terminology

Page 52: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Serial Implementation

[Figure: a two-operand carry propagate adder adds operand O_i[n] to the partial sum S_i[n + log i] held in Register S, producing S_{i+1}[n + log (i+1)]]

$T_{\text{serial-multi-add}} = O(m \log(n + \log m)) = O(m \log n + m \log \log m)$

Therefore, addition time grows superlinearly with the number of operands m when the operand width n is fixed, and logarithmically with n for a given m

Page 53: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Parallel Implementation

[Figure: binary tree of carry propagate adders (CPAs) over operands O_1[n], O_2[n], ..., O_m[n]; the CPA tree has O(log m) levels and produces S[n + log m]]

$T_{\text{tree-fast-multi-add}} = O(\log n + \log(n + 1) + \cdots + \log(n + \log_2 m - 1)) = O(\log m \log n + \log m \log \log m)$

Can we do this faster?

Page 54: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Save Adder (CSA)

[Figure: n-bit carry save adder built from n independent full adders; the FA at bit i takes O1[i], O2[i], O3[i] and produces S[i] and C[i], with no carry connection between bit positions. Block symbol: three n-bit inputs O1[n], O2[n], O3[n] and two n-bit outputs S[n], C[n].]
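On whole words, a carry save adder is three bitwise operations. The sketch below assumes non-negative Python integers and uses illustrative function names; it also shows the serial multi-operand scheme in which a carry propagate addition is needed only once, at the end.

```python
def carry_save_add(x, y, z):
    """3-to-2 reduction: returns (sum word, carry word) with s + c == x + y + z."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, weighted one position up
    return s, c

def multi_operand_sum(operands):
    """Accumulate operands in carry-save form, then do one final CPA."""
    s, c = 0, 0
    for op in operands:
        s, c = carry_save_add(s, c, op)
    return s + c                               # final carry propagate addition

print(carry_save_add(5, 3, 6))
print(multi_operand_sum([13, 7, 9, 21, 2, 30]))
```

Each CSA step has the delay of a single full adder regardless of word length, which is where the speedup over chained CPAs comes from.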

Page 55: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Carry Save Adders

[Figure: a row of full adders with the carry chained from c_in to c_out forms a carry-propagate adder; cutting the carry chain gives a carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit]

Carry propagate adder (CPA) and carry save adder (CSA) functions in dot notation.

[Figure: half-adder and full-adder blocks]

Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.

Page 56: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Serial CSA Implementation

[Figure: a carry save adder combines operand O_i[n] with S_i[n + log i] and C_i[n + log i], held in Register S and Register C, producing S_{i+1}[n + log (i+1)] and C_{i+1}[n + log (i+1)]]

$T_{\text{serial-csa-multi-add}} = O(m)$

In the end there are two operands (C, S)

Page 57: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

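Final Reduction (2:1)

[Figure: a carry propagate adder combines the two remaining operands S[n] ... S[1] and C[n] ... C[1]; bit position i adds S[i] and C[i-1], one position uses a half adder (HA), and the outputs are T[1], T[2], T[3], ..., T[i+1], ..., T[n+1], T[n+2] plus Cout]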

Page 58: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

[Figure: CSA tree for six operands O1[n] to O6[n], built from four CSAs with intermediate results S1, C1, S2, C2 and final outputs S3, C3]

    O6[n]:      xxxx
    O5[n]:      xxxx
  + O4[n]:      xxxx
    S1[n:1]:    xxxx
    C1[n+1:2]: xxxx

    C1[n+1:2]:  xxxx
    S1[n:1]:     xxxx
  + C1[n+1:2]:  xxxx
    S2[n+1:1]: xxxxx
    C2[n+2:3]: xxxx

    S1[n:1]:      xxxx
    S2[n+1:1]:   xxxxx
  + C2[n+2:3]:  xxxx
    S3[n+2:1]:  xxxxxx
    C3[n+2:2]:  xxxxx

Page 59: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Save Arithmetic

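[Figure: six operands A to F reduced by a tree of four CSAs (sum S and carry C outputs at each level) followed by a final carry lookahead adder (CLA)]

Delay = $3 + \log_2(M + 3)$
  3 = height of CSA tree
  M = bitwidth of operands

Tree height = $\log_{1.5}(N/2)$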

Page 60: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Carry Save Arithmetic

Using ripple carry adders (RCAs)

[Figure: chain of five RCAs adding the six operands; the adder widths grow as (M+1), (M+2), (M+3), (M+4), (M+5)]

Delay = (M+5) + 4

[Figure: delay comparison, RCA vs. CSA; x-axis: # of operands (2 to 50), y-axis: delay in full adder delays (0 to 120)]

[Figure: area comparison, RCA vs. CSA; x-axis: # of operands, y-axis: area in full adder units (0 to 2000)]

Delay thru CSA network =

3 + log1.5(M + 3)

Page 61: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Example Reduction by a CSA Tree

[Figure: addition of seven 6-bit numbers in dot notation, reduced in four levels (12 FAs, 6 FAs, 6 FAs, 4 FAs + 1 HA) followed by a 7-bit adder]

Total cost = 7-bit adder + 28 FAs + 1 HA

Addition of seven 6-bit numbers in dot notation.

Bit position:  8  7  6  5  4  3  2  1  0
                        7  7  7  7  7  7    6 x 2 = 12 FAs
                     2  5  5  5  5  5  3    6 FAs
                     3  4  4  4  4  4  1    6 FAs
                  1  2  3  3  3  3  2  1    4 FAs + 1 HA
                  2  2  2  2  2  1  2  1    7-bit adder
                  --Carry-propagate adder--
               1  1  1  1  1  1  1  1  1

Representing a seven-operand addition in tabular form.

A full-adder compacts 3 dots into 2 (compression ratio of 1.5)

A half-adder rearranges 2 dots (no compression, but still useful)
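The first three rows of the tabular form can be reproduced by counting dots per column. The sketch below models only full adders, applying as many as fit in every column at each level; the half adder used in the last level is a designer's choice and is not modeled. The function name is illustrative.

```python
def reduce_level(heights):
    """One reduction level using only full adders.

    heights[i] is the number of dots in bit column i (LSB first).
    Returns (new heights, number of FAs used).
    """
    fas = [h // 3 for h in heights]            # FAs placed in each column
    new = []
    for i, h in enumerate(heights):
        carry_in = fas[i - 1] if i > 0 else 0  # carries from the column to the right
        new.append(h - 2 * fas[i] + carry_in)  # each FA turns 3 dots into 1 sum dot
    if fas[-1]:
        new.append(fas[-1])                    # carries create a new top column
    return new, sum(fas)

heights = [7] * 6          # seven 6-bit operands
for _ in range(3):
    heights, n_fa = reduce_level(heights)
    print(list(reversed(heights)), n_fa, "FAs")   # printed MSB first, as in the table
```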

Page 62: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

 

Wallace and Dadda Reduction Trees

Wallace tree: Reduce the number of operands at the earliest possible opportunity
  [Figure: addition of seven 6-bit numbers in dot notation. Levels: 12 FAs, 6 FAs, 6 FAs, 4 FAs + 1 HA, then a 7-bit adder.]
  Total cost = 7-bit adder + 28 FAs + 1 HA

Dadda tree: Postpone the reduction to the extent possible without causing added delay
  [Figure: adding seven 6-bit numbers using Dadda’s strategy. Levels: 6 FAs, 11 FAs, 7 FAs, 4 FAs + 1 HA, then a 7-bit adder.]
  Total cost = 7-bit adder + 28 FAs + 1 HA

Maximum number of operands n(h) for a tree of height h:
  h      2   3   4   5   6
  n(h)   4   6   9   13  19
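The n(h) values follow from the 3-to-2 compression ratio: each extra level lets the tree accept 1.5 times as many operands, rounded down, starting from the two operands handed to the final adder. A short check, with an illustrative function name:

```python
def max_operands(h):
    """Largest number of operands an h-level CSA tree can reduce to two."""
    n = 2                      # h = 0: two operands go straight to the final adder
    for _ in range(h):
        n = (3 * n) // 2       # n(h) = floor(1.5 * n(h - 1))
    return n

print([(h, max_operands(h)) for h in range(2, 7)])
```

Dadda's strategy reduces the column heights to successive n(h) values from the top down, doing no more work per level than is needed to hit the next target.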

Page 63: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Source: Parhami

Generalized Parallel Counters

(5, 5; 4)-counter

Dot notation for a (5, 5; 4)-counter and the use of such counters for reducing five numbers to two numbers.

. . .

Multicolumn reduction

(2, 3; 3)-counter

Unequal columns

Gen. parallel counter = Parallel compressor

Page 64: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Compressors

Compressors allow for carry-ins and carry-outs

[Figure: two adjacent [4:2] compressor bit slices (bit i and bit i-1), each built from two full adders. In slice i, one FA takes O2[i], O3[i], O4[i] and produces Cout[i]; the other FA takes O1[i] and Cin[i] and produces S[i] and C[i].]

Page 65: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

[4 : 2] Compressor Adder

[Figure: n-bit [4:2] adder built from n [4:2] compressor slices; slice i takes O1[i], O2[i], O3[i], O4[i] and produces S[i] and C[i]. Block symbol: four n-bit inputs O1[n] to O4[n] and two n-bit outputs S[n], C[n].]

Page 66: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Higher Order Compressors

[Figure: [5:2] compressor bit slices for bit i and bit i-1, each built from three full adders, with inputs O1 to O5 and outputs S and C]

Page 67: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Moorea Modem Receiver Specification

Note: 112 samples/symbol + 112 samples for channel clearing.

[Figure: receiver block diagram. Several Matching Pursuit Core blocks feed an arg min over i stage, the generalized multiple hypothesis test (GMHT).]

Linear System Optimizations

Page 68: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Linear System Optimization

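Linear systems are ubiquitous in signal processing applications

We have developed many methods for optimization to hardware, software, FPGA [ASAP04, ASPDAC05, DATE06, ICCD06, Journal of VLSI Signal Processing07]

1D linear systems on previous slide, aka FIR filters

$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix}
=
\begin{bmatrix}
\cos(0) & \cos(0) & \cos(0) & \cos(0) \\
\cos(\pi/8) & \cos(3\pi/8) & \cos(5\pi/8) & \cos(7\pi/8) \\
\cos(\pi/4) & \cos(3\pi/4) & \cos(5\pi/4) & \cos(7\pi/4) \\
\cos(3\pi/8) & \cos(7\pi/8) & \cos(\pi/8) & \cos(5\pi/8)
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}
$$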

Page 69: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations: Multiply Accumulate Method

[Figure: direct-form FIR filter; the input x[n] passes through a chain of z^-1 delays, each tap is multiplied by a coefficient h_0, h_1, ..., h_{L-3}, h_{L-2}, h_{L-1}, and the products are summed to give y[n]]

Convolution of the latest L input samples. L is the number of coefficients h(k) of the filter, and x(n) represents the input time series.
  $y[n] = \sum_{k=0}^{L-1} h[k] \, x[n-k]$

Disadvantages
  Large area on FPGA due to multipliers, even though the full flexibility of general purpose multipliers is not required
  Limited number of embedded resources such as MAC engines, multipliers, etc. in FPGAs
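The multiply accumulate form maps one-to-one onto a nested loop. The sketch below assumes samples before n = 0 are zero; the function name is illustrative.

```python
def fir_mac(h, x):
    """Direct-form FIR: y[n] = sum_{k=0}^{L-1} h[k] * x[n - k].

    Samples before the start of x are treated as zero (assumption).
    """
    L = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(L):          # one multiply-accumulate per tap
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

print(fir_mac([1, 2, 3], [1, 0, 0, 0]))   # impulse response equals the coefficients
print(fir_mac([1, 1], [1, 2, 3]))
```

Each output costs L general-purpose multiplications, which is the cost the distributed arithmetic and add-and-shift methods try to avoid when the coefficients are constants.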

Page 70: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations:Distributed Arithmetic

Summation of inner product:
  $Y = \sum_{k=1}^{K} A_k \cdot X_k$
  $A_k$ = constant coefficients
  $X_k$ = input data

We can write each input in two’s complement (B bits, sign bit $X_{k0}$):
  $X_k = -X_{k0} + \sum_{b=1}^{B-1} X_{kb} \cdot 2^{-b}$

Substituting this into the above yields:
  $Y = \sum_{k=1}^{K} A_k \left[ -X_{k0} + \sum_{b=1}^{B-1} X_{kb} \cdot 2^{-b} \right]$

Exchange the order of the summations:
  $Y = \sum_{b=1}^{B-1} \left[ \sum_{k=1}^{K} A_k \cdot X_{kb} \right] \cdot 2^{-b} + \sum_{k=1}^{K} A_k \cdot (-X_{k0})$

Page 71: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations: Distributed Arithmetic

From previous slide:
  $Y = \sum_{b=1}^{B-1} \left[ \sum_{k=1}^{K} A_k \cdot X_{kb} \right] \cdot 2^{-b} + \sum_{k=1}^{K} A_k \cdot (-X_{k0})$

How do we compute the bracketed term?
  Multiply a particular bit b of each of the inputs by the binary constants $A_1, A_2, \ldots, A_K$

Questions (assume we are looking at b = 1 (LSB of inputs), but this generalizes to any b):
  What if bit 1 of every input is 0, i.e. $X_{k1}$ = [00000…0]?
  What if $X_{11}$ = 1 and the remaining are 0, i.e. $X_{k1}$ = [00000…1]?
  What if $X_{11}$ = 1, $X_{21}$ = 1 and the rest are 0, i.e. $X_{k1}$ = [00000…11]?

Page 72: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations:Distributed Arithmetic

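Looking at summations in a different way

$$Y = \sum_{k=1}^{K} A_k \left[ -X_{k0} + \sum_{b=1}^{B-1} X_{kb} \cdot 2^{-b} \right]$$

Expanded row by row (one row per input):

$$
\begin{aligned}
Y = {} & A_1 \cdot \left( -X_{10} + X_{11} \cdot 2^{-1} + X_{12} \cdot 2^{-2} + \cdots + X_{1(B-1)} \cdot 2^{-(B-1)} \right) \\
 & + A_2 \cdot \left( -X_{20} + X_{21} \cdot 2^{-1} + X_{22} \cdot 2^{-2} + \cdots + X_{2(B-1)} \cdot 2^{-(B-1)} \right) \\
 & \;\;\vdots \\
 & + A_K \cdot \left( -X_{K0} + X_{K1} \cdot 2^{-1} + X_{K2} \cdot 2^{-2} + \cdots + X_{K(B-1)} \cdot 2^{-(B-1)} \right)
\end{aligned}
$$

Regrouped column by column (one row per bit position):

$$
\begin{aligned}
Y = {} & \left[ A_1 \cdot X_{11} + A_2 \cdot X_{21} + A_3 \cdot X_{31} + \cdots + A_K \cdot X_{K1} \right] \cdot 2^{-1} \\
 & + \left[ A_1 \cdot X_{12} + A_2 \cdot X_{22} + A_3 \cdot X_{32} + \cdots + A_K \cdot X_{K2} \right] \cdot 2^{-2} \\
 & \;\;\vdots \\
 & + \left[ A_1 \cdot X_{1(B-1)} + A_2 \cdot X_{2(B-1)} + A_3 \cdot X_{3(B-1)} + \cdots + A_K \cdot X_{K(B-1)} \right] \cdot 2^{-(B-1)} \\
 & + A_1 \cdot (-X_{10}) + A_2 \cdot (-X_{20}) + A_3 \cdot (-X_{30}) + \cdots + A_K \cdot (-X_{K0})
\end{aligned}
$$

$$Y = \sum_{b=1}^{B-1} \left[ \sum_{k=1}^{K} A_k \cdot X_{kb} \right] \cdot 2^{-b} + \sum_{k=1}^{K} A_k \cdot (-X_{k0})$$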

Page 73: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

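FIR Filter Implementations: Distributed Arithmetic

$2^K$-entry LUT, addressed by bit b of every input ($X_{1b}, X_{2b}, \ldots, X_{Kb}$):

  Address    Value
  00…00      0
  00…01      A1
  00…10      A2
  00…11      A1 + A2
  …          …
  11…11      A1 + A2 + … + AK

Precision of each entry (constant bits wide): usually equal to the precision of the input data, B

[Figure: the LUT output feeds an adder and a shifter (>>) that together accumulate Y]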
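A software model can make the LUT scheme concrete. The sketch below is an illustration under assumptions that differ slightly from the slides: inputs are B-bit two's-complement integers rather than fractions (the same bit patterns scaled by 2^(B-1)), bits are processed LSB first, and the LUT output is shifted left instead of the accumulator being shifted right. Function names are hypothetical.

```python
def build_lut(coeffs):
    """2^K entries: entry `addr` holds the sum of the coefficients whose address bit is 1."""
    K = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << K)]

def da_inner_product(coeffs, xs, B):
    """Compute sum_k coeffs[k] * xs[k] with one LUT lookup per input bit position."""
    lut = build_lut(coeffs)
    mask = (1 << B) - 1
    words = [x & mask for x in xs]          # two's-complement bit patterns
    acc = 0
    for j in range(B):                      # one cycle per bit position
        addr = 0
        for k, w in enumerate(words):       # gather bit j of every input
            addr |= ((w >> j) & 1) << k
        term = lut[addr] << j
        acc += -term if j == B - 1 else term  # the sign bit carries negative weight
    return acc

coeffs = [3, -2, 5, 1]
xs = [7, -8, -1, 4]
print(da_inner_product(coeffs, xs, B=4), sum(c * x for c, x in zip(coeffs, xs)))
```

No multiplier appears anywhere: the work is K-bit address formation, a lookup, and an add per bit, which is why throughput is tied to the input word length B.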

Page 74: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations: Distributed Arithmetic

Advantages
  Replaces multiplication with LUT
  Coefficients stored in LUTs

Disadvantages
  Performance limited, as the next input sample is processed only after every bit of the current input sample is processed
  Increasing the number of bits to be processed has a significant effect on resource utilization
    Larger scaling accumulator needed for a higher number of bits
    Increases critical path delay

LUT contents:
  Address   Data
  0000      0
  0001      C0
  0010      C1
  …         …
  1111      C0+C1+C2+C3

[Figure: bits x0[i], x1[i], x2[i], x3[i] address one LUT and x4[i], x5[i], x6[i], x7[i] address a second LUT; the two LUT outputs are added and fed to a scaling accumulator (adder, shifter <<, register)]

Page 75: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FIR Filter Implementations:Distributed Arithmetic

[Figure: replicated DA datapath. One LUT pair is addressed by bit i of x0 to x7 and a second LUT pair by bit i+1 of x0 to x7; the LUT outputs are added, combined through a shift (<<), and accumulated in a scaling accumulator.]

Performance is improved by replication: process multiple bits at a time

Significant effect on resource utilization
  More LUTs
  Larger scaling accumulator

Page 76: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

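FIR Filter Implementations: Add and Shift Method

[Figure: direct-form FIR filter (input X[n], z^-1 delay chain, coefficient multipliers h_0 to h_{L-1}, adder chain, output y[n]) shown next to a form in which X[n] feeds a multiplier block producing y_0, y_1, y_2, ..., y_{L-1}, followed by a delay block of adders and z^-1 registers that produces y[n]]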

Page 77: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Idea: Constant Multiplication to Shift/Add

Multiplication is expensive in hardware
  Decompose constant multiplications into shifts and additions
  13*X = (1101)_2*X = X + X<<2 + X<<3

Signed digits can reduce the number of additions/subtractions
  Canonical Signed Digits (CSD) (Knuth’74)
  (57)_10 = (0111001)_2 = (100-1001)_CSD

Further reduction possible by common subexpression elimination
  Up to 50% reduction (R. Hartley TCS’96)
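A CSD recoding can be computed digit by digit from the LSB. The sketch below uses the standard non-adjacent-form rule (emit +1 when n mod 4 is 1, -1 when n mod 4 is 3) and assumes a non-negative integer constant; the function names are illustrative.

```python
def to_csd(n):
    """Canonical signed digit recoding of n >= 0.

    Returns digits in {-1, 0, 1}, least significant first,
    with no two adjacent non-zero digits.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)    # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_multiply(x, constant):
    """Multiply by a constant using only shifts, adds and subtracts."""
    return sum(d * (x << i) for i, d in enumerate(to_csd(constant)) if d)

print(list(reversed(to_csd(57))))   # most significant digit first
print(csd_multiply(10, 57))
```

For 57 the plain binary form has four non-zero digits and CSD has three, so the shift-add multiplier needs one fewer adder.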

Page 78: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Introduction

Common subexpressions = common digit patterns

  F1 = 7*X = (0111)*X = X + X<<1 + X<<2
  F2 = 13*X = (1101)*X = X + X<<2 + X<<3
  Cost: 4+, 4<<

Common pattern “0101” => X + X<<2

  D1 = X + X<<2
  F1 = D1 + X<<1
  F2 = D1 + X<<3
  Cost: 3+, 3<<

Good for single variable: FIR filters (transposed form)
Multiple variable? (DFT, DCT etc.)

Page 79: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Linear Systems and polynomial transformation

H.264 Integer Transform

$$
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$

Decomposing constant multiplications

  Y0 = X0 + X1 + X2 + X3
  Y1 = X0<<1 + X1 - X2 - X3<<1
  Y2 = X0 - X1 - X2 + X3
  Y3 = X0 - X1<<1 + X2<<1 - X3

Cost: 12+, 4<<

Page 80: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Linear Systems and polynomial transformation

H.264 Integer Transform

$$
\begin{bmatrix} Y_0 \\ Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
2 & 1 & -1 & -2 \\
1 & -1 & -1 & 1 \\
1 & -2 & 2 & -1
\end{bmatrix}
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \end{bmatrix}
$$

Polynomial Transformation (L denotes a left shift by one bit)

  Y0 = X0 + X1 + X2 + X3
  Y1 = X0L + X1 - X2 - X3L
  Y2 = X0 - X1 - X2 + X3
  Y3 = X0 - X1L + X2L - X3

Cost: 12+, 4<<

Page 81: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

H.264 Example

Y0 = X0 + X1 + X2 + X3

Y1 = X0L + X1 - X2 - X3L

Y2 = X0 - X1 - X2 + X3

Y3 = X0 - X1L + X2L - X3


Select D0 = (X0 + X3)

Page 82: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

H.264 Example

Select D1 = (X1 – X2)

Y0 = D0 + X1 + X2

Y1 = X0L + X1 - X2 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - X1L + X2L - X3


Page 83: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

H.264 Example

Select D2 = (X1 + X2)

Y0 = D0 + X1 + X2

Y1 = X0L + D1 - X3L

Y2 = D0 - X1 - X2

Y3 = X0 - D1L - X3

Page 84: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

H.264 Example

Select D3 = (X0 – X3)

Y0 = D0 + D2

Y1 = X0L + D1 - X3L

Y2 = D0 - D2

Y3 = X0 - D1L - X3

Page 85: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Final Implementation

Extracting 4 divisors

D0 = X0 + X3    Y0 = D0 + D2
D1 = X1 - X2    Y1 = D1 + D3L
D2 = X1 + X2    Y2 = D0 - D2
D3 = X0 - X3    Y3 = D3 - D1L

After extraction: 8+, 2<<

Original: 12+, 4<<

Rectangle Covering: 10+, 3<<
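The four-divisor implementation can be checked against the original equations with a short sketch like the one below. It assumes L means a left shift by one bit; the function names are hypothetical.

```python
def transform_original(x):
    """Original equations: 12 additions/subtractions, 4 shifts."""
    x0, x1, x2, x3 = x
    return [
        x0 + x1 + x2 + x3,
        (x0 << 1) + x1 - x2 - (x3 << 1),
        x0 - x1 - x2 + x3,
        x0 - (x1 << 1) + (x2 << 1) - x3,
    ]


def transform_divisors(x):
    """Implementation after extracting D0..D3: 8 additions/subtractions, 2 shifts."""
    x0, x1, x2, x3 = x
    d0 = x0 + x3
    d1 = x1 - x2
    d2 = x1 + x2
    d3 = x0 - x3
    y0 = d0 + d2
    y1 = d1 + (d3 << 1)   # D1 + D3L
    y2 = d0 - d2
    y3 = d3 - (d1 << 1)   # D3 - D1L
    return [y0, y1, y2, y3]


if __name__ == "__main__":
    # Exhaustive check over a small input range.
    values = range(-3, 4)
    for x0 in values:
        for x1 in values:
            for x2 in values:
                for x3 in values:
                    x = [x0, x1, x2, x3]
                    assert transform_divisors(x) == transform_original(x)
    print("divisor form matches the original equations")
```

Each divisor is computed once and reused in two outputs, which is where the savings of 4 additions and 2 shifts come from.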

Page 86: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

FPGA FIR Filter Implementations: Add and Shift Method

F1 = A + B + C + D
F2 = A + B + C + E

Unoptimized Expression Trees

Extracting Common Expression (A + B + C)

Extracting Common Expression (A + B)

Optimization
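The trade-off between the two extraction choices can be illustrated with a small sketch. The expression-tree figures are not part of this transcript, so the tree shapes, adder counts, and depths below are assumptions made for illustration; the function names are hypothetical.

```python
def unoptimized(a, b, c, d, e):
    """No sharing: balanced trees, 6 adders in total, adder depth 2."""
    f1 = (a + b) + (c + d)
    f2 = (a + b) + (c + e)
    return f1, f2


def share_abc(a, b, c, d, e):
    """Share (A + B + C): 4 adders in total, but adder depth grows to 3."""
    t = (a + b) + c
    return t + d, t + e


def share_ab(a, b, c, d, e):
    """Share (A + B) only: 5 adders in total, adder depth stays at 2."""
    t = a + b
    return t + (c + d), t + (c + e)


if __name__ == "__main__":
    args = (1, 2, 3, 4, 5)
    assert unoptimized(*args) == share_abc(*args) == share_ab(*args)
    print(unoptimized(*args))
```

Under these assumed shapes, the larger common expression saves more adders at the price of a longer critical path, while the smaller one keeps the depth of the unoptimized trees.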

Page 87: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Resource Utilization + Performance Results

Filter Implementation Using Add and Shift Method

Filter (# taps)  Slices  LUTs   FFs    Performance (Msps)
6                264     213    509    251
10               474     406    916    222
13               386     334    749    252
20               856     705    1650   250
28               1294    1145   2508   227
41               2154    1719   4161   223
61               3264    2591   6303   192
119              6009    4821   11551  203
151              7579    6098   14611  180

Filter Implementation Using Xilinx Coregen (PDA)

Filter (# taps)  Slices  LUTs   FFs    Performance (Msps)
6                524     774    1012   245
10               781     1103   1480   222
13               929     1311   1775   199
20               1191    1631   2288   199
28               1774    2544   3381   199
41               2475    3642   4748   222
61               3528    5335   6812   199
119              6484    9754   12539  205
151              8274    12525  15988  199
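To put the two sets of slice counts side by side, the sketch below computes the percentage reduction of the add and shift method relative to Coregen for each filter size. The slice counts are copied from the tables above; the formula (baseline minus optimized, divided by baseline) is an assumption about how a reduction would be measured, not something the slides define.

```python
TAPS = [6, 10, 13, 20, 28, 41, 61, 119, 151]
ADD_SHIFT_SLICES = [264, 474, 386, 856, 1294, 2154, 3264, 6009, 7579]
COREGEN_SLICES = [524, 781, 929, 1191, 1774, 2475, 3528, 6484, 8274]


def percent_reduction(baseline: float, optimized: float) -> float:
    """Percentage by which `optimized` is smaller than `baseline`."""
    return 100.0 * (baseline - optimized) / baseline


def slice_reductions():
    """Map each filter size to the slice reduction of add/shift vs. Coregen."""
    return {
        taps: percent_reduction(coregen, add_shift)
        for taps, add_shift, coregen in zip(TAPS, ADD_SHIFT_SLICES, COREGEN_SLICES)
    }


if __name__ == "__main__":
    for taps, reduction in slice_reductions().items():
        print(f"{taps:>4} taps: {reduction:5.1f}% fewer slices")
```

The same function can be applied to the LUT and FF columns.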

Page 88: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental Results: DA vs. Add and Shift Method

[Figure: Reduction in Resources. % reduction in slices, LUTs, and FFs vs. # of taps.]

Page 89: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental Results: DA vs. Add and Shift Method

[Figure: Dynamic Power Consumption. Power (mW) vs. filter size (# of taps) for Add/Shift and Coregen.]

Page 90: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental Results: MAC vs. Add and Shift Method

                 Add Shift Method    MAC filter
Filter (# taps)  Slices  Msps        Slices  Msps
6                264     296         219     262
10               475     296         418     253
13               387     296         462     253
20               851     271         790     251
28               1303    305         886     251
41               2178    296         1660    243
61               3284    247         1947    242
119              6025    294         3581    241
151              7623    294         7631    215

Page 91: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental Results: MAC vs. Add and Shift Method

[Figure: Resource utilization. # of slices vs. # of taps for MAC and Add and Shift.]

Page 92: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental Results: MAC vs. Add and Shift Method

[Figure: Performance. Msps vs. # of taps for Add and Shift and MAC.]

Page 93: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

CSA CSE for Linear Systems

Y1 = X1 + X1<<2 + X2 + X2<<1 + X2<<2

Y2 = X1<<2 + X2<<2 + X2<<3

D1 = X1 + X2 + X2<<1

Y1 = (D1^S + D1^C) + X1<<2 + X2<<2

Y2 = (D1^S + D1^C)<<2

(D1^S and D1^C are the sum and carry outputs of the carry-save adder that computes D1.)
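A carry-save adder (CSA) reduces three operands to a sum word and a carry word without propagating carries. The sketch below models that on Python integers and reuses the shared divisor D1 in both outputs. It is an illustration under assumptions: the helper names `csa` and `linear_system` are hypothetical, and Y2 is taken to be D1 shifted left by two, which matches Y2 = X1<<2 + X2<<2 + X2<<3.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == a + b + c."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved up one place
    return s, carry


def linear_system(x1: int, x2: int) -> tuple[int, int]:
    """Compute Y1 = 5*X1 + 7*X2 and Y2 = 4*X1 + 12*X2 with a shared CSA divisor."""
    # D1 = X1 + X2 + X2<<1, kept in carry-save form (D1^S, D1^C).
    d1_s, d1_c = csa(x1, x2, x2 << 1)

    # Y1 = D1^S + D1^C + X1<<2 + X2<<2: two more CSAs, then one carry-propagate add.
    t_s, t_c = csa(d1_s, d1_c, x1 << 2)
    y1_s, y1_c = csa(t_s, t_c, x2 << 2)
    y1 = y1_s + y1_c

    # Y2 = (D1^S + D1^C)<<2: shift both words, then one carry-propagate add.
    y2 = (d1_s << 2) + (d1_c << 2)
    return y1, y2


if __name__ == "__main__":
    for x1 in range(16):
        for x2 in range(16):
            assert linear_system(x1, x2) == (5 * x1 + 7 * x2, 4 * x1 + 12 * x2)
    print(linear_system(3, 2))
```

Only the final addition for each output needs a carry-propagate adder; the shared divisor is computed once as a sum/carry pair and consumed by both outputs.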

Page 94: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Algebraic methods

Greedy iterative algorithm

Extracts the “best” 3-term divisor

Rewrites the expressions containing it

Terminates when there are no more common subexpressions

F1 = a + b + c + d + e

F2 = a + b + c + d + f

>> D1 = a + b + c

F1 = D1^S + D1^C + d + e

F2 = D1^S + D1^C + d + f

>> D2 = D1^S + D1^C + d

F1 = D2^S + D2^C + e

F2 = D2^S + D2^C + f
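The greedy loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: each expression is modeled as a set of distinct term names, "best" is assumed to mean the 3-term combination shared by the most expressions (ties go to the first one found), and the function name `greedy_csa_cse` is hypothetical. Each extracted divisor Dk is replaced by its CSA outputs, named DkS and DkC.

```python
from collections import Counter
from itertools import combinations


def greedy_csa_cse(expressions):
    """Greedily extract 3-term divisors shared by at least two expressions.

    `expressions` maps an output name to an iterable of distinct term names.
    Returns (divisors, rewritten), where divisors is a list of (name, terms).
    """
    exprs = {name: set(terms) for name, terms in expressions.items()}
    divisors = []
    while True:
        # Count every 3-term combination across all expressions.
        counts = Counter()
        for terms in exprs.values():
            for combo in combinations(sorted(terms), 3):
                counts[combo] += 1
        if not counts:
            break
        best, freq = max(counts.items(), key=lambda item: item[1])
        if freq < 2:
            break  # no common subexpression left
        name = f"D{len(divisors) + 1}"
        divisors.append((name, best))
        # Rewrite each expression containing the divisor with its sum/carry pair.
        for terms in exprs.values():
            if terms.issuperset(best):
                terms.difference_update(best)
                terms.update({name + "S", name + "C"})
    return divisors, exprs


if __name__ == "__main__":
    divisors, rewritten = greedy_csa_cse(
        {"F1": ["a", "b", "c", "d", "e"], "F2": ["a", "b", "c", "d", "f"]}
    )
    for name, terms in divisors:
        print(name, "=", " + ".join(terms))
    for name, terms in rewritten.items():
        print(name, "=", " + ".join(sorted(terms)))
```

On the slide's example this extracts D1 = a + b + c and then D2 from D1's sum and carry plus d, leaving F1 and F2 with three terms each.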

Page 95: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental results

[Figure: Comparing # of CSAs. # CSAs per example, original vs. optimized.]

Average 38.4% reduction

Page 96: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Experimental results: FPGA synthesis

Virtex II FPGAs

Synthesized designs and performed place & route

[Figure: Reduction in LUTs and slices. % reduction for H.264, DCT8, IDCT8, 6 tap FIR, 20 tap FIR, 41 tap FIR, and the average.]

Avg 14.1% reduction in # slices and avg 12.9% reduction in # LUTs

Avg 5.7% increase in delay

Page 97: Digital Arithmetic CSE 237D: Spring 2008 Topic #8 Professor Ryan Kastner.

Conclusions

Optimized acoustic modem by focusing on channel estimation and FIR filters

In-depth study of parallelization, number representation, arithmetic, and linear system optimization

[Figure: four MatchingPursuitCore blocks followed by arg min over i.]

Note: 112 samples/symbol + 112 samples for channel clearing.