Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation
Tor Aamodt and Paul Chow
University of Toronto
{ aamodt, pc }@eecg.utoronto.ca
3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18th, 2000, San Jose CA
Tor Aamodt & Paul Chow
University of Toronto
Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation Slide 2 of 32
What is this presentation about?
FOCUS: Signal processing applications developed using high-level language representation and floating-point data types...
WANT: Faster fixed-point software development...
QUESTION: Are there “better” fixed-point DSP instruction-sets in terms of runtime, power, or roundoff-noise performance?
Presentation Outline
Motivation & Background
Focus on…
Automatic Conversion to Fixed-Point
Architectural Enhancements
Some Experimental Results
Summary / Future Directions
Motivation
80% of DSPs in use are Fixed-Point. Why?
Because fixed-point hardware is cheaper and uses less power …
… however, it is much harder to develop signal-processing software for.
Background
UTDSP Project: DSP Compiler/Architecture Co-design
Traditional DSP architectures are hard for compilers to generate efficient code for (e.g. extended-precision accumulators)
First Generation Silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm CMOS, 63 MHz (Sean Peng’s M.A.Sc.)
16-bit Fixed-Point VLIW DSP with novel 2-level instruction fetching architecture (reduced pin-count)
June 2000: Synopsys CoCentric Fixed-Point Designer Tool
First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US)
Background: Fixed-Point versus Floating-Point
Fixed-Point: sign bit | Integer Part | Fractional Part (implied binary-point)
32 bit Floating-Point (IEEE): sign bit | 8 bit exponent (excess 127) | 23+1 bit normalized mantissa (explicit binary-point)
Background: Using Fixed-Point Arithmetic
Floating-Point:
y[n] = y[n-1] + x[n]
Fixed-Point:
y[n] = ((•y[n-1] >> 3) + x[n]) << 1
Explicit Scaling Operations: the >> 3 and << 1 shifts
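A minimal sketch of one step of the fixed-point recursion above, assuming 16-bit words and a Q15-style fractional multiply. The coefficient that multiplies y[n-1] is not legible in the transcript, so `coeff` is a hypothetical name, as are `fmul_q15` and `step_fixed`. The shift amounts 3 and 1 are copied from the slide.

```c
#include <stdint.h>

/* Fractional multiply (assumed Q15 convention): both operands are read as
 * values in [-1, 1) and the upper part of the 32-bit product is returned.
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static int16_t fmul_q15(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* One step of y[n] = ((coeff * y[n-1] >> 3) + x[n]) << 1.
 * `coeff` is a hypothetical coefficient; overflow is not checked. */
static int16_t step_fixed(int16_t coeff, int16_t y_prev, int16_t x)
{
    int32_t scaled = fmul_q15(coeff, y_prev) >> 3; /* align binary points */
    int32_t sum = scaled + x;                      /* add at a common scale */
    return (int16_t)(sum * 2);                     /* "<< 1" written as *2 to avoid
                                                      shifting a negative value */
}
```

For instance, `step_fixed(16384, 16384, 1000)` multiplies 0.5 by 0.5 in Q15 (8192), shifts it to 1024, adds 1000, and doubles the sum.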
Automatic Conversion Process
Traditional Optimizing Compiler:
Input Program → Parser → Optimizer → Code Generator → Processor
• CONSTRAINT: Input/Output Invariance
• GOAL: Application Speedup
i.e. make code faster, but do not break anything!!!
Automatic Conversion Process
Traditional Optimizing Compiler:
Input Program → Parser → Optimizer → Code Generator → Processor
Sample Inputs → Floating-Point to Fixed-Point Translator
• “RELAX” CONSTRAINTS…
• GOALS:
“Good” Input/Output Fidelity (e.g. good signal-to-noise ratio)
Fast/Low-Power Operation (10-500× faster than FP emulation)
Floating-Point to Fixed-Point Translation
Floating-point source:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Converted fixed-point code:
int a, b, x[N];
y = (a•x[i] >> 2) + b•x[i+1];
1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations
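The three conversion steps can be sketched side by side in plain C. This is an illustration under assumptions, not the translator's actual output: 16-bit words, a Q15-style fractional multiply standing in for the "•" operator, and the hypothetical names `fmul_q15`, `tap_float`, and `tap_fixed`.

```c
#include <stdint.h>

/* Step 3: fractional fixed-point multiply (assumed Q15 convention).
 * Assumes arithmetic right shift of negative values. */
static int16_t fmul_q15(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Original floating-point expression. */
static float tap_float(float a, float b, const float *x, int i)
{
    return a * x[i] + b * x[i + 1];
}

/* Converted expression: integer types (step 1), an explicit ">> 2"
 * scaling shift on the first product (step 2), and fractional
 * multiplies in place of '*' (step 3). Overflow is not checked. */
static int16_t tap_fixed(int16_t a, int16_t b, const int16_t *x, int i)
{
    int32_t t1 = fmul_q15(a, x[i]) >> 2;
    int32_t t2 = fmul_q15(b, x[i + 1]);
    return (int16_t)(t1 + t2);
}
```

Note that the two products end up at different scales here because only the first is shifted; the shift amount is taken from the slide and would in practice come from the profiled ranges.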
Floating-Point to Fixed-Point Translator
SUIF Parser*
*SUIF = Stanford University Intermediate Format See: http://suif.stanford.edu
Identifier Assignment
Optimizer
Instrument Code
Profile (using Sample Inputs)
Fixed-Point Conversion
Collecting Dynamic Range Information
Consider the ANSI C code:
float a, b, x[N];
y = a*x[i] + b*x[i+1];
Equivalent Expression Tree:
          +        (result: y)
        /   \
       *     *
      / \   / \
     a x[i] b  x[i+1]
ID Assignment:
“0” : y
“1” : tmp_1
“2” : tmp_2
Code Instrumentation:
tmp_1 = a*x[i];
profile(tmp_1,1);
tmp_2 = b*x[i+1];
profile(tmp_2,2);
y = tmp_1 + tmp_2;
profile(y,0);
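A runnable sketch of the instrumentation idea, assuming that `profile(value, id)` simply records the largest magnitude seen for each identifier. The slides do not show the body of `profile`, so this implementation, the `profile_max_abs` table, and the `instrumented` wrapper are assumptions made for illustration.

```c
#define NUM_IDS 3

/* Largest |value| observed so far for each identifier (assumed bookkeeping). */
static double profile_max_abs[NUM_IDS];

/* Record one observation of the intermediate result with the given ID. */
static void profile(double value, int id)
{
    double mag = value < 0.0 ? -value : value;
    if (mag > profile_max_abs[id]) {
        profile_max_abs[id] = mag;
    }
}

/* Instrumented form of y = a*x[i] + b*x[i+1]: each intermediate result
 * is stored in a temporary and reported under its assigned ID. */
static float instrumented(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    profile(tmp_1, 1);
    float tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    float y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}
```

Running `instrumented` over the sample inputs leaves the observed dynamic range of every node in `profile_max_abs`, which is the information the later scaling decisions rely on.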
Generating Scaling Operations
Signal Scaling: Integer Word Length (IWL)
definition: $\mathrm{IWL}[x] = \lfloor \log_2 \max |x| \rfloor + 1$
Word layout: Sign bit | Integer Part (IWL bits) | Fractional Part
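As a sketch, the IWL of a signal can be computed from its profiled maximum magnitude. The code below assumes the definition uses a floor, i.e. floor(log2(max|x|)) + 1, and evaluates it with a power-of-two search so that no math library is needed. `iwl_from_max` and `frac_bits` are hypothetical helper names, and returning 0 for an all-zero signal is an arbitrary choice.

```c
/* IWL = floor(log2(max_abs)) + 1, found by locating the power of two
 * 2^(e-1) <= max_abs < 2^e and returning e. */
static int iwl_from_max(double max_abs)
{
    int e = 0;
    double p = 1.0;                     /* invariant: p == 2^e */

    if (max_abs <= 0.0) {
        return 0;                       /* assumption: all-zero signal */
    }
    while (max_abs >= p) {              /* grow until max_abs < 2^e */
        p *= 2.0;
        e++;
    }
    while (max_abs < p / 2.0) {         /* shrink until 2^(e-1) <= max_abs */
        p /= 2.0;
        e--;
    }
    return e;
}

/* Fraction bits left in a word with one sign bit and `iwl` integer bits. */
static int frac_bits(int word_bits, int iwl)
{
    return word_bits - 1 - iwl;
}
```

A signal that peaks at 1.5 needs one integer bit, while a signal that never exceeds 0.3 has a negative IWL, which frees an extra fraction bit.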
Generating Scaling Operations
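Example: “A op B”:
Converted Sub-Expressions:
A: $\mathrm{IWL}_{A\,\text{measured}}$, $\mathrm{IWL}_{A\,\text{current}}$
B: $\mathrm{IWL}_{B\,\text{measured}}$, $\mathrm{IWL}_{B\,\text{current}}$
A op B: $\mathrm{IWL}_{A\,op\,B\,\text{measured}}$, $\mathrm{IWL}_{A\,op\,B\,\text{current}}$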
?
Automatic Conversion Process:
IRP: Using Intermediate Result Profile Data
Previous Algorithms:
‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric Fixed-Point Designer Tool)
A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes.
Is Useful Information Lost?
IRP: Additive Operations
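“A ± B” → “(A << n_A) ± (B >> [n − n_B])”
where:
$n_A = \mathrm{IWL}_{A\,\text{current}} - \mathrm{IWL}_{A\,\text{measured}}$
$n_B = \mathrm{IWL}_{B\,\text{current}} - \mathrm{IWL}_{B\,\text{measured}}$
$n = \mathrm{IWL}_{A\,\text{measured}} - \mathrm{IWL}_{B\,\text{measured}}$
For example, assume |A| > |B| and $\mathrm{IWL}_{A+B\,\text{measured}} \le \mathrm{IWL}_{A\,\text{measured}}$. Then:
$\mathrm{IWL}_{A+B\,\text{current}} = \mathrm{IWL}_{A\,\text{measured}}$
Diagram: B is shifted right by n bits (>> n) so that its binary point lines up with A's.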
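The additive rule can be sketched as a small C routine, assuming each operand carries its current and measured IWL and that the case described in the example applies, so the result takes the measured IWL of A. The function names and the signed-shift helper are hypothetical, and only the "+" case is shown; subtraction would be analogous.

```c
#include <stdint.h>

/* Shift left by n bits when n >= 0, right by -n bits otherwise.
 * Left shifts are written as multiplications to avoid shifting a
 * negative value; right shifts assume arithmetic behaviour. */
static int32_t shift_left(int32_t v, int n)
{
    return n >= 0 ? v * ((int32_t)1 << n) : v >> -n;
}

/* "A + B" -> "(A << nA) + (B >> (n - nB))".
 * Stores the IWL of the result (IWL of A, measured) in *iwl_result. */
static int32_t irp_add(int32_t a, int iwl_a_cur, int iwl_a_meas,
                       int32_t b, int iwl_b_cur, int iwl_b_meas,
                       int *iwl_result)
{
    int n_a = iwl_a_cur - iwl_a_meas;  /* unused headroom in A */
    int n_b = iwl_b_cur - iwl_b_meas;  /* unused headroom in B */
    int n   = iwl_a_meas - iwl_b_meas; /* alignment distance between A and B */

    *iwl_result = iwl_a_meas;
    return shift_left(a, n_a) + shift_left(b, -(n - n_b));
}
```

With 16-bit words, A = 1.5 stored with a current IWL of 3 (6144) and B = 0.25 stored with a current IWL of 0 (8192), measured IWLs of 1 and -1 give n_A = 2, n_B = 1, and n = 2, so A is shifted left by two bits and B right by one before the addition.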
IRP: Multiplication
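“A • B” → “(A << n_A) • (B << n_B)”
where:
$n_A = \mathrm{IWL}_{A\,\text{current}} - \mathrm{IWL}_{A\,\text{measured}}$
$n_B = \mathrm{IWL}_{B\,\text{current}} - \mathrm{IWL}_{B\,\text{measured}}$
$\mathrm{IWL}_{A \cdot B\,\text{current}} = \mathrm{IWL}_{A\,\text{measured}} + \mathrm{IWL}_{B\,\text{measured}}$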
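A sketch of the multiplication rule, assuming 16-bit operands and a Q15-style fractional multiply for the "•" operator. Both operands are first normalized to their measured IWL, and the product is then read with an IWL equal to the sum of the two. All function names are hypothetical and overflow is not checked.

```c
#include <stdint.h>

/* Shift left by n bits when n >= 0, right by -n bits otherwise
 * (arithmetic right shift assumed for negative values). */
static int32_t shift_left(int32_t v, int n)
{
    return n >= 0 ? v * ((int32_t)1 << n) : v >> -n;
}

/* Fractional multiply (assumed Q15 convention). */
static int16_t fmul_q15(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* "A * B" -> "(A << nA) * (B << nB)".
 * Stores IWL(A measured) + IWL(B measured) in *iwl_result. */
static int16_t irp_mul(int16_t a, int iwl_a_cur, int iwl_a_meas,
                       int16_t b, int iwl_b_cur, int iwl_b_meas,
                       int *iwl_result)
{
    int n_a = iwl_a_cur - iwl_a_meas;
    int n_b = iwl_b_cur - iwl_b_meas;
    int16_t a_norm = (int16_t)shift_left(a, n_a);
    int16_t b_norm = (int16_t)shift_left(b, n_b);

    *iwl_result = iwl_a_meas + iwl_b_meas;
    return fmul_q15(a_norm, b_norm);
}
```

Using the same operands as before (1.5 with IWLs 3 and 1, 0.25 with IWLs 0 and -1), the result is 12288 with an IWL of 0, which reads as 0.375.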
IRP: Division
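“A / B” → “(A >> [n_dividend − n_A]) / (B << n_B)”
where:
$n_A = \mathrm{IWL}_{A\,\text{current}} - \mathrm{IWL}_{A\,\text{measured}}$
$n_B = \mathrm{IWL}_{B\,\text{current}} - \mathrm{IWL}_{B\,\text{measured}}$
$n_{\text{diff}} = \mathrm{IWL}_{A/B\,\text{measured}} - \mathrm{IWL}_{A\,\text{measured}} + \mathrm{IWL}_{B\,\text{measured}}$
$n_{\text{dividend}} = \begin{cases} n_{\text{diff}}, & \text{if } n_{\text{diff}} \ge 0 \\ 0, & \text{otherwise} \end{cases}$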
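A sketch of the division rule under the same assumptions as the other operations: 16-bit operands and a fractional divide that scales the dividend by 2^15 before an integer division. The slides do not define the divide primitive, so that choice, like the function names, is an assumption. Division by zero and quotient overflow are not handled.

```c
#include <stdint.h>

/* Shift left by n bits when n >= 0, right by -n bits otherwise
 * (arithmetic right shift assumed for negative values). */
static int32_t shift_left(int32_t v, int n)
{
    return n >= 0 ? v * ((int32_t)1 << n) : v >> -n;
}

/* "A / B" -> "(A >> (n_dividend - nA)) / (B << nB)".
 * iwl_q_meas is the measured IWL of the quotient A / B. */
static int16_t irp_div(int16_t a, int iwl_a_cur, int iwl_a_meas,
                       int16_t b, int iwl_b_cur, int iwl_b_meas,
                       int iwl_q_meas)
{
    int n_a = iwl_a_cur - iwl_a_meas;
    int n_b = iwl_b_cur - iwl_b_meas;
    int n_diff = iwl_q_meas - iwl_a_meas + iwl_b_meas;
    int n_dividend = n_diff >= 0 ? n_diff : 0;

    int32_t num = shift_left(a, -(n_dividend - n_a)); /* right shift of A */
    int32_t den = shift_left(b, n_b);                 /* left shift of B */

    /* Assumed fractional divide: quotient of values read in [-1, 1). */
    return (int16_t)((num * 32768) / den);
}
```

For example, A = 0.75 (12288, current IWL 1, measured IWL 0) divided by B = 3.0 (24576, current and measured IWL 2) with a measured quotient IWL of -1 gives n_diff = 1, leaves A unshifted, and returns 16384, which corresponds to 0.25 if the quotient is read with an IWL of -1. That reading of the result format is an inference, not something stated on the slide.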
IRP-SA: Using ‘Shift Absorption’
Example:
y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative:
y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and get roughly the same answer? YES!
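The difference between the two forms can be sketched on the already-computed products. In the hypothetical helpers below, `p1` stands for a*x[i] and `p2` for b*x[i+1]; the first form drops the least significant bit of `p2` before adding, the second shifts `p1` left instead and keeps that bit. The sketch assumes the final sum fits in 16 bits.

```c
#include <stdint.h>

/* y = (p1 + (p2 >> 1)) << 1 : the right shift discards the low bit of p2. */
static int16_t sum_right_shift(int16_t p1, int16_t p2)
{
    return (int16_t)((p1 + (p2 >> 1)) * 2);
}

/* y = (p1 << 1) + p2 : the shift is absorbed into p1, whose top bit is
 * assumed to carry no information, so no low-order bit of p2 is lost. */
static int16_t sum_shift_absorbed(int16_t p1, int16_t p2)
{
    return (int16_t)(p1 * 2 + p2);
}
```

With p1 = 100 and p2 = 201 the first form returns 400 and the second 401, so the two answers are roughly the same and the second is the more precise one.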
Architectural Support
Fractional Multiplication with internal Left Shift
A: $\mathrm{IWL}_A$
B: $\mathrm{IWL}_B$
A*B: $\mathrm{IWL}_A + \mathrm{IWL}_B$
Common occurrence (using IRP-SA): A•B << n
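One way to model the benefit of folding the left shift into the multiplier is to compare it with a multiply followed by a separate shift. The sketch below assumes 16-bit operands, a 32-bit internal product, and 0 <= n <= 15; the exact bit selection of the real FMLS instruction is not given in the transcript, so `fmls` here is an illustrative model only.

```c
#include <stdint.h>

/* Model of a fractional multiply with internal left shift by n:
 * the output window is taken n bits lower in the full product, so
 * n extra low-order product bits survive. Arithmetic right shift is
 * assumed for negative products; overflow is not checked. */
static int16_t fmls(int16_t a, int16_t b, int n)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> (15 - n));
}

/* Baseline: ordinary fractional multiply, then a separate "<< n".
 * The n low-order bits of the result are always zero. */
static int16_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int16_t truncated = (int16_t)(product >> 15);
    return (int16_t)(truncated * (1 << n));
}
```

For a = b = 3000 and n = 2, the two-step version returns 1096 while the fused version returns 1098, closer to the exact scaled product of about 1098.6. The fused form also replaces two operations with one, which is where a speedup would come from.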
Experimental Results
Benchmarks
4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
(Normalized) Lattice Filter (LAT, NLAT)
128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)
Levinson-Durbin Recursion (LEVDUR)
10x10 Matrix-Multiply (MMUL10)
Nonlinear Control (INVPEND)
Trig Function (SIN)
SQNR Enhancement: FMLS and/or IRP-SA
[Bar chart: SQNR enhancement in equivalent bits (axis from -0.5 to 2) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR, MMUL10, SIN. Series: IRP-SA, FMLS, IRP-SA w/ FMLS]
What Is The Effect of “Shift Absorption” ?
[Bar chart: Distribution of Fractional Multiply Output Shifts. Relative frequency (axis from 0 to 0.8) versus FMLS output shift distance (3 left, 2 left, 1 left, none, 1 right). Series: IRP, IRP-SA]
Experimental Results:
Rotational Inverted Pendulum
U of T System Control Group
Non-linear Testbench
Closed-Loop System Response: Rotational Inverted Pendulum 12-bit Controller Comparison
WC: 32.8 dB
IRP-SA: 41.1 dB
IRP-SA w/ FMLS: 48.0 dB
128-Point Radix-2 FFT (Generated by MATLAB RealTime Workshop)
Speedup?
Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies
…Yup!
Speedup* Using FMLS
[Bar chart: relative speedup (axis from 1 to 1.4) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN. Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL]
8-FMUL = { 4 left thru 3 right }
4-FMUL = { 2 left thru 1 right }
2-FMUL = { one left, no shift }
SQNR Enhancement for various Output Shift Sets
[Bar chart: SQNR enhancement in equivalent bits (axis from 0 to 2) for IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10, INVPEND, SIN. Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL]
Summary
The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance.
Speedups of up to 35%, and SQNR enhancement equivalent to up to 2 bits, maybe even 4 bits (depending on how you choose to measure it).
Easy VLSI implementation, and easy for the compiler to use.
Future Directions
Higher Level Transformations:
Automatic Generation of Block-Floating-Point...
Quantization Error Feedback…
BOTH need signal-flow-graph representation… therefore probably need a better DSP language than ANSI C
Variable Precision Arithmetic (How much precision does each operation need?)