This is the author’s version of a work that was submitted/accepted for publication in the following source:
Warne, David, Kelson, Neil A., Kok, Jonathan, Gurnett, Timothy, & Rueckert, Ulrich (2012) Experiences with implementing common mathematical operations using field programmable gate arrays. In 16th Biennial Computational Techniques and Applications Conference, 23 - 26 September, 2012, Queensland University of Technology, Brisbane, Qld. (In Press)
This file was downloaded from: http://eprints.qut.edu.au/54454/
© Copyright 2012 please consult the authors
Notice: Changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published source:
Queensland University of Technology
CRICOS No. 00213J
Experiences with Implementing Common Mathematical Operations using Field Programmable Gate Arrays
D. J. Warne, N. A. Kelson, J. Kok, T. Gurnett, U. Rueckert
Overall objective:
• To gain experience in mapping basic scientific operations / applications partially or fully onto FPGA-based platforms
Talk:
• Aims
• FPGA overview
  – FPGA 101
  – Challenges
  – The new HPC programmer needs…
  – FPGA hype cycle
  – Current FPGA-based HPC platforms
• Case Studies
  – Floating point exponentiation, division, and TDMA
Program aims
Configurable High Performance Computing
• Co-processing of sci. apps (Host+FPGA systems)
• Partial algorithm re-implementation in RC hardware
  – Code profiling
  – Re-implementation of compute-intensive code segments in an HDL
Configurable High Performance Embedded Computing
• Full algorithm implementation in RC hardware
  – Re-implementation of entire code base in an HDL
Reconfigurable HPC
• Runtime RC hardware changes
Basic questions of interest
• How can I rethink and recast the algorithm of interest to an HDL?
• How many FPGA resources are used?
• How fast can the FPGA clock be driven?
• How can data be fed to/from the FPGA?
• Power consumption…
FPGA 101 – “blank slate” computing
FPGA 101 – “blank slate” computing (2)
Challenges
An HPC software/configware developer needs…
• Scientific software dev. on parallel/HPC systems
• Unix and Linux cluster experience
• Writing / maintaining C, C++ and/or Fortran codes
• Scripting languages and system utils: Make, configure, perl, unix shells, etc.
• Experience in tools for programming, profiling, debugging parallel MPI and hybrid-parallel codes
AND…
• Proficiency with an HDL and e.g. Xilinx FPGA toolchain
• Analyse performance characteristics of scientific HPC codes to understand energy efficiency and performance implications of algorithmic and hardware architecture trade-offs.
Job Ad: FPGA Engineer
Lawrence Berkeley National Lab
Sept / 2012
The CoDEx Project: A Hardware/Software Codesign Environment for the Exascale Era
Project Goals
The next decade will see a rapid evolution of HPC node architectures as power and cooling constraints are limiting increases in microprocessor clock speeds and constraining data movement. Applications and algorithms will need to change and adapt as node architectures evolve. A key element of the strategy as we move forward is the co-design of applications, architectures and programming environments, to navigate the increasingly daunting constraint space for feasible exascale system designs.
FPGA hype cycle
2005-6 – Peak
2006-8 – Numerous evals./Realism.
2009 – Turning point?
“HPC is approaching a X-roads in terms of enabling technologies … numerous studies show computing with RC devices is fundamentally superior in terms of speed & power” Source: HPCwire - RC computing research pushes forward. Allan George, Director NSF Centre for HPRC (CHREC)
2010-12
• Journal special issues
• Monographs e.g. FPGAs 4 HPC (2012)
• FP benchmarks continue to improve cf. multicore microprocessors
H2 2012
• Xilinx: Vivado release (AutoESL)
• Altera: OpenCL 4 FPGA early access program launch.
HPRC / accelerator systems
Research
• Novo-G (NSF/CHREC) 288x Altera Stratix III/IV FPGAs
• Maxwell 64x Xilinx Virtex 4 FPGA platform (UK)
• Confetti (Lausanne), Grape (Japan), MPRACE-2 (Germany), etc.
Commercial
• Convey computers
Case studies
Three examples:
• FP Exp()
• FP Divide()
• TDMA
Example 1: FP Exponentiation
CRICOS No. 00213J · a university for the real world
Motivation
Intel Xeon CPU (2.4GHz)
| Function | Cycles | Latency (ns) | Throughput (10^6 ops/s) |
|---|---|---|---|
| Logarithm | 196 | 82 | 12 |
| Exponential | 308 | 128 | 8 |
Xilinx Virtex-II FPGA (100MHz)
| Function | Cycles | Latency | Throughput (10^6 ops/s) |
|---|---|---|---|
| Logarithm | 11 | 64 | 100 |
| Exponential | 15 | 85 | 100 |
J. Detrey and F. de Dinechin. Parameterized floating-point logarithm and exponential functions for FPGAs. IEEE International Conference on Field-Programmable Technology, Singapore, December 2008, pp. 27-34.
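The CPU figures in the table are consistent with latency = cycles / clock frequency and with one operation in flight at a time. The FPGA throughput of 100 × 10^6 ops/s is what a fully pipelined unit would deliver at 100 MHz if it accepts one operand per cycle. Both readings are assumptions about how the table was derived, and the helper names below are illustrative only. A minimal sketch of the arithmetic:

```python
def latency_ns(cycles, clock_mhz):
    """Latency of one operation in nanoseconds: cycles / clock frequency."""
    return cycles * 1e3 / clock_mhz


def throughput_mops(cycles, clock_mhz, pipelined):
    """Throughput in 10^6 ops/s.

    Assumption: a non-pipelined unit finishes one operation every `cycles`
    clock ticks, while a fully pipelined unit finishes one per tick.
    """
    return clock_mhz if pipelined else clock_mhz / cycles


# CPU rows of the table (2.4 GHz = 2400 MHz)
for name, cycles in [("Logarithm", 196), ("Exponential", 308)]:
    print(name,
          round(latency_ns(cycles, 2400)),
          round(throughput_mops(cycles, 2400, pipelined=False)))

# FPGA throughput column (100 MHz, assumed fully pipelined)
print(throughput_mops(15, 100, pipelined=True))
```

The point of the comparison is that the pipelined FPGA unit wins on throughput even though its clock is 24 times slower.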
Floating Point Representation
Floating Point Exponential Algorithm
• Basic Method
• For single precision, the mantissa is split into two 9-bit portions.
• […] is stored as a 512-line LUT.
• […] is computed using a first-order Taylor series.
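The formulas on this slide did not survive extraction, so the following is only a software sketch of one common table-plus-Taylor scheme, not the authors' exact data path. It assumes the decomposition e^x = 2^E · e^{y1} · e^{y2}, where y1 and y2 are the upper and lower 9-bit fields of the reduced argument, e^{y1} comes from a 512-entry table, and e^{y2} is approximated by 1 + y2. All names (`K`, `LUT`, `exp_sketch`) are hypothetical.

```python
import math

K = 9                                             # bits per portion, as on the slide
LUT = [math.exp(i / 2**K) for i in range(2**K)]   # 512-entry table of e^{y1}


def exp_sketch(x):
    """Approximate e^x with a 512-line LUT and a first-order Taylor term."""
    # Range reduction (assumed): x = E*ln(2) + y with y in [0, ln 2)
    E = math.floor(x / math.log(2))
    y = x - E * math.log(2)
    # Quantise y to 2*K fractional bits and split into two K-bit fields
    q = int(y * 2**(2 * K))
    hi, lo = q >> K, q & (2**K - 1)
    y2 = lo / 2**(2 * K)                          # small residual, y2 < 2^-9
    # e^y ~= e^{y1} * (1 + y2); the factor 2^E is an exponent adjustment
    return math.ldexp(LUT[hi] * (1.0 + y2), E)


for x in (-3.7, 0.0, 1.0, 10.25):
    print(x, exp_sketch(x), math.exp(x))
```

With 18 fractional bits the relative error of this sketch stays below about 10^-5, which is short of full single precision. A hardware design would need a wider residual or a higher-order correction.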
FP Exp() Data Flow
Example 2: FP Divide
Floating Point Divide Algorithm
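Goldschmidt method for mantissa division
– Let $b = 1 - x,\ x \in [0,1)$, so that
$$\frac{N}{b} = \frac{N}{1-x} = \frac{N(1+x)}{1-x^{2}} = \frac{N(1+x)(1+x^{2})}{1-x^{4}}$$
– Since $x < 1$, $\lim_{n\to\infty}\left(1 - x^{2^{n}}\right) = 1$, and therefore
$$\frac{N}{1-x} = N\prod_{n=0}^{\infty}\left(1 + x^{2^{n}}\right)$$
– Or we approximate
$$\frac{N}{1-x} \approx N(1+x)(1+x^{2})(1+x^{4})(1+x^{8})$$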
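The truncated product can be checked numerically. The sketch below is a plain software model, not the FPGA implementation. It evaluates N(1+x)(1+x²)(1+x⁴)(1+x⁸) by repeated squaring of x; the function name and the `terms` parameter are illustrative. Because the product of the first four factors equals (1 − x¹⁶)/(1 − x), the relative error is exactly x¹⁶.

```python
def goldschmidt_divide(N, b, terms=4):
    """Approximate N / b for b in (0, 1] using b = 1 - x.

    N/(1-x) ~= N * (1+x) * (1+x^2) * (1+x^4) * ... with `terms` factors.
    """
    x = 1.0 - b
    q = N
    for _ in range(terms):
        q *= 1.0 + x      # multiply in the next factor (1 + x^(2^n))
        x *= x            # square x for the following factor
    return q


print(goldschmidt_divide(1.5, 0.75))   # exact quotient is 2
print(goldschmidt_divide(1.0, 0.5))    # worst case for b in [0.5, 1]
```

For a divisor in [0.5, 1] the worst case is x = 0.5, giving a relative error of 2⁻¹⁶. More factors are needed if the full 24-bit single-precision mantissa is to be recovered.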
FP Divide Data Flow
[Figure: data-flow graph for the divide. Inputs N and x pass through +1, ^2 and * stages to produce the quotient q.]
Logic Fabric Utilisation
Hardware Utilisation
Xilinx Virtex-5 (200MHz, ~130,000 LUTs)
| Module | LUTs | DSPs |
|---|---|---|
| exp() | 500 | 2 |
| ln() | 1208 | 0 |
| div() | 2580 | 0 |
| sin()/cos() | 3008 | 0 |
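To put these counts in proportion, the share of the device each module occupies can be worked out directly from the table. The sketch below uses the approximate figure of 130,000 LUTs quoted above. The instance counts are a rough upper bound that ignores routing, DSP and I/O limits, and the helper names are illustrative.

```python
TOTAL_LUTS = 130_000   # approximate device size quoted on the slide

modules = {"exp()": 500, "ln()": 1208, "div()": 2580, "sin()/cos()": 3008}


def lut_share(luts, total=TOTAL_LUTS):
    """Percentage of the device's LUTs used by one module instance."""
    return 100.0 * luts / total


def max_instances(luts, total=TOTAL_LUTS):
    """Upper bound on instances that fit, counting LUTs only."""
    return total // luts


for name, luts in modules.items():
    print(f"{name:12s} {lut_share(luts):5.2f}%  up to {max_instances(luts)} instances")
```

Even the largest module takes under 3% of the fabric, which is what makes replicating units for parallelism plausible.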
Example 3: FPGA implementation of a TDMA Solver
TDMA
• The Tri-Diagonal Matrix Algorithm (TDMA) is a special case of LU factorisation for solving a linear system of the form
$$a_{i(i-1)}\,x_{i-1} + a_{ii}\,x_{i} + a_{i(i+1)}\,x_{i+1} = b_{i}, \quad i = 1, \dots, n$$
• Results in a simplified (TDMA or Thomas) algorithm
L(1) = L(1)/D(1);
x(1:n) = b(1:n);
for i = 2:n
    D(i) = D(i) - L(i-1)*U(i);
    x(i) = x(i) - L(i-1)*x(i-1);
    L(i) = L(i)/D(i);
end
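A runnable software version of the loop above makes the indexing explicit. This is a reference sketch, not the hardware design. It assumes the storage convention implied by the pseudocode: L(i) holds the sub-diagonal entry of row i+1, and U(i) holds the super-diagonal entry of row i-1. The backward substitution step x(i) = (x(i) - U(i+1)*x(i+1))/D(i) is inferred from the data-flow slide that follows. Indices are 0-based here.

```python
def tdma_solve(L, D, U, b):
    """Solve a tridiagonal system with the Thomas algorithm.

    Assumed layout (0-based): D[i] = a[i][i], L[i] = a[i+1][i] (L[n-1] unused),
    U[i] = a[i-1][i] (U[0] unused). Inputs are not modified.
    """
    n = len(D)
    L, D, x = list(L), list(D), list(b)

    # Factorise and forward substitution
    L[0] = L[0] / D[0]
    for i in range(1, n):
        D[i] -= L[i - 1] * U[i]
        x[i] -= L[i - 1] * x[i - 1]
        L[i] = L[i] / D[i]

    # Backward substitution
    x[n - 1] /= D[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (x[i] - U[i + 1] * x[i + 1]) / D[i]
    return x


# 3x3 example: [[2,1,0],[1,2,1],[0,1,2]] * [1,2,3] = [4,8,8]
print(tdma_solve(L=[1, 1, 0], D=[2, 2, 2], U=[0, 1, 1], b=[4, 8, 8]))
```

Each row needs one divide, two multiplies and two subtractions in the forward pass. That fixed operation count per row is why the algorithm maps naturally onto a fixed data-flow circuit.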
Data Flow for TDMA
• Factorise and Forward Substitution (single iteration)
• Backward Substitution (single iteration)
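[Figure: data-flow graphs. Factorise and forward substitution: inputs L(i), L(i-1), D(i), U(i), x(i-1), x(i) pass through two multiplies, two subtractions and a divide to give x(i), L(i), D(i). Backward substitution: inputs U(i+1), x(i+1), x(i), D(i) pass through a multiply, a subtraction and a divide to give x(i).]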
TDMA Pipeline
[Figure: TDMA pipeline block diagram with blocks SRAM:0, TDMA, registers:0, registers:1, BackSub and SRAM:1, two sel multiplexers, and L, D, U, X data channels.]
Implementation: TDMA Pipeline
• Outputs
  – Factorised L(i), D(i), U(i), X(i)
• Inputs
  – Row index
  – newMatrix flag
  – L(i), D(i), U(i) of Matrix and X(i) of RHS
• Any precision can be used (32 bit has been used here)
• Behaviour
  – Input => TDMA => input reg block
  – Output reg block => BackwardSub => output
  – When newMatrix is set the register blocks are swapped
entity TDMA_pipeline is
    -- nbits is defined in FP_pkg
    generic (
        width : integer := nbits;   -- word width of each operand
        dim   : integer := 5);      -- number of matrix rows
    port (
        clk       : in  std_logic;
        rst       : in  std_logic;
        index     : in  integer;    -- row index
        newMatrix : in  std_logic;  -- swaps the register blocks when set
        L         : in  std_logic_vector (width-1 downto 0);
        D         : in  std_logic_vector (width-1 downto 0);
        U         : in  std_logic_vector (width-1 downto 0);
        X         : in  std_logic_vector (width-1 downto 0);
        Lf        : out std_logic_vector (width-1 downto 0);
        Df        : out std_logic_vector (width-1 downto 0);
        Uf        : out std_logic_vector (width-1 downto 0);
        Xf        : out std_logic_vector (width-1 downto 0);
        vld       : out std_logic);
end TDMA_pipeline;
Design: TDMA Pipeline (cont…)
-- Port mapping for the backward substitution unit.
-- The D, U and X output channels are used to produce the X output.
BS_Unit : BackwardSub port map (
    rst  => BSrst,
    clk  => clk,
    D    => DoutChannel,
    U    => UoutChannel,
    X    => XoutChannel,
    Xout => Xfinal,
    vld  => BSvld
);

-- Swaps input and output channels between the two storage matrices.
-- Note: this process is clocked on the falling edge.
swapBuffers : process
begin
    wait until clk'event and clk = '0';
    if (rst = '1') then
        switch <= '0';
    else
        if (newMatrix = '1') then
            switch <= not switch;
        end if;
    end if;
end process;

-- Controls which set of registers is used for backward substitution input.
-- Rows are read in reverse order (dim-index-1).
ControlOutReg : process
begin
    wait until clk'event and clk = '1';
    case switch is
        when '1' =>
            LoutChannel <= Matrix2(dim-index-1,3);
            DoutChannel <= Matrix2(dim-index-1,2);
            UoutChannel <= Matrix2(dim-index-1,1);
            XoutChannel <= Matrix2(dim-index-1,0);
        when '0' =>
            LoutChannel <= Matrix1(dim-index-1,3);
            DoutChannel <= Matrix1(dim-index-1,2);
            UoutChannel <= Matrix1(dim-index-1,1);
            XoutChannel <= Matrix1(dim-index-1,0);
        when others =>
            LoutChannel <= Matrix2(dim-index-1,3);
            DoutChannel <= Matrix2(dim-index-1,2);
            UoutChannel <= Matrix2(dim-index-1,1);
            XoutChannel <= Matrix2(dim-index-1,0);
    end case;
end process;
Design: TDMA Pipeline (cont…)
-- Port mapping for the TDMA unit.
-- Inputs are used to produce the L, D, U and X input channels.
TDMA_Unit : TDMA port map (
    rst  => newMatrix,
    clk  => clk,
    L    => L,
    D    => D,
    U    => U,
    X    => X,
    Lout => LinChannel,
    Dout => DinChannel,
    Uout => UinChannel,
    Xout => XinChannel,
    vld  => TDMAvld
);

-- Controls which set of registers is used for TDMA output.
ControlInReg : process
begin
    wait until clk'event and clk = '1';
    case switch is
        when '1' =>
            Matrix1(indexIn,3) <= LinChannel;
            Matrix1(indexIn,2) <= DinChannel;
            Matrix1(indexIn,1) <= UinChannel;
            Matrix1(indexIn,0) <= XinChannel;
        when '0' =>
            Matrix2(indexIn,3) <= LinChannel;
            Matrix2(indexIn,2) <= DinChannel;
            Matrix2(indexIn,1) <= UinChannel;
            Matrix2(indexIn,0) <= XinChannel;
        when others =>
            Matrix1(indexIn,3) <= LinChannel;
            Matrix1(indexIn,2) <= DinChannel;
            Matrix1(indexIn,1) <= UinChannel;
            Matrix1(indexIn,0) <= XinChannel;
    end case;
end process;
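The register-block swapping done by the swapBuffers, ControlInReg and ControlOutReg processes is a double-buffering (ping-pong) scheme. The model below is a behavioural sketch in software only. It mirrors the bank selection in the VHDL (switch = '1' writes Matrix1 and reads Matrix2, switch = '0' the reverse) and the reversed read index dim-index-1, but it ignores clock edges and pipeline latency. The class and method names are hypothetical.

```python
class PingPongBuffers:
    """Behavioural model of the two register blocks Matrix1 / Matrix2."""

    def __init__(self, dim):
        self.dim = dim
        self.banks = [[None] * dim, [None] * dim]   # [Matrix1, Matrix2]
        self.switch = 0                              # reset value in swapBuffers

    def new_matrix(self):
        """newMatrix = '1': toggle which block is written and which is read."""
        self.switch ^= 1

    def write(self, index_in, row):
        """TDMA unit output: switch=1 -> Matrix1, switch=0 -> Matrix2."""
        bank = 0 if self.switch == 1 else 1
        self.banks[bank][index_in] = row

    def read(self, index):
        """BackwardSub input: switch=1 -> Matrix2, switch=0 -> Matrix1.

        Rows come out in reverse order, as backward substitution requires.
        """
        bank = 1 if self.switch == 1 else 0
        return self.banks[bank][self.dim - index - 1]


buf = PingPongBuffers(dim=3)
for i in range(3):
    buf.write(i, ("L", "D", "U", "X", i))   # first matrix goes to Matrix2
buf.new_matrix()                            # swap: Matrix2 is now readable
print([buf.read(i)[4] for i in range(3)])   # rows arrive last-to-first
```

While the back-substitution side drains one block, the forward pass can fill the other with the next matrix, so neither unit waits for the other.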
alg_block (Simulation)
Clock Considerations
• Set out to minimise total clock cycles
  – Fewer cycles means less time, right?
• Circuit can become too complex
  – Maximum combinational path becomes very long
  – Circuit must operate at a lower clock frequency
• Future Work for TDMA
  – The TDMA and Backsub should be optimised (instructions pipelined)
  – Attempt to reduce the combinational path
  – Optimise the floating point package (using IEEE proposed makes it worse)
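The trade-off between cycle count and clock frequency comes down to runtime = cycles / clock frequency. The numbers below are purely hypothetical and are not measurements from this work. They only illustrate how a design with three times as many cycles can still finish sooner if shorter combinational paths let it run at a higher clock.

```python
def runtime_us(cycles, clock_mhz):
    """Runtime in microseconds: cycles divided by clock frequency in MHz."""
    return cycles / clock_mhz


# Hypothetical designs (illustrative numbers only)
few_cycles_slow_clock = runtime_us(cycles=1000, clock_mhz=50)    # long paths
many_cycles_fast_clock = runtime_us(cycles=3000, clock_mhz=200)  # pipelined

print(few_cycles_slow_clock, many_cycles_fast_clock)
```

In this made-up case the pipelined design takes 15 µs against 20 µs, despite the higher cycle count.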
Performance
• For 1024*16 bytes of data
  – CPU (gcc -O0)
    • 0.2 seconds
  – CPU (gcc -O2)
    • 0.015 seconds
  – Simulated runtime:
    • 0.000672 seconds
  – Real runtime:
    • 0.58 seconds
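The ratios between these timings show how far the measured result is from the simulated one. The sketch below only divides the figures listed above; the dictionary keys are labels chosen for illustration.

```python
timings_s = {
    "cpu_O0": 0.2,
    "cpu_O2": 0.015,
    "fpga_simulated": 0.000672,
    "fpga_real": 0.58,
}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return timings_s[slower] / timings_s[faster]


print(f"simulated FPGA vs gcc -O2: {ratio('cpu_O2', 'fpga_simulated'):.1f}x faster")
print(f"real FPGA vs gcc -O2:      {ratio('fpga_real', 'cpu_O2'):.1f}x slower")
print(f"real vs simulated FPGA:    {ratio('fpga_real', 'fpga_simulated'):.0f}x")
```

The simulated design is roughly 22 times faster than the optimised CPU code, while the real run is roughly 39 times slower. That is a gap of about 860 times between simulation and hardware.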
Theory Vs Practice
“In theory, there is no difference between theory
and practice. But, in practice, there is”
~ Albert Einstein
Summary
• Pipelined TDMA Solver has been implemented
• Data transfer is a major overhead
  – Possibly solved by using streaming DMA
  – Using half precision (16 bit) floats would allow two TDMAs to be solved in parallel
• Future Work
  – Improve clock rate on current implementation
    • Optimise TDMA and backwardSub modules
    • Improve floating point circuits
  – Re-implement for available hardware e.g. Nallatech Board
  – Consider implementation using streaming DMA
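The half-precision idea can be illustrated on the host side: two 16-bit floats fit in one 32-bit transfer word, so a single word could carry one operand for each of two solvers. The sketch below uses Python's `struct` module with the IEEE 754 binary16 format code `e`. The little-endian layout and the function names are assumptions made for illustration, not the interface of the actual hardware.

```python
import struct


def pack_pair(a, b):
    """Pack two half-precision floats into one 32-bit word (a in the low half)."""
    return struct.unpack("<I", struct.pack("<ee", a, b))[0]


def unpack_pair(word):
    """Recover the two half-precision values from a 32-bit word."""
    return struct.unpack("<ee", struct.pack("<I", word))


word = pack_pair(1.5, -2.25)
print(hex(word), unpack_pair(word))
```

Half precision keeps only an 11-bit significand, so whether the solver remains accurate enough at that width would need to be checked for each problem.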
Wrap-up
Raptor-Xpress 64x FPGA system
Where to next?
• Continue skill-up in configurable problems
  – Software/hardware co-processing of sci. apps.
  – Full hardware implementation of algorithms in selected areas e.g. automotive, aerospace avionics and robotics (GAs, path planning, fault tolerance, signal/image/sensor data processing)
• Investigate higher productivity tools
  – Mitrion-C, DIME-C, Xilinx AutoESL
• Algorithm redevelopment(?)
  – From Von Neumann to Data Flow
• Develop/Deploy on FPGA clusters for HPC
  – E.g. NHI/Bielefeld Raptor-x64 & -Xpress multi-FPGA systems
• Future Work: Truly reconfigurable HPC applications?
Questions?