Richard Dorrance November 4, 2011
High Speed 3D Tomography on CPU, GPU, and FPGA
Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet
Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation
Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker
Literature Review
Review
Computed Tomography
Tomography
Basis for CAT scan, MRI, PET, SPECT, etc.
Cross-sectional imaging technique using transmission or reflection data from multiple angles
Computed Tomography (CT): a form of tomographic reconstruction performed on computers
Cross-Sections by X-Ray Projections
Project X-rays through biological tissue; measure the total absorption along each ray
Projection Pθ(t) is the Radon transform of the object function f(x, y):
$$P_\theta(t) = \iint f(x,y)\,\delta(x\cos\theta + y\sin\theta - t)\,dx\,dy$$
The total set of projections is called the sinogram
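To make the Radon transform concrete, here is a minimal Python sketch (not taken from either paper) that approximates Pθ(t) for a single ray by stepping along the line x cos θ + y sin θ = t with the midpoint rule. The test object, a unit-density disk of radius 0.5 named `disk`, is a hypothetical stand-in for tissue; its exact projection is the chord length 2√(R² − t²), which gives an easy check.

```python
import math

def radon_line_integral(f, theta, t, half_length=1.5, n_steps=3000):
    """Approximate P_theta(t): integrate f along the line x cos(theta) + y sin(theta) = t."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    ds = 2.0 * half_length / n_steps
    total = 0.0
    for k in range(n_steps):
        s = -half_length + (k + 0.5) * ds      # midpoint of the k-th step along the ray
        x = t * cos_t - s * sin_t              # foot point (t cos, t sin) plus s times
        y = t * sin_t + s * cos_t              # the unit ray direction (-sin, cos)
        total += f(x, y) * ds
    return total

def disk(x, y, radius=0.5):
    """Hypothetical test object: unit density inside a disk centred at the origin."""
    return 1.0 if x * x + y * y <= radius * radius else 0.0

def disk_projection_exact(t, radius=0.5):
    """Chord length of the disk at signed distance t from its centre."""
    return 2.0 * math.sqrt(radius * radius - t * t) if abs(t) < radius else 0.0

for t in (0.0, 0.3, 0.45, 0.6):
    print(t, round(radon_line_integral(disk, 0.7, t), 4), round(disk_projection_exact(t), 4))
```

Sweeping t over the detector positions and θ over [0, π) fills in one complete sinogram.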
Phantom and Sinogram
[Figure: Shepp-Logan phantom and its sinogram]
CT Reconstruction
Restore image from projection data
Inverse Radon transform
The most common algorithm is filtered backprojection (FBP)
– “Smear” each filtered projection back across the image plane
Accuracy of reconstruction depends on the number of detectors and projection angles
[Figure: original phantom and reconstructions from 4, 16, 64, and 256 projection angles]
Note on Filtering
[Figure: backprojection with no filtering vs. with filtering]
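The filtering step convolves each projection with a ramp filter before it is smeared back. Below is a minimal sketch of one common choice, the band-limited (Ram-Lak) kernel in the spatial-domain form given by Kak and Slaney [3], with detector spacing `tau`. The slides do not say which filter produced the figure, so the kernel choice here is an assumption.

```python
import math

def ram_lak_kernel(half_width, tau=1.0):
    """Samples h(n * tau), n = -half_width..half_width, of the band-limited ramp filter."""
    kernel = []
    for n in range(-half_width, half_width + 1):
        if n == 0:
            kernel.append(1.0 / (4.0 * tau * tau))
        elif n % 2 == 0:
            kernel.append(0.0)                                   # even offsets vanish
        else:
            kernel.append(-1.0 / (n * n * math.pi ** 2 * tau * tau))
    return kernel

def filter_projection(proj, tau=1.0):
    """Q(n) = tau * sum_m P(m) h(n - m): direct discrete convolution with the kernel."""
    n_det = len(proj)
    h = ram_lak_kernel(n_det - 1, tau)   # long enough for every offset n - m
    centre = n_det - 1                    # list index holding h(0)
    return [tau * sum(proj[m] * h[centre + n - m] for m in range(n_det))
            for n in range(n_det)]

print([round(v, 4) for v in ram_lak_kernel(5)])
```

The kernel sums to roughly zero, so the filter suppresses the low-frequency (DC) part of each projection; that is what removes the blur seen in the unfiltered reconstruction.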
FBP Algorithm
Input: sinogram sino(θ, N); Output: image img(x, y)
for each θ:
    filter sino(θ, *)
    for each x:
        for each y:
            n = x cos θ + y sin θ
            img(x, y) = img(x, y) + sino(θ, n)
O(N³) algorithm (N angles over an N × N image)
– But highly parallelizable, given sufficient memory bandwidth; not computationally intensive
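The pseudocode above can be turned into a small runnable Python sketch. It assumes parallel-beam geometry, angles evenly spaced over [0, π), the spatial-domain Ram-Lak filter, and linear interpolation for non-integer n. The phantom is a hypothetical centred disk with an analytic sinogram (not the Shepp-Logan phantom), and to keep it short the image is evaluated only at a list of points instead of a full pixel grid.

```python
import math

def disk_sinogram(radius, n_angles, n_det, tau):
    """Analytic parallel-beam sinogram of a unit-density disk centred at the origin."""
    thetas = [math.pi * k / n_angles for k in range(n_angles)]
    t0 = -(n_det - 1) / 2 * tau                      # position of detector 0
    proj = [2.0 * math.sqrt(radius ** 2 - t * t) if abs(t) < radius else 0.0
            for t in (t0 + m * tau for m in range(n_det))]
    return thetas, [list(proj) for _ in thetas]      # same chord profile at every angle

def ramp_filter(proj, tau):
    """Spatial-domain Ram-Lak filtering: Q(n) = tau * sum_m P(m) * h(n - m)."""
    def h(k):
        if k == 0:
            return 1.0 / (4.0 * tau * tau)
        if k % 2 == 0:
            return 0.0
        return -1.0 / (k * k * math.pi ** 2 * tau * tau)
    n_det = len(proj)
    return [tau * sum(proj[m] * h(n - m) for m in range(n_det)) for n in range(n_det)]

def fbp(sino, thetas, tau, points):
    """Filtered backprojection evaluated at a list of (x, y) points."""
    n_det = len(sino[0])
    t0 = -(n_det - 1) / 2 * tau
    img = [0.0] * len(points)
    for theta, proj in zip(thetas, sino):            # for each θ
        q = ramp_filter(proj, tau)                   #   filter sino(θ, *)
        c, s = math.cos(theta), math.sin(theta)
        for i, (x, y) in enumerate(points):          #   for each (x, y)
            u = (x * c + y * s - t0) / tau           #     n = x cos θ + y sin θ, in bins
            k = math.floor(u)
            if 0 <= k < n_det - 1:                   #     linear interpolation of sino(θ, n)
                w = u - k
                img[i] += (1.0 - w) * q[k] + w * q[k + 1]
    return [v * math.pi / len(thetas) for v in img]  # dθ = π / number of angles

thetas, sino = disk_sinogram(radius=0.5, n_angles=90, n_det=101, tau=0.02)
centre, outside = fbp(sino, thetas, 0.02, [(0.0, 0.0), (0.9, 0.0)])
print(f"inside disk: {centre:.3f}, outside disk: {outside:.3f}")
```

Inside the disk the reconstruction should come out close to 1, and outside close to 0. The nested loops over θ, x, and y are the O(N³) cost; each (θ, x, y) update is independent apart from the final accumulation, which is why the algorithm maps well onto GPUs and FPGAs.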
High Speed 3D Tomography on CPU, GPU, and FPGA
Nicolas GAC, Stéphane Mancini, Michel Desvignes, Dominique Houzet
3PA-PET (Pipelined, Prefetch, Parallelized)
Algorithms
Hardware
CPU
– Desktop PC: Pentium 4 (3.2 GHz)
– Workstation: bi-Xeon Dual Core (3.0 GHz)
GPU
– Nvidia GeForce 8800 GTS (1.2 GHz, 96 cores)
FPGA
– Xilinx Virtex-4 (200 MHz)
ASIC
– Projected/extrapolated (1.2 GHz)
CPU vs. GPU vs. FPGA vs. ASIC
With proper normalization (cycles per pixel × number of processing elements, PEs):
Hardware | Algorithm | # of PEs | Cycles/px | Cycles/px × PEs
Pentium 4 | STIR | 1 | 34,505.21 | 34,505.21
Pentium 4 | VBI-flt(v1) | 1 | 169,580.85 | 169,580.85
Pentium 4 | VBI-flt(v2) | 1 | 53,943.45 | 53,943.45
Pentium 4 | VBI-flt(v3) | 1 | 7,750.50 | 7,750.50
Xeon (Dual Core) | STIR | 1 | 16,682.94 | 16,682.94
Xeon (Dual Core) | VBI-flt(v3) | 1 | 3,400.53 | 3,400.53
Xeon (Dual Core) | VBI-flt(v3) | 2 | 1,694.45 | 3,388.90
Xeon (Dual Core) | VBI-flt(v3) | 4 | 854.49 | 3,417.97
GPU | VBI-flt(v4) | 96 | 115.09 | 11,049.11
GPU | VBI-flt(v5) | 96 | 58.13 | 5,580.36
FPGA | VBI-fix | 1 | 484.41 | 484.41
FPGA | VBI-fix | 4 | 149.97 | 599.89
FPGA | VBI-fix | 8 | 101.92 | 815.35
ASIC | VBI-fix | 1 | 580.12 | 580.12
ASIC | VBI-fix | 4 | 248.79 | 995.16
ASIC | VBI-fix | 8 | 156.95 | 1,255.58
ASIC | VBI-fix | 40 | 31.39 | 1,255.58
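The last column multiplies the per-pixel cycle count by the number of PEs, i.e. the total PE-cycles spent per pixel, so lower values mean more work done per cycle per PE. Cycle counts alone ignore the very different clock rates, so this minimal sketch also converts a few rows to time per pixel using the clocks from the Hardware slide. That conversion is an assumption: the slides do not state which clock the cycle counts refer to.

```python
# A few rows copied from the table above: (hardware, algorithm, number of PEs, cycles/px).
ROWS = [
    ("Pentium 4", "STIR", 1, 34_505.21),
    ("Xeon (Dual Core)", "VBI-flt(v3)", 2, 1_694.45),
    ("GPU", "VBI-flt(v5)", 96, 58.13),
    ("FPGA", "VBI-fix", 8, 101.92),
    ("ASIC", "VBI-fix", 40, 31.39),
]

# Clock rates from the Hardware slide (the ASIC value is the projected 1.2 GHz).
CLOCK_GHZ = {"Pentium 4": 3.2, "Xeon (Dual Core)": 3.0, "GPU": 1.2, "FPGA": 0.2, "ASIC": 1.2}

def cycles_times_pes(pes, cycles_per_px):
    """Normalized cost: cycles per pixel multiplied by the number of PEs."""
    return pes * cycles_per_px

def ns_per_pixel(hardware, cycles_per_px):
    """Time per pixel, assuming the cycles are counted at the listed clock (assumption)."""
    return cycles_per_px / CLOCK_GHZ[hardware]   # cycles / (cycles per ns)

for hw, algo, pes, cyc in ROWS:
    print(f"{hw:18} {algo:12} PEs={pes:3}  "
          f"cycles/px*PE={cycles_times_pes(pes, cyc):10.2f}  ns/px={ns_per_pixel(hw, cyc):9.2f}")
```

Under this assumption the 8-PE FPGA spends fewer PE-cycles per pixel than the GPU but, at 200 MHz, still takes longer per pixel in wall-clock terms.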
Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation
Diana Göhringer, Matthias Birk, Yves Dasse-Tiyo, Nicole Ruiter, Michael Hübner, Jürgen Becker
RAMPSoC
Runtime Adaptive Multi-Processor System-on-Chip
– FPGA-based, ROACH/iBOB-like system from a group in Germany
3D Ultrasound Computed Tomography
Mammography for early breast cancer detection
3D USCT works on the same principles as regular CT scans
Hardware
CPU
– AMD Athlon 64 3200+ (2.2 GHz, 1 GB RAM)
GPU
– Nvidia Tesla C2050 (1.15 GHz, 448 cores)
FPGA
– Xilinx Virtex-4FX100 (125 MHz)
CPU vs. GPU vs. FPGA
Hardware | # of PEs | Cycles/img | Cycles/img × PEs | Power [W] | Energy efficiency [1/J] (images per joule)
Athlon 64 | 1 | 330,000.00 | 330,000.00 | 177 | 37
GPU | 448 | 3,714.50 | 1,664,096.00 | 270 | 1,147
FPGA | 8 | 18,000.00 | 144,000.00 | 3.61 | 1,924
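The [1/J] column is consistent with images per joule. A minimal sketch, assuming each device runs at the clock frequency listed on the Hardware slide and draws the listed power for the whole image: energy per image = (cycles/img ÷ clock) × power, and the efficiency is its reciprocal. This reproduces the GPU and FPGA values to within rounding and the Athlon value to within about 2%.

```python
# (clock in Hz, cycles per image, power in W), copied from the Hardware slide and the table above.
DEVICES = {
    "Athlon 64": (2.2e9, 330_000.0, 177.0),
    "GPU (Tesla C2050)": (1.15e9, 3_714.5, 270.0),
    "FPGA (Virtex-4FX100)": (125e6, 18_000.0, 3.61),
}

def images_per_joule(clock_hz, cycles_per_img, watts):
    """Reciprocal of the energy spent on one image, assuming constant power draw."""
    seconds_per_img = cycles_per_img / clock_hz
    joules_per_img = seconds_per_img * watts
    return 1.0 / joules_per_img

for name, (clock_hz, cycles, watts) in DEVICES.items():
    print(f"{name:22} {images_per_joule(clock_hz, cycles, watts):8.1f} images/J")
```

The FPGA wins on energy despite being the slowest per image of the two accelerators because its power draw is roughly 75 times lower than the GPU's.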
References
1. N. Gac et al., “High Speed 3D Tomography on CPU, GPU, and FPGA,” EURASIP Journal on Embedded Systems, vol. 2008, Article ID 930250, 12 pages, 2008.
2. D. Göhringer et al., “Reconfigurable MPSoC versus GPU: Performance, Power and Energy Evaluation,” in Proc. INDIN’11, pp. 848–853, 26–29 July 2011.
3. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE Press, 1988.
4. J. Hsieh, Computed Tomography: Principles, Design, Artifacts, and Recent Advances, SPIE & Wiley, 2009.