REFERENCE:
PUBLISHED BY THE IEEE COMPUTER SOCIETY, JULY 2008
Presented by: Md. Merazul Islam 0507036
Dept. of CSE, KUET
WARP PROCESSING
Dynamically optimizes software to improve execution time and energy consumption.
A new architecture implemented with both hardware and software.
Transforms a binary kernel into an FPGA circuit.
Fully dynamic; generates entire coprocessing circuits, going beyond functional units.
Can also work with multiple processors.
FPGA CIRCUIT
Field Programmable Gate Array: programmable.
FPGAs perform bit manipulation fast.
FPGAs are not part of mainstream computing.
Supports any compiler, any language, multiple sources, etc.
Figure: In the CAD-oriented FPGA, the configurable logic block inputs and outputs are directly connected to the switch matrices.
WARP ARCHITECTURE
Components: µP, I$, D$, FPGA, Profiler, Dynamic Part. Module (DPM)
1. Initially execute application in software only.
2. Profile application to determine critical regions.
3. Partition critical regions to hardware.
4. Program configurable logic & update software binary.
5. Partitioned application executes faster with lower energy consumption.
Figure: Time and energy, SW only vs. HW/SW.
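Steps 2 and 3 can be illustrated with a minimal sketch. It assumes the profiler simply counts how often each code region shows up in an execution trace and that regions are moved to hardware until a chosen share of execution is covered. The names profile, pick_critical_regions, the 0.8 coverage threshold and the sample trace are hypothetical and do not come from the slides.

```python
from collections import Counter

def profile(trace):
    """Count how often each region (e.g. a loop) appears in an execution trace."""
    return Counter(trace)

def pick_critical_regions(counts, coverage=0.8):
    """Pick the most frequent regions until they cover `coverage` of execution."""
    total = sum(counts.values())
    chosen, covered = [], 0
    for region, n in counts.most_common():
        if covered / total >= coverage:
            break
        chosen.append(region)
        covered += n
    return chosen

# Hypothetical trace: each entry names the region executing at one sample.
trace = ["fir_loop"] * 70 + ["crc_loop"] * 20 + ["init"] * 5 + ["io"] * 5
counts = profile(trace)
critical = pick_critical_regions(counts)
print(critical)  # ['fir_loop', 'crc_loop']
```

In this toy trace the two loops account for 90% of the samples, so only they would be candidates for the FPGA while the rest stays in software.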
WARP PROCESSING STEPS
Components: µP, I$, D$, (FPGA), Profiler, DPM (CAD)
Figure: DPM CAD flow. Stages: Decompilation, Partitioning, RT Synthesis, JIT FPGA Compilation (Logic Synthesis, Tech. Mapping/Packing, Placement, Routing), Binary Updater. Labels: Binary, Std. HW Binary, HW Bit stream, Updated Binary.
WARP PROCESSING STEPS
Dynamic Binary Translation
Decompilation: Recover high-level information lost during compilation. Utilize sophisticated decompilation methods: discover loops, if-else, etc.; reduce operation sizes, etc.; reroll loops, etc.
RT Synthesis: Converts decompiled CDFG to Boolean expressions. Detects read/write, memory access pattern, memory read/write ordering.
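Two of the ideas above, reducing operation sizes and expressing an operation as Boolean expressions, can be sketched together. This is an illustration under stated assumptions, not the tool's actual algorithm: it assumes operand ranges are already known from decompilation, and the helper names bits_needed, full_adder and ripple_add are hypothetical.

```python
def bits_needed(max_value):
    """Smallest unsigned bit width that can hold values 0..max_value."""
    return max(1, max_value.bit_length())

def full_adder(a, b, cin):
    """One-bit addition written as Boolean expressions (XOR, AND, OR)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(x, y, width):
    """Add two `width`-bit values using only the one-bit Boolean adder."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)  # final carry becomes the top bit

# Hypothetical case: the binary uses a 32-bit add, but both operands
# are known to come from 8-bit data (values 0..255).
operand_width = bits_needed(255)        # 8 bits per operand
result_width = bits_needed(255 + 255)   # 9 bits for the sum
print(operand_width, result_width, ripple_add(200, 100, operand_width))
```

The point of the sketch is that an 8-bit adder with a 9-bit result needs far less logic than the 32-bit operation the processor would execute.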
WARP PROCESSING STEPS
Logic Synthesis: Optimize hardware circuit created during RT synthesis.
Technology Mapping/Packing: Decompose hardware circuit into basic logic gates. Traverse logic network combining nodes to form single-output.
Placement: Identify critical path, placing critical nodes in center of configurable logic fabric.
Routing: Find a path within FPGA to connect source and sinks of each net. Represent routing nets between CLBs as routing between SMs.
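The routing step can be pictured as a path search over a grid of switch matrices (SMs). The sketch below uses a plain breadth-first search, which is a stand-in chosen for clarity and not the router described in the referenced article. The grid model, the route function and the blocked set are assumptions made for illustration.

```python
from collections import deque

def route(grid_w, grid_h, source, sink, blocked=frozenset()):
    """Breadth-first search for a shortest path between two switch matrices.

    Nodes are (x, y) grid positions; `blocked` holds positions already in use.
    Returns the path as a list of positions, or None if no path exists.
    """
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < grid_w and 0 <= ny < grid_h
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical 4x4 fabric with one switch matrix already occupied.
path = route(4, 4, (0, 0), (3, 0), blocked={(1, 0)})
print(path)
```

A net with several sinks could be handled by calling route once per sink; a real router would also share wire segments and resolve congestion, which this sketch leaves out.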
RESULTS
Execution Time and Memory Requirements
(a) a commercial FPGA CAD tool running on a desktop workstation, (b) the Riverside Dynamic CAD (RDCAD) tools on the same workstation, and (c) the RDCAD tools on a lean 40-MHz ARM7 processor.

Tool   Size     Time
(a)    120 MB   3 min
(b)    3.6 MB   0.108 s
(c)    3.6 MB   1.11 s
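The ratios implied by these figures can be worked out directly. The snippet below only does arithmetic on the three rows above; the dictionary layout and variable names are illustrative.

```python
# Figures from the results slide: memory in MB, time in seconds.
results = {
    "a": {"memory_mb": 120.0, "time_s": 3 * 60},  # commercial tool, workstation
    "b": {"memory_mb": 3.6, "time_s": 0.108},     # RDCAD, same workstation
    "c": {"memory_mb": 3.6, "time_s": 1.11},      # RDCAD, 40-MHz ARM7
}

memory_ratio = results["a"]["memory_mb"] / results["b"]["memory_mb"]
time_ratio_workstation = results["a"]["time_s"] / results["b"]["time_s"]
time_ratio_arm7 = results["a"]["time_s"] / results["c"]["time_s"]

print(round(memory_ratio, 1))         # 33.3
print(round(time_ratio_workstation))  # 1667
print(round(time_ratio_arm7))         # 162
```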
SPEEDUP COMPARISON
[a] Speedups of software execution on a digital signal processor (DSP) and of warped execution on a warp processor, both relative to a 200-MHz ARM9, on single-threaded applications.
[b] Comparison of multithreaded application speedups on various 400-MHz ARM11-based multiprocessors and warp processors.
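How much an application speeds up depends on how much of its run time sits in the regions moved to hardware. A quick way to reason about this is Amdahl's law, used here as a general rule of thumb and not as a formula taken from the slides. The function name and the sample numbers are hypothetical.

```python
def overall_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only the kernel is accelerated.

    kernel_fraction: share of software-only run time spent in the kernel (0..1).
    kernel_speedup: how many times faster the kernel runs in hardware.
    """
    remaining = 1.0 - kernel_fraction
    return 1.0 / (remaining + kernel_fraction / kernel_speedup)

# Hypothetical application: 90% of time in a kernel that runs 50x faster on the FPGA.
speedup = overall_speedup(0.9, 50)
print(round(speedup, 2))  # 8.47
```

Even with a 50x faster kernel, the 10% left in software caps the overall gain below 10x, which is why the share of time spent in critical regions matters as much as the circuit speed.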
CONCLUSION
Warp processing shows that the technique works and opens the door to new challenges.
Speedups of 2X-100X or even more. 20X less memory usage. 10% more routing resource usage. 38%-94% power reduction.
In the near future, we expect warp processors to achieve speedups much greater than an order of magnitude.