0507036

REFERENCE: PUBLISHED BY THE IEEE COMPUTER SOCIETY, JULY 2008 Presented by: Md. Merazul Islam 0507036 Dept. of CSE, KUET


Transcript of 0507036

Page 1: 0507036

REFERENCE:

PUBLISHED BY THE IEEE COMPUTER SOCIETY, JULY 2008

Presented by: Md. Merazul Islam 0507036

Dept. of CSE, KUET

Page 2: 0507036

WARP PROCESSING

Dynamically optimizes software to improve execution time and energy consumption.

A new architecture implemented in both hardware and software.

Transforms a binary kernel into an FPGA circuit.

Fully dynamic: generates entire coprocessing circuits, beyond functional units.

It can also work with multiple processors.

Md. Merazul Islam, Dept. of CSE, KUET

Page 3: 0507036

FPGA CIRCUIT

FPGA: Field-Programmable Gate Array.

Programmable.

FPGAs perform bit manipulation fast.

FPGAs aren't part of mainstream computing.

Supports any compiler, any language, multiple sources, etc.

Figure: In the CAD-oriented FPGA, the configurable logic block inputs and outputs are directly connected to the switch matrices.
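To illustrate the bit-manipulation point, here is a minimal Python sketch of bit reversal done in software. The function name and the default width are assumptions chosen for illustration, not taken from the slides. In software the reversal costs a loop of shift and mask operations, whereas in an FPGA the same reordering can be expressed as wiring alone.

```python
def reverse_bits(x: int, width: int = 32) -> int:
    """Reverse the bit order of a width-bit unsigned integer.

    Hypothetical helper for illustration: one loop iteration per bit.
    """
    result = 0
    for _ in range(width):
        # Shift the result left and append the current lowest bit of x.
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


if __name__ == "__main__":
    print(bin(reverse_bits(0b00001101, width=8)))
```

Each loop iteration performs a shift, a mask, and an OR, so a 32-bit reversal needs on the order of a hundred instructions on a microprocessor.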


Page 4: 0507036

WARP ARCHITECTURE

Components: µP (microprocessor), I$ (instruction cache), D$ (data cache), Profiler, Dynamic Partitioning Module (DPM), FPGA.

1. Initially execute the application in software only.

2. Profile the application to determine critical regions.

3. Partition critical regions to hardware.

4. Program the configurable logic and update the software binary.

5. The partitioned application executes faster with lower energy consumption.

Figure: Time and energy, SW only vs. HW/SW.
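The profiling step can be sketched in a few lines of Python. The slides do not say how the profiler finds critical regions, so this sketch assumes a simple heuristic: count taken backward branches and treat the most frequent ones as hot loops. The trace format and the name `find_critical_regions` are hypothetical.

```python
from collections import Counter


def find_critical_regions(branch_trace, top_n=1):
    """Return the hottest loops as (loop_start, loop_end, count) tuples.

    branch_trace: iterable of (branch_pc, target_pc) pairs for taken branches.
    Assumption: a backward branch (target_pc <= branch_pc) closes a loop.
    """
    counts = Counter(
        (target, pc) for pc, target in branch_trace if target <= pc
    )
    return [(start, end, n) for (start, end), n in counts.most_common(top_n)]


# Hypothetical trace: one hot loop, one cold loop, a few forward branches.
trace = [(0x40, 0x20)] * 1000 + [(0x90, 0x80)] * 10 + [(0x10, 0x50)] * 5

if __name__ == "__main__":
    for start, end, n in find_critical_regions(trace, top_n=2):
        print(f"loop {start:#x}..{end:#x} taken {n} times")
```

The loop with the highest count would be the first candidate for partitioning to hardware in step 3.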


Page 5: 0507036

WARP PROCESSING STEPS

Figure: Tool flow of the DPM (CAD) alongside the µP, I$, D$, Profiler, and FPGA. Stages: Decompilation, Partitioning, RT Synthesis, JIT FPGA Compilation (Logic Synthesis, Tech. Mapping/Packing, Placement, Routing), Binary Updater. Data labels: Binary, Std. HW Binary, HW Bit stream, Updated Binary.


Page 6: 0507036

WARP PROCESSING STEPS

Dynamic Binary Translation

Decompilation: Recovers high-level information lost during compilation, using sophisticated decompilation methods: discover loops, if-else constructs, etc.; reduce operation sizes; reroll loops.

RT Synthesis: Converts the decompiled CDFG (control/data flow graph) to Boolean expressions. Detects memory reads and writes, memory access patterns, and memory read/write ordering.
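As a rough illustration of what converting a word-level operation to Boolean expressions means in RT synthesis, the sketch below expands an addition into per-bit full-adder equations. This is a textbook ripple-carry expansion chosen for illustration; the slides do not state which circuit structures the warp tools generate, and the function names are hypothetical.

```python
def full_adder(a: bool, b: bool, cin: bool):
    """One-bit full adder written purely as Boolean expressions."""
    s = a ^ b ^ cin                      # sum bit
    cout = (a & b) | (cin & (a ^ b))     # carry out
    return s, cout


def ripple_add(x: int, y: int, width: int = 8) -> int:
    """Add two width-bit values by chaining full adders bit by bit."""
    carry = False
    result = 0
    for i in range(width):
        a = bool((x >> i) & 1)
        b = bool((y >> i) & 1)
        s, carry = full_adder(a, b, carry)
        result |= int(s) << i
    return result  # final carry is dropped, giving width-bit wraparound


if __name__ == "__main__":
    print(ripple_add(100, 55), ripple_add(200, 100))
```

Reducing operation sizes during decompilation matters here: a narrower `width` means fewer Boolean expressions to synthesize.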


Page 7: 0507036

WARP PROCESSING STEPS

Logic Synthesis: Optimizes the hardware circuit created during RT synthesis.

Technology Mapping/Packing: Decomposes the hardware circuit into basic logic gates, then traverses the logic network, combining nodes to form single-output LUTs.

Placement: Identifies the critical path and places critical nodes in the center of the configurable logic fabric.

Routing: Finds a path within the FPGA to connect the source and sinks of each net. Routing nets between configurable logic blocks (CLBs) are represented as routing between switch matrices (SMs).
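The routing step, finding a path from a source to a sink across the switch matrices, can be sketched as a breadth-first maze search on a grid. This is a simple Lee-style search used only for illustration, not the router used by the warp tools; the grid model, the `blocked` set standing in for occupied switch matrices, and the name `route_net` are all assumptions.

```python
from collections import deque


def route_net(width, height, source, sink, blocked=frozenset()):
    """Shortest path from source to sink on a width x height grid of SMs.

    Returns a list of (x, y) coordinates, or None if no route exists.
    """
    prev = {source: None}          # visited set plus back-pointers
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:   # walk back-pointers to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in blocked and nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(route_net(4, 4, (0, 0), (3, 0), blocked={(1, 0), (1, 1)}))
```

A net with several sinks could be handled by calling the search once per sink; handling congestion between nets is beyond this sketch.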

Page 8: 0507036

RESULTS

Execution Time and Memory Requirements

(a) a commercial FPGA CAD tool running on a desktop workstation, (b) the Riverside Dynamic CAD (RDCAD) tools on the same workstation, and (c) the RDCAD tools on a lean 40-MHz ARM7 processor.

     size      time
a    120 MB    3 min
b    3.6 MB    0.108 s
c    3.6 MB    1.11 s
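The ratios implied by these figures can be worked out directly. The short Python sketch below does the arithmetic; it assumes the "3 min" entry means exactly 180 seconds, and the dictionary layout and the `ratio` helper are hypothetical.

```python
# Figures from the results above; "3 min" is assumed to be exactly 180 s.
results = {
    "a": {"size_mb": 120.0, "time_s": 180.0},
    "b": {"size_mb": 3.6, "time_s": 0.108},
    "c": {"size_mb": 3.6, "time_s": 1.11},
}


def ratio(metric, baseline, other):
    """How many times larger the baseline's metric is than the other's."""
    return results[baseline][metric] / results[other][metric]


if __name__ == "__main__":
    print(f"memory, a vs b: {ratio('size_mb', 'a', 'b'):.1f}x")
    print(f"time,   a vs b: {ratio('time_s', 'a', 'b'):.0f}x")
    print(f"time,   a vs c: {ratio('time_s', 'a', 'c'):.0f}x")
```

Under that assumption, the commercial tool uses roughly 33 times more memory, and takes roughly 1,667 times longer than (b) and roughly 162 times longer than (c).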


Page 9: 0507036

SPEEDUP COMPARISON

[a] Speedups of software execution on a digital signal processor (DSP) and of warped execution on a warp processor, both relative to a 200-MHz ARM9, for single-threaded applications.

[b] Comparison of multithreaded application speedups on various 400-MHz ARM11-based multiprocessors and warp processors.


Page 10: 0507036

CONCLUSION

Warp processing demonstrates the technique's potential and opens the door to new challenges.

Speedup of 2X-100X or even more.

20X less memory usage.

10% more routing resource usage.

38%-94% power reduction.

In the near future, we expect warp processors to achieve speedups much greater than an order of magnitude.
