Evaluation of Offset Assignment Heuristics

Evaluation of Offset Assignment Heuristics Johnny Huynh, Jose Nelson Amaral, Paul Berube University of Alberta, Canada Sid-Ahmed-Ali Touati Universite de Versailles, France


Page 1: Evaluation of Offset Assignment Heuristics

Evaluation of Offset Assignment Heuristics

Johnny Huynh, Jose Nelson Amaral, Paul Berube
University of Alberta, Canada

Sid-Ahmed-Ali Touati
Universite de Versailles, France

Page 2: Evaluation of Offset Assignment Heuristics

Outline

• Background
• Traditional Approach to Offset Assignment
  • Simple Offset Assignment
  • Address-Register Assignment
• Improving the Problem Model
  • Optimal Address-Code Generation
  • Memory Layout Permutations
• Evaluating Current Heuristics
  • Methodology
  • Results
• Conclusions and Future Work

Page 4: Evaluation of Offset Assignment Heuristics

Background

• Digital Signal Processors (DSPs) have few general purpose registers
• Program variables kept in memory
• Address Registers (AR) used to access variables
• After a variable is accessed, the AR can be auto-incremented (or decremented) by one word in the same cycle.

Page 5: Evaluation of Offset Assignment Heuristics

Processor Model

• Texas Instruments TMS320C54X DSP family:
  • Accumulator-based DSP
  • 8 Address Registers
  • Initializing an address register requires 2 cycles of overhead
  • Explicit address computations require 1 cycle of overhead
  • Using auto-increment (or auto-decrement) has no overhead.

Page 6: Evaluation of Offset Assignment Heuristics

Processor Model
Example: add ‘A’ and ‘B’, store in accumulator

Explicit address computation (memory layout: A at 0x1000, C at 0x1001, B at 0x1002):

$AR0 = &A
$ACC = *$AR0
$AR0 = $AR0 + 2
$ACC += *$AR0

Auto-Increment (memory layout: A at 0x1000, B at 0x1001, C at 0x1002):

$AR0 = &A
$ACC = *$AR0++
$ACC += *$AR0

Page 8: Evaluation of Offset Assignment Heuristics

The Offset-Assignment Problem

• Given k address registers and a basic block accessing n variables, find a memory layout that minimizes address-computation overhead.

• How should the variables be placed in memory?
• Which register should access each variable?

Page 10: Evaluation of Offset Assignment Heuristics

Traditional Approach to Offset Assignment

[Figure: flow diagram. Basic Block, Generate Access Sequence, Access Sequence, Address Register Assignment, three Sub-Sequences, one Simple Offset Assignment per sub-sequence, three Sub-Layouts, Address-Code Generation, Address-Computation Overhead.]

Page 11: Evaluation of Offset Assignment Heuristics

Traditional Approach: Simple Offset Assignment (SOA)

• In 1992, Bartley introduced the simplest form of the offset assignment problem:

  Given a single address register and a basic block with n variables, find a memory layout that minimizes overhead.

• Equivalent to finding a maximum weight path cover (NP-complete)
• Many researchers have proposed heuristics for this problem:
  • Liao et al. (1996)
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)

Page 12: Evaluation of Offset Assignment Heuristics

Simple Offset Assignment (SOA)

• Fix the access sequence
• Assume only one address register (k = 1)
• Find an ordering of variables in memory (memory layout) that has minimum overhead.

[Figure: access graph over the variables A to F; the edges shown have weight 2.]

Ex.

Access Sequence: ‘a d b e c f b e c f a d’
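As a rough illustration of what "overhead" means when k = 1, the sketch below counts the accesses that cannot be reached by auto-increment or auto-decrement from the previous access. The function name `soa_cost` and the decision to ignore the one-time register initialization are assumptions made for this example, not something the slides define.

```python
def soa_cost(access_sequence, layout):
    """Count explicit address computations for one address register.

    A move between consecutive accesses is free when the two variables
    are at the same or adjacent addresses (auto-increment/decrement);
    any other move is assumed to cost one explicit address computation.
    """
    position = {var: addr for addr, var in enumerate(layout)}
    return sum(
        1
        for prev, cur in zip(access_sequence, access_sequence[1:])
        if abs(position[prev] - position[cur]) > 1
    )


SEQ = "adbecfbecfad"  # the example access sequence from the slides

print(soa_cost(SEQ, "abcdef"))  # alphabetical layout
print(soa_cost(SEQ, "dafceb"))  # layout 'd a f c e b'
```

Under these assumptions the alphabetical layout needs an explicit computation on every one of the 11 transitions, while the layout 'd a f c e b' needs only two.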

Page 14: Evaluation of Offset Assignment Heuristics

Simple Offset Assignment (SOA)

• Create Access Graph G = (V, E)
  • V = variables
  • weight of edge is the frequency of consecutive accesses
• A path defines a memory layout: find the Maximum Weight Path Cover
• NP-Complete!

[Figure: access graph over the variables A to F; the edges shown have weight 2.]

Ex.

Access Sequence: ‘a d b e c f b e c f a d’

Memory Layout: d a f c e b
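One common family of SOA heuristics picks access-graph edges greedily by weight, rejecting any edge that would give a variable more than two neighbours or close a cycle; this idea is usually associated with Liao et al. The sketch below is a minimal version under that assumption. The function names and the alphabetical tie-break are hypothetical, and it is not claimed to be the exact procedure evaluated in these slides.

```python
from collections import Counter


def access_graph(seq):
    """Edge weight = how often two distinct variables are accessed back to back."""
    weights = Counter()
    for u, v in zip(seq, seq[1:]):
        if u != v:
            weights[tuple(sorted((u, v)))] += 1
    return weights


def greedy_layout(seq):
    """Greedy path cover: heaviest edges first, keeping every component a path."""
    parent = {v: v for v in seq}          # union-find, used to detect cycles
    adj = {v: [] for v in parent}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted(access_graph(seq).items(), key=lambda kv: (-kv[1], kv[0]))
    for (u, v), _weight in edges:
        if len(adj[u]) < 2 and len(adj[v]) < 2 and find(u) != find(v):
            adj[u].append(v)
            adj[v].append(u)
            parent[find(u)] = find(v)

    # Walk each path from one of its endpoints and concatenate the paths.
    layout, seen = [], set()
    for start in sorted(adj):
        if start in seen or len(adj[start]) == 2:
            continue
        prev, cur = None, start
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            onward = [n for n in adj[cur] if n != prev]
            prev, cur = cur, (onward[0] if onward else None)
    return layout


print("".join(greedy_layout("adbecfbecfad")))
```

For the example access sequence this sketch selects the four weight-2 edges plus the edge between a and f, which yields the slide's layout 'd a f c e b' read in the opposite direction.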

Page 15: Evaluation of Offset Assignment Heuristics

Traditional Approach: General Offset Assignment (GOA)

• Problem presented by Liao et al. in 1996.
• Given k address registers, and a basic block with n variables, find an assignment of variables to address registers that minimizes the total overhead of all registers.
• This problem formulation is more accurately described as Address-Register Assignment (ARA).
• Consists of SOA problems, and is at least NP-hard.
• Many researchers have proposed heuristics for address-register assignment:
  • Leupers and Marwedel (1996)
  • Sugino et al. (1996)
  • Zhuang et al. (2003)

Page 16: Evaluation of Offset Assignment Heuristics

General Offset Assignment (GOA)

• Fix the access sequence
• Allow multiple address registers (k > 1)
• Find an ordering of variables in memory (memory layout) that has minimum overhead.
• Assign each variable to an address register to form access sub-sequences.

[Figure: access graph over the variables A to F; the edges shown have weight 2.]

Ex.

Access Sequence: ‘a d b e c f b e c f a d’

Sub-sequence1: ‘a b c b c a’

Sub-sequence2: ‘d e f e f d’
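Forming the sub-sequences is mechanical once every variable has been assigned to a register. The following sketch shows the split for the example; the register names `AR0` and `AR1` and the helper name `split_by_register` are illustrative assumptions.

```python
def split_by_register(access_sequence, register_of):
    """Project the access sequence onto each register's variables, keeping order."""
    sub_sequences = {}
    for var in access_sequence:
        sub_sequences.setdefault(register_of[var], []).append(var)
    return {reg: "".join(accesses) for reg, accesses in sub_sequences.items()}


# Hypothetical assignment matching the slide: {a, b, c} and {d, e, f}.
register_of = {"a": "AR0", "b": "AR0", "c": "AR0",
               "d": "AR1", "e": "AR1", "f": "AR1"}

print(split_by_register("adbecfbecfad", register_of))
```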

Page 18: Evaluation of Offset Assignment Heuristics

General Offset Assignment (GOA)

• Each sub-sequence can be viewed as an independent SOA problem.
• Solve each sub-sequence as an independent SOA problem.
• More appropriate to call this problem the Address Register Assignment (ARA) problem.
• Requires solving SOA instances, so is at least NP-hard.

[Figure: access graph split into two sub-graphs, one over A, B, C and one over D, E, F; the edges shown have weight 2.]

Ex.

Access Sequence: ‘a d b e c f b e c f a d’

Memory Layouts: [a, b, c] [d, e, f]

Page 26: Evaluation of Offset Assignment Heuristics

Address-Code Generation

• Recall that variables are assigned to address registers.
• There is nothing left to decide: each address register has a defined sequence of accesses.
• Imposes a restriction that all accesses to a variable are done by a single address register.

[Figure: the two sub-graphs over A, B, C and D, E, F, stepped through access by access; accesses marked * are not adjacent to the previous access of the same register.]

Ex.

Access Sequence: ‘a d b e c f b e c f a d’

Memory Layouts: [a, b, c] (AR0), [d, e, f] (AR1)

*Requires Explicit Address Computations
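With the variable-to-register assignment fixed, the overhead follows directly from the layout. The sketch below walks the example under the cost model of the Processor Model slide (2 cycles to initialize a register, 1 cycle per explicit address computation, free auto-increment or auto-decrement). The function and constant names are assumptions made for this illustration.

```python
INIT_COST = 2      # cycles to initialize an address register
EXPLICIT_COST = 1  # cycles for an explicit address computation


def fixed_assignment_overhead(access_sequence, layout, register_of):
    """Overhead when each variable is always accessed through one fixed register."""
    position = {var: addr for addr, var in enumerate(layout)}
    points_at = {}  # register -> address of its most recent access
    cycles = 0
    for var in access_sequence:
        reg = register_of[var]
        if reg not in points_at:
            cycles += INIT_COST                      # first use of this register
        elif abs(position[var] - points_at[reg]) > 1:
            cycles += EXPLICIT_COST                  # not reachable by +/- 1
        points_at[reg] = position[var]
    return cycles


register_of = {"a": "AR0", "b": "AR0", "c": "AR0",
               "d": "AR1", "e": "AR1", "f": "AR1"}
print(fixed_assignment_overhead("adbecfbecfad", "abcdef", register_of))
```

In this model the example costs 6 cycles: two initializations plus one explicit computation per register (the jump from c back to a, and from f back to d).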

Page 27: Evaluation of Offset Assignment Heuristics

Traditional Approach to Offset Assignment

Access sequence: ‘a d b e c f b e c f a d’

Address Register Assignment produces the sub-sequences:
  ‘a b c b c a’ (accessed by AR0)
  ‘d e f e f d’ (accessed by AR1)

Simple Offset Assignment, applied to each sub-sequence, produces the memory layouts:
  [a, b, c] (accessed by AR0)
  [d, e, f] (accessed by AR1)

Page 29: Evaluation of Offset Assignment Heuristics

Optimal Address-Code Generation

• Given a fixed access sequence and memory layout, it is possible to generate optimal addressing-code in polynomial time:

• Minimum-Cost Circulation (Gebotys, 1997)

• Minimum-Weight Perfect Matching (Udayanarayanan, 2000)

Page 30: Evaluation of Offset Assignment Heuristics

Optimal Address-Code Generation

• Build a network-flow graph
• Vertices represent variable accesses
• For each access ai that occurs before another access aj, there is an edge (ai, aj) (not all edges are shown in the graph).
• Edges represent an opportunity for a register to access variables.
• Each unit of flow represents the accesses performed by an address register.
• Optimal address code is found by finding a minimum-cost circulation.

[Figure: network-flow graph with a source S, a sink T, and one vertex per access (a1 to a12), drawn with the access sequence on one axis and the memory layout F E D A C B on the other; the paths taken by AR1 and AR2 are highlighted. Annotations in the figure: capacity = number of ARs; outbound edges from S; inbound edges to T; cost = initialization overhead; cost = 0; edge costs depend on the distance between the variables accessed; all vertices require one unit of flow.]
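For very small inputs the same optimum can be cross-checked without a flow solver. The sketch below is not the minimum-cost circulation formulation; it is a plain dynamic program over the set of addresses the registers currently point at, substituted here because it fits in a few lines. It assumes the cost model of the Processor Model slide, and its names are hypothetical.

```python
INIT_COST = 2      # cycles to initialize an address register
EXPLICIT_COST = 1  # cycles for an explicit address computation


def min_overhead(access_sequence, layout, k):
    """Minimum overhead when any of k registers may serve any access."""
    position = {var: addr for addr, var in enumerate(layout)}
    # State: sorted tuple of addresses held by the registers in use so far.
    best = {(): 0}
    for var in access_sequence:
        target = position[var]
        nxt = {}
        for state, cost in best.items():
            options = []
            if len(state) < k:  # bring an unused register into play
                options.append((state + (target,), cost + INIT_COST))
            for i, addr in enumerate(state):  # reuse register i
                step = 0 if abs(addr - target) <= 1 else EXPLICIT_COST
                options.append((state[:i] + (target,) + state[i + 1:], cost + step))
            for new_state, new_cost in options:
                key = tuple(sorted(new_state))
                if new_cost < nxt.get(key, float("inf")):
                    nxt[key] = new_cost
        best = nxt
    return min(best.values())


SEQ = "adbecfbecfad"
print(min_overhead(SEQ, "abcdef", k=2))
print(min_overhead(SEQ, "bcadef", k=2))
```

Because a register is no longer tied to a fixed set of variables, the layout a b c d e f drops to 5 cycles in this model, and the layout b c a d e f reaches 4 cycles, which is just the cost of initializing two registers.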

Page 31: Evaluation of Offset Assignment Heuristics

Traditional Approach to Offset Assignment

[Figure: the flow diagram of the traditional approach (Access Sequence, Address Register Assignment, Sub-Sequences, Simple Offset Assignment, Sub-Layouts, Address-Code Generation, Address-Computation Overhead), annotated as follows.]

• Address Register Assignment: NP-Hard
• Simple Offset Assignment: NP-Complete
• Address-Code Generation: Solved, but not used!

Page 32: Evaluation of Offset Assignment Heuristics

Memory Layout Permutations (MLP)

• Since optimal address-code generation algorithms exist, they can be applied after a memory layout is formed (by traditional approaches).
• However, the traditional approach generates multiple sub-layouts that were originally assumed to be independent.
• How is a single memory layout formed from a set of sub-layouts?

Page 33: Evaluation of Offset Assignment Heuristics

Memory Layout Permutations

• Let $M_i$ be a memory sub-layout.
• Let $M_i^r$ be the reciprocal of $M_i$ (the same sub-layout in reverse order).
• Given an access sequence and $m$ memory sub-layouts, arrange $\{(M_1 \mid M_1^r), \ldots, (M_m \mid M_m^r)\}$ such that overhead is minimum when the sub-layouts are placed contiguously in memory.
• Given $m$ sub-layouts, there are $\frac{(m!)(2^m)}{2}$ unique permutations.

Page 34: Evaluation of Offset Assignment Heuristics

Memory Layout Permutations

Example:

Access sequence: ‘a d b e c f b e c f a d’

Address Register Assignment gives the sub-sequences ‘a b c b c a’ and ‘d e f e f d’. This is an optimal address register assignment.

Simple Offset Assignment gives the sub-layouts {a, b, c} and {d, e, f}. These are optimal simple offset assignments.

All possible Memory Layout Permutations (all have cost > 4):

[a, b, c, d, e, f], [f, e, d, c, b, a]
[c, b, a, d, e, f], [f, e, d, a, b, c]
[a, b, c, f, e, d], [d, e, f, c, b, a]
[c, b, a, f, e, d], [d, e, f, a, b, c]

Optimal Layout: {b, c, a, d, e, f} with cost = 4 is not found
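The permutations listed above can be generated mechanically: choose an order for the sub-layouts, choose for each whether to reverse it, and treat a full layout and its mirror image as the same layout. The sketch below does exactly that; the function name is hypothetical, and the m!·2^m/2 count only holds when no sub-layout reads the same forwards and backwards.

```python
from itertools import permutations, product
from math import factorial


def layout_permutations(sub_layouts):
    """Distinct layouts built from ordered, optionally reversed sub-layouts.

    A layout and its mirror image are counted once, represented by the
    lexicographically smaller of the two.
    """
    unique = set()
    for order in permutations(sub_layouts):
        for flips in product((False, True), repeat=len(order)):
            layout = tuple(
                var
                for sub, flip in zip(order, flips)
                for var in (reversed(sub) if flip else sub)
            )
            unique.add(min(layout, layout[::-1]))
    return sorted(unique)


layouts = layout_permutations(["abc", "def"])
for layout in layouts:
    print("".join(layout))
print(len(layouts) == factorial(2) * 2**2 // 2)
```

For the two sub-layouts of the example this produces four layouts, one from each mirror pair on the slide; the layout b c a d e f never appears because it breaks up the sub-layout [a, b, c].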

Page 36: Evaluation of Offset Assignment Heuristics

Experimental Methodology: Evaluating the Solution Space

• Testcases are DSP code kernels from the UTDSP benchmark suite.
• Use gcc to obtain access sequences.
• The quality of a memory layout is evaluated using the minimum-cost circulation technique.
• The entire solution space is found for each access sequence, to be used as a point of reference.

[Figure: Basic Block, Compile with gcc, Access Sequence, Compute Overhead of All Layouts using Minimum-Cost Flow, Distribution of Overheads (histogram of frequency in layouts, log scale, against overhead in cycles, 5 to 13).]

Kernel            Accesses   Variables   Possible # of layouts
iir_arr           21         8           20,160
iir_arr_swp       33         12          239,500,800
latnrm_arr_swp    30         10          1,814,400
latnrm_ptr        30         10          1,814,400
latnrm_ptr_swp    30         10          1,814,400
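The layout counts for the 8-variable and 12-variable kernels match n!/2, which is what one would expect if a layout and its mirror image are counted as one. That reading is an inference from the numbers rather than something the slides state, and the helper name below is hypothetical.

```python
from math import factorial


def distinct_layouts(num_variables):
    """Orderings of n variables, counting a layout and its reverse once."""
    return factorial(num_variables) // 2


for n in (8, 10, 12):
    print(n, f"{distinct_layouts(n):,}")
```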

Page 37: Evaluation of Offset Assignment Heuristics

Experimental Methodology: Evaluating Current Heuristics

• Identified and implemented three Address-Register Assignment heuristic algorithms:
  • Leupers
  • Sugino
  • Zhuang

[Figure: evaluation pipeline. Access Sequence; ARA (Leupers, Sugino, Zhuang); Sub-Sequences; SOA (Liao, Leupers, ALOMA, OFU, B&B); Sub-Layouts; Memory Layout Permutations; Memory Layouts; Compute Overhead for each layout via Minimum-Cost Circulation; Distribution of Overhead values.]

Page 38: Evaluation of Offset Assignment Heuristics

Experimental Methodology: Evaluating Current Heuristics

• Identified and implemented five Simple Offset Assignment heuristic algorithms:
  • Liao
  • Leupers
  • ALOMA
  • Order-First Use (OFU)
  • Branch and Bound (B&B)

Page 39: Evaluation of Offset Assignment Heuristics

Experimental Methodology: Evaluating Current Heuristics

• Each combination of ARA and SOA algorithm generates a set of sub-layouts.
• All possible memory layout permutations are generated, forming a set of memory layouts.
• Each memory layout is evaluated using the Minimum-Cost Circulation technique.

Page 40: Evaluation of Offset Assignment Heuristics

Results

• The 15 combinations of algorithms (3 ARA × 5 SOA) produce 15 distributions of overhead values.

• The distributions are aggregated into one distribution.

• The aggregate distributions represent the solution space of all current algorithms.

Page 41: Evaluation of Offset Assignment Heuristics

Results

• Memory layouts have a significant impact on overhead.

• Some layouts have 100% higher overhead than the minimum.

• Over 99% of all layouts have an overhead that is at least 50% higher than the minimum.

Page 42: Evaluation of Offset Assignment Heuristics

Results

• Memory layouts produced by traditional approaches have a large range of possible overhead values -- sometimes the same as the entire solution space itself.

• In some cases, no combination of ARA and SOA heuristics can produce an optimal layout.

Page 44: Evaluation of Offset Assignment Heuristics

Distribution of Overhead Values
Testcase: iir_arr_swp (infinite impulse response filter)

Overhead (cycles)   Exhaustive     Algorithmic
6                   144            0
7                   19,557         72
8                   1,514,917      2,240
9                   21,757,157     6,516
10                  90,478,895     10,496
11                  104,101,226    2,565
12                  21,628,904     0
Average Overhead    10.51          9.6
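The averages in the last row are frequency-weighted means of the overhead values. A short sketch, using the counts from the table and a hypothetical helper name, reproduces them and also lets one check what share of the exhaustive space sits well above the minimum.

```python
def mean_overhead(distribution):
    """Frequency-weighted mean of a {overhead_cycles: layout_count} mapping."""
    total = sum(distribution.values())
    return sum(cycles * count for cycles, count in distribution.items()) / total


exhaustive = {6: 144, 7: 19557, 8: 1514917, 9: 21757157,
              10: 90478895, 11: 104101226, 12: 21628904}
algorithmic = {6: 0, 7: 72, 8: 2240, 9: 6516, 10: 10496, 11: 2565, 12: 0}

print(round(mean_overhead(exhaustive), 2))
print(round(mean_overhead(algorithmic), 1))

# Share of all layouts whose overhead is at least 50% above the minimum of 6.
at_least_9 = sum(n for cycles, n in exhaustive.items() if cycles >= 9)
print(at_least_9 / sum(exhaustive.values()))
```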

Page 45: Evaluation of Offset Assignment Heuristics

Exhaustive Solution Space
Testcase: iir_arr_swp (infinite impulse response filter)

[Figure: histogram of frequency (log scale, 1 to 1,000,000,000) against overhead (cycles, 6 to 12).]

Page 46: Evaluation of Offset Assignment Heuristics

Algorithmic Solution Space
Testcase: iir_arr_swp (infinite impulse response filter)

[Figure: histogram of frequency (0 to 12,000) against overhead (cycles, 6 to 12).]

Page 47: Evaluation of Offset Assignment Heuristics

Efficiency of SOA Algorithms

• For each SOA algorithm, combine with each of the 3 ARA algorithms to generate 3 distributions of overhead values.
• The distributions can be aggregated to form a single distribution.

Page 52: Evaluation of Offset Assignment Heuristics

Efficiency of SOA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

Overhead (cycles)   Liao    Leupers   Sugino   B&B     OFU
6                   0       0         0        0       0
7                   6       6         10       6       44
8                   293     293       357      293     1004
9                   960     960       1187     960     2448
10                  2154    2154      2124     2154    1910
11                  619     619       354      619     354
12                  0       0         0        0       0

Page 53: Evaluation of Offset Assignment Heuristics

Efficiency of SOA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

[Figure: frequency (0 to 3000) against overhead (cycles, 6 to 11), one series each for Liao, Leupers, Sugino, B&B and OFU.]

Page 54: Evaluation of Offset Assignment Heuristics

Efficiency of ARA Algorithms

• For each ARA algorithm, combine with each of the 5 SOA algorithms to generate 5 distributions of overhead values.
• The distributions can be aggregated to form a single distribution.

Page 57: Evaluation of Offset Assignment Heuristics

Efficiency of ARA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

Overhead (cycles)   Leupers   Sugino   Zhuang
6                   0         0        0
7                   2         61       9
8                   204       1483     553
9                   2089      1018     3408
10                  4740      126      5630
11                  2565      0        0
12                  0         0        0

Page 58: Evaluation of Offset Assignment Heuristics

Efficiency of ARA Algorithms
Testcase: iir_arr_swp (infinite impulse response filter)

[Figure: frequency (0 to 6000) against overhead (cycles, 6 to 12), one series each for Leupers, Sugino and Zhuang.]

Page 59: Evaluation of Offset Assignment Heuristics

Evaluating Offset Assignment Algorithms

• There is low variability between SOA algorithms, which may be attributed to small problem sizes.
• The choice of ARA algorithm has more impact on overhead. Much of the variability is attributed to the different number of address registers used.
• For all combinations of SOA and ARA algorithms, the permutation of sub-layouts affects the overhead.

Page 61: Evaluation of Offset Assignment Heuristics

Conclusions

• The objective is to minimize address-computation overhead.

• Given a fixed access sequence and memory layout, the minimum-cost circulation (MCC) technique can minimize overhead.

• Offset assignment algorithms should be evaluated with MCC.

• Offset assignment still has a significant impact on overhead.

• To be effective, current offset assignment algorithms (ARA,SOA) must address the Memory Layout Permutation problem.

Page 62: Evaluation of Offset Assignment Heuristics

Future Work

• A new algorithm is needed to generate memory layouts that will minimize overhead as computed by the Minimum-Cost Flow technique.

• Address-computation overhead must be minimized for loop bodies and for variables that are live between basic blocks and procedures.

Page 63: Evaluation of Offset Assignment Heuristics

References

• Gebotys, C.: DSP address optimization using a minimum cost circulation technique. Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design. 100-103.

• Leupers, R., Marwedel, P.: Algorithms for address assignment in DSP code generation. Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design. 109-112.

• Liao, S., Devadas, S., Keutzer, K., Tjiang, S., Wang, A.: Storage assignment to decrease code size. ACM Transactions on Programming Languages and Systems 18(3) (1996). 235-253.

• Sugino, N., Iimuro, S., Nishihara, A., Fujii, N.: DSP code optimization utilizing memory addressing operation. IEICE Transactions on Fundamentals 8 (1996). 1217-1223.

• Zhuang, X., Lau, C., Pande, S.: Storage assignment optimizations through variable coalescence for embedded processors. Proceedings of the 2003 ACM SIGPLAN Conference on Language, Compiler, and Tools for Embedded Systems. 220-231.

• Bartley, D.H.: Optimizing stack frame accesses for processors with restricted addressing modes. Software: Practice & Experience 22(2) (1992). 158-172.

Page 64: Evaluation of Offset Assignment Heuristics

Questions?