Seok-Won Seong and Prabhat Mishra
University of Florida

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2008, Vol. 27, No. 4

Rahul Sridharan

1 of 25

• Motivation
• Background
  ◦ Code compression using bitmasks
  ◦ Challenges in the bitmask-based approach
• Application-aware code compression
  ◦ Mask selection
  ◦ Bitmask-aware dictionary selection
  ◦ Code compression algorithm
• Results
• Conclusion

2 of 25

• Bitmask-based code compression
  ◦ Addresses memory constraints in embedded systems while improving power and performance
  ◦ Reduces code size
• Application-aware code compression algorithm
  ◦ Improves compression efficiency without introducing a decompression penalty

3 of 25

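[Figure: code compression methodology. Static encoding (offline): Application Program (Binary) → Compression Algorithm → Compressed Code (Memory). Dynamic decoding (online): Compressed Code (Memory) → Decompression Engine → Processor (Fetch and Execute).]

$$\text{Compression ratio} = \frac{\text{Compressed program size}}{\text{Original program size}}$$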
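A minimal Python sketch of the compression-ratio formula. The function name is illustrative, and whether the dictionary counts toward the compressed size is an assumption here, so it is passed in separately and both readings can be computed. The sizes come from the 8-bit example on slide 7: ten 8-bit words (80 bits), 54 bits of codewords, and a two-entry dictionary of 8-bit words (16 bits).

```python
def compression_ratio(compressed_bits: int, original_bits: int,
                      dictionary_bits: int = 0) -> float:
    """Compressed size divided by original size; smaller is better.

    dictionary_bits is added to the compressed size (an assumption: the
    formula on the slide does not say whether the dictionary is included).
    """
    if original_bits <= 0:
        raise ValueError("original size must be positive")
    return (compressed_bits + dictionary_bits) / original_bits


# Slide-7 example: 10 words x 8 bits, 54 codeword bits, 2 x 8-bit dictionary
print(compression_ratio(54, 80))                      # 0.675 (codewords only)
print(compression_ratio(54, 80, dictionary_bits=16))  # 0.875 (dictionary included)
```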

4 of 25

• Dictionary-based
  ◦ Frequency-based dictionary selection
  ◦ Format for uncompressed code (32-bit code): Decision (1 bit), Uncompressed Data (32 bits)
  ◦ Format for compressed code: Decision (1 bit), Dictionary Index
• Hamming-distance-based
  ◦ Remembering mismatches
  ◦ Format for uncompressed code: Decision (1 bit), Uncompressed Data (32 bits)
  ◦ Format for compressed code: Decision (1 bit), # of Bit Changes, Location (5 bits) for each changed bit, …, Dictionary Index
• Bitmask-based
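To make the Hamming-distance (remembering mismatches) format concrete, here is a Python sketch that computes the encoded size of one 32-bit word. The 5-bit location field comes from the slide; the width of the "# of Bit Changes" field (2 bits, so at most 3 remembered mismatches) and of the dictionary index (4 bits, i.e. a 16-entry dictionary) are hypothetical parameters chosen for illustration, as are the function names.

```python
WORD_BITS = 32
LOCATION_BITS = 5  # from the slide: one 5-bit location per changed bit


def mismatch_positions(word: int, entry: int) -> list[int]:
    """Bit positions (0 = most significant) where word and entry differ."""
    diff = word ^ entry
    return [i for i in range(WORD_BITS) if (diff >> (WORD_BITS - 1 - i)) & 1]


def encoded_size(word: int, dictionary: list[int],
                 count_bits: int = 2, index_bits: int = 4) -> int:
    """Smallest encoding of `word` under the remembering-mismatches format.

    count_bits and index_bits are hypothetical field widths; a count field
    of count_bits bits can record at most 2**count_bits - 1 mismatches.
    """
    best = 1 + WORD_BITS  # decision bit + uncompressed word
    max_changes = (1 << count_bits) - 1
    for entry in dictionary:
        n = len(mismatch_positions(word, entry))
        if n <= max_changes:
            best = min(best, 1 + count_bits + LOCATION_BITS * n + index_bits)
    return best


dictionary = [0x12345678]
print(encoded_size(0x12345678, dictionary))  # 7: exact match, 1 + 2 + 4
print(encoded_size(0x12345679, dictionary))  # 12: one mismatch adds 5 bits
print(encoded_size(0xFFFFFFFF, dictionary))  # 33: too many mismatches, stored raw
```

Each remembered mismatch costs a full 5-bit location, which is the overhead the bitmask approach tries to reduce by covering several nearby changes with one mask.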

5 of 25

Bitmask-based encoding for 32-bit instructions:
• Format for uncompressed code: Decision (1 bit), Uncompressed Data (32 bits)
• Format for compressed code: Decision (1 bit), Number of Masks, then for each mask a Mask Type, Location, and Mask Pattern (…), followed by the Dictionary Index
  ◦ Mask Type: type of the mask, e.g., 2-bit, 4-bit, etc.
  ◦ Location: where to apply the bitmask
  ◦ Mask Pattern: the actual mask pattern
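On the decompression side, a compressed word can be rebuilt by XOR-ing each mask pattern into the selected dictionary entry at its location. The Python sketch below assumes that the location is a bit offset counted from the most significant bit, which matches a sliding mask; a fixed mask would instead store a slot index to be multiplied by the mask size. The tuple layout `(location, size, pattern)` and the function name are hypothetical.

```python
WORD_BITS = 32


def apply_masks(entry: int, masks: list[tuple[int, int, int]]) -> int:
    """Rebuild a word from its dictionary entry and bitmasks (decompression).

    Each mask is (location, size, pattern): location is the offset of the
    mask's first bit counted from the most significant bit, size is the
    mask width, and pattern holds the bits to flip.
    """
    word = entry
    for location, size, pattern in masks:
        shift = WORD_BITS - location - size  # number of bit positions below the mask
        if location < 0 or shift < 0 or not 0 <= pattern < (1 << size):
            raise ValueError("mask does not fit in a 32-bit word")
        word ^= pattern << shift  # XOR flips exactly the masked bits
    return word


# Two masks: a 4-bit mask at bit 0 and a 2-bit mask at bit 30
print(hex(apply_masks(0x00000000, [(0, 4, 0b1010), (30, 2, 0b11)])))  # 0xa0000003
```

Because each mask is a plain XOR at a known offset, several masks can be applied independently, which is what keeps decompression cheap.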

6 of 25

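Example: 8-bit words, a two-entry dictionary, and a fixed 2-bit bitmask

Original Program | Compressed Program
0000 0000 | 0 1 0
1000 0010 | 0 0 00 11 1
0000 0010 | 0 0 11 10 0
0100 0010 | 0 1 1
0100 1110 | 0 0 10 11 1
0101 0010 | 0 0 01 01 1
0000 1100 | 0 0 10 11 0
0100 0010 | 0 1 1
1100 0000 | 0 0 00 11 0
0000 0000 | 0 1 0

Dictionary
Index | Entry
0 | 0000 0000
1 | 0100 0010

Codeword fields, left to right:
• First bit: 0 = compressed, 1 = not compressed
• Second bit: 0 = bitmask used, 1 = no bitmask used
• Bitmask position and bitmask value (2 bits each, present only when a bitmask is used)
• Dictionary index (1 bit)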
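The example above can be reproduced with a small encoder. This Python sketch assumes what the bit stream implies: 8-bit words, one fixed 2-bit mask per word (four aligned positions numbered from the most significant end, hence a 2-bit position field), a 1-bit dictionary index, exact matches tried before bitmask matches, and the first matching dictionary entry chosen on ties. The function names are hypothetical. Concatenating its codewords gives the same 54-bit compressed program as the slide.

```python
from typing import Optional

WORD_BITS = 8   # toy word size used on the slide
MASK_BITS = 2   # one fixed 2-bit mask per word
POS_BITS = 2    # 8 / 2 = 4 aligned positions -> 2-bit position field
INDEX_BITS = 1  # two dictionary entries -> 1-bit index


def mask_match(word: int, entry: int) -> Optional[tuple[int, int]]:
    """Return (position, pattern) if word and entry differ only inside one
    aligned MASK_BITS-wide group; position 0 is the most significant group."""
    diff = word ^ entry
    hits = []
    for pos in range(WORD_BITS // MASK_BITS):
        shift = WORD_BITS - MASK_BITS * (pos + 1)
        pattern = (diff >> shift) & ((1 << MASK_BITS) - 1)
        if pattern:
            hits.append((pos, pattern))
    return hits[0] if len(hits) == 1 else None


def encode_word(word: int, dictionary: list[int]) -> str:
    """Codeword for one word: exact match first, then bitmask match, else raw."""
    for idx, entry in enumerate(dictionary):
        if word == entry:  # '0' compressed, '1' no bitmask, then index
            return "01" + format(idx, f"0{INDEX_BITS}b")
    for idx, entry in enumerate(dictionary):
        match = mask_match(word, entry)
        if match is not None:  # '0' compressed, '0' bitmask used, pos, value, index
            pos, pattern = match
            return ("00" + format(pos, f"0{POS_BITS}b")
                    + format(pattern, f"0{MASK_BITS}b")
                    + format(idx, f"0{INDEX_BITS}b"))
    return "1" + format(word, f"0{WORD_BITS}b")  # '1' not compressed, raw word


program = ["0000 0000", "1000 0010", "0000 0010", "0100 0010", "0100 1110",
           "0101 0010", "0000 1100", "0100 0010", "1100 0000", "0000 0000"]
dictionary = [0b00000000, 0b01000010]
codes = [encode_word(int(w.replace(" ", ""), 2), dictionary) for w in program]
print(codes[:3])            # ['010', '0000111', '0011100']
print(len("".join(codes)))  # 54
```

Note that 0000 0010 matches both entries with a single mask; taking the first entry in dictionary order reproduces the slide's choice of index 0.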

7 of 25

• Selection of an appropriate mask pattern
  ◦ A larger bitmask generates more matches: a 4-bit mask can cover up to 16 (2^4) mismatch patterns, an 8-bit mask up to 256 (2^8)
  ◦ A larger bitmask incurs a higher cost: a 4-bit mask costs 7 bits (3-bit location + 4-bit pattern), an 8-bit mask costs 10 bits (2-bit location + 8-bit pattern)
• Efficient dictionary selection
  ◦ Frequency-based selection is not always optimal
• Efficient masking and dictionary selection schemes are needed to improve compression efficiency

8 of 25

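Frequency-based dictionary selection (DS): CR = 97.5%
Spanning-based DS: CR = 87.5%

Since a smaller compression ratio is better, spanning-based selection compresses better in this example.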

9 of 25

• Bitmask selection
• Bitmask-aware dictionary selection
  ◦ An NP-hard (nondeterministic polynomial-time hard) problem
• Code compression algorithm
  ◦ Based on a combination of the two approaches

10 of 25

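How many bitmask patterns are needed? Which of them are profitable?

Fixed and sliding bitmask patterns:
Mask | Fixed | Sliding
1 bit | | X
2 bits | X | X
3 bits | | X
4 bits | X | X
5 bits | | X
6 bits | | X
7 bits | | X
8 bits | X | X

Cost in bits of covering a given number of bit changes with masks of a given size:
Bit changes | 1-bit mask | 2-bit mask | 4-bit mask | 8-bit mask | 16-bit mask | 32-bit mask
1 bit | 5 | | | | |
2 bits | 11 | 6 | | | |
4 bits | 22 | 13 | 7 | | |
8 bits | 43 | 26 | 15 | 10 | |
16 bits | 84 | 51 | 30 | 21 | 17 |
32 bits | 165 | 100 | 59 | 42 | 35 | 32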
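Every entry in the cost table can be reproduced by a simple model, sketched below in Python. Its assumptions are not stated on the slide and were chosen because they fit every entry: masks sit at fixed, aligned slots, so a location costs log2(32/size) bits; a 1-bit mask needs no pattern field because its only non-zero pattern is 1; and a group of k masks pays an extra log2(k) bits, read here as a field recording how many masks are used. The same model gives the 7-bit and 10-bit costs quoted for 4-bit and 8-bit masks on slide 8. The function names are hypothetical.

```python
WORD_BITS = 32
SIZES = [1, 2, 4, 8, 16, 32]


def log2_exact(n: int) -> int:
    """log2 of a positive power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def mask_cost(changed_bits: int, mask_size: int) -> int:
    """Bits needed to cover `changed_bits` consecutive, aligned changed bits
    with masks of width `mask_size` (hypothetical model fitted to the slide)."""
    masks = changed_bits // mask_size              # masks needed
    location = log2_exact(WORD_BITS // mask_size)  # index of an aligned slot
    pattern = mask_size if mask_size > 1 else 0    # 1-bit mask: pattern implied
    count = log2_exact(masks)                      # assumed mask-count field
    return masks * (location + pattern) + count


# Rebuild the table, one row per number of changed bits
for changed in SIZES:
    row = [mask_cost(changed, s) for s in SIZES if s <= changed]
    print(f"{changed:>2} bits:", row)
```

The model makes the trade-off visible: covering 8 changed bits with two 4-bit masks (15 bits) is only slightly dearer than one 8-bit mask (10 bits), while many small masks quickly become more expensive than storing the word uncompressed.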

11 of 25

• The number of bits needed to indicate a particular location depends on
  ◦ the size of the mask
  ◦ the type of the mask (fixed or sliding)
• Number of bitmask patterns needed
  ◦ Up to two mask patterns
  ◦ The minimum cost to store three bitmasks is 27-31 bits for a 32-bit vector, close to the size of the uncompressed word, so three masks are not very profitable
• Which combinations are profitable?
  ◦ Eleven possibilities: 1s, 2s, 2f, 3s, 4s, 4f, 5s, 6s, 7s, 8s, 8f (s = sliding, f = fixed)
  ◦ Select one or two of the eleven possibilities
  ◦ The number of combinations can be further reduced
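The search space on this slide is small enough to enumerate. This Python sketch lists every selection of one or two mask types from the eleven candidates: 11 singles plus C(11, 2) = 55 pairs, 66 configurations in all, each of which would have to be evaluated by compressing the benchmarks. The helper name is hypothetical.

```python
from itertools import combinations

# The eleven mask types from the slide (s = sliding, f = fixed)
CANDIDATES = ["1s", "2s", "2f", "3s", "4s", "4f", "5s", "6s", "7s", "8s", "8f"]


def mask_sets(candidates: list[str], max_masks: int = 2) -> list[tuple[str, ...]]:
    """Every way to pick between one and `max_masks` distinct mask types."""
    sets: list[tuple[str, ...]] = []
    for k in range(1, max_masks + 1):
        sets.extend(combinations(candidates, k))
    return sets


all_sets = mask_sets(CANDIDATES)
print(len(all_sets))  # 66
```

Restricting the candidate list to four mask types leaves only 4 + 6 = 10 configurations, which is why pruning unprofitable masks first makes the evaluation much cheaper.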

12 of 25

Benchmarks compiled for TI TMS320C6x: (1s, 4f) and (2s, 2f) provide the best compression.

[Figure: compression results for the bitmask combinations, with (1s, 4f) and (2s, 2f) marked]

13 of 25

• Factors of 32 (1, 2, 4, and 8) produce better results
  ◦ They can be applied cost-effectively at fixed locations
• 8-bit fixed/sliding masks are not helpful
  ◦ The probability of more than 4 consecutive changes is low
  ◦ Two smaller masks perform better than one larger mask
• 4-bit sliding does not perform better than 4-bit fixed
• Two bitmasks provide better results than a single one
• Choose two from four bitmasks: (1s, 2s, 2f, 4f)

Mask | Fixed | Sliding
1 bit | | X
2 bits | X | X
4 bits | X |

14 of 25

Dictionary selection
• Dynamic
• Static
  ◦ Frequency: select the most frequently occurring binary patterns
  ◦ Spanning: select patterns that ensure uniform coverage of all patterns based on Hamming distance
  ◦ Bit savings: select patterns based on the bit savings due to self and mask-matched repetitions
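Frequency-based selection is the simplest of the three strategies. A minimal Python sketch, with a hypothetical function name: count each binary pattern and keep the most common ones. Applied to the program of the slide-7 example, it picks 0000 0000 and 0100 0010, the same two entries as that slide's dictionary. The spanning and bit-savings strategies replace the raw count with a coverage or bit-savings score.

```python
from collections import Counter


def frequency_dictionary(words: list[int], size: int) -> list[int]:
    """Frequency-based selection: the `size` most frequent words.

    Counter.most_common orders equal counts by first occurrence.
    """
    return [word for word, _ in Counter(words).most_common(size)]


# Original program from the slide-7 example (8-bit words)
program = [0b00000000, 0b10000010, 0b00000010, 0b01000010, 0b01001110,
           0b01010010, 0b00001100, 0b01000010, 0b11000000, 0b00000000]
print([format(w, "08b") for w in frequency_dictionary(program, 2)])
# ['00000000', '01000010']
```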

15 of 25

16 of 25

Total weights: A = 0 + 10 = 10, B = 7 + 15 = 22, C = 7 + 15 = 22, D = 0 + 5 = 5, E = 0 + 15 = 15, F = 7 + 20 = 27, G = 14 + 10 = 24

[Figure: graph with nodes A(0), B(7), C(7), D(0), E(0), F(7), G(14), node weights in parentheses; edge weights 5, 10, 5, 10, 10, 5]

• Node weight: number of bits saved due to the frequency of the pattern
• Edge weight: number of bits saved by using a bitmask-based match
• Total weight: node weight + the weights of all edges connected to the node

17 of 25

F, the node with the highest total weight (27), is selected; F and its neighbors C and E are removed from the graph. Remaining total weights: A = 0 + 10 = 10, B = 7 + 15 = 22, D = 0 + 5 = 5, G = 14 + 10 = 24

[Figure: remaining graph with nodes A(0), B(7), D(0), G(14); edge weights 5, 5, 10]

The process continues until the dictionary is full or the graph is empty.

18 of 25
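The bit-savings selection loop on slides 17 and 18 can be sketched in Python as below. Two points are assumptions rather than statements from the slides: after a node is selected, it and all of its neighbors are removed (consistent with F, C, and E disappearing together), and total weights are recomputed over the nodes that remain. The transcript does not record which nodes each edge joins, so the edge list is one hypothetical assignment that reproduces every total on slide 17. Function names are illustrative.

```python
def total_weights(nodes: dict[str, int],
                  edges: dict[tuple[str, str], int]) -> dict[str, int]:
    """Node weight plus the weights of all edges to nodes still in the graph."""
    totals = dict(nodes)
    for (u, v), w in edges.items():
        if u in totals and v in totals:
            totals[u] += w
            totals[v] += w
    return totals


def select_dictionary(nodes: dict[str, int],
                      edges: dict[tuple[str, str], int],
                      capacity: int) -> list[str]:
    """Greedy bit-savings selection (sketch): repeatedly take the node with
    the largest total weight, then drop it and its neighbors from the graph."""
    remaining = dict(nodes)
    chosen: list[str] = []
    while remaining and len(chosen) < capacity:
        totals = total_weights(remaining, edges)
        best = max(totals, key=totals.get)  # first one wins on ties
        chosen.append(best)
        neighbors = {v if u == best else u for (u, v) in edges if best in (u, v)}
        for node in neighbors | {best}:
            remaining.pop(node, None)
    return chosen


# Slide-17 graph: node weights from the slide; edge endpoints are one
# assignment consistent with the slide's totals (hypothetical).
nodes = {"A": 0, "B": 7, "C": 7, "D": 0, "E": 0, "F": 7, "G": 14}
edges = {("A", "B"): 5, ("A", "D"): 5, ("B", "G"): 10,
         ("C", "F"): 10, ("E", "F"): 10, ("C", "E"): 5}
print(total_weights(nodes, edges))
# {'A': 10, 'B': 22, 'C': 22, 'D': 5, 'E': 15, 'F': 27, 'G': 24}
print(select_dictionary(nodes, edges, capacity=2))  # ['F', 'G']
```

With room for two entries, the sketch picks F (27) and then G (24), the largest totals on the two slides.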

19 of 25

• Experimental setup
  ◦ Benchmarks: TI and MediaBench
  ◦ Architectures: SPARC, TI TMS320C6x, MIPS
• Results
  ◦ BCC: Bitmask-based code compression
    - Customized encodings for different architectures
    - Effects of dictionary size selection
    - Comparison with existing techniques
  ◦ ACC: Application-aware code compression
    - Bitmask selection
    - Dictionary selection

20 of 25

• Encoding 1: one 8-bit mask
• Encoding 2: two 4-bit masks
• Encoding 3: a 4-bit and an 8-bit mask

Encoding 2 outperforms the others.

21 of 25

• Outperforms other dictionary-based techniques by 15%
• Higher decompression bandwidth than existing compression techniques

Smaller compression ratio is better.

Compression Method | Target Architecture | Compression Ratio | Decompression Bandwidth | Parallel Decompression
Wolfe and Chanin (Huffman coding) | MIPS | 73% | 8 bits | No
IBM CodePack | PowerPC | 60% | 8 bits | No
SAMC | MIPS | 57% | 6-8 bits | No
V2F | TMS320C6x | 70-82% | 14.5-64 bits | No
MCSSC | TMS320C6x | 75% | 8 bits | Yes
Prakash et al. (Hamming distance) | TMS320C6x | 76-80% | N/A | Yes
Ros and Sutton (Hamming distance) | TMS320C6x, Itanium | 72-80% | N/A | Yes
Our approach (bitmask) | MIPS, SPARC, TMS320C6x | 55-65% | 32-64 bits | Yes

22 of 25

The bit-savings approach outperforms both frequency- and spanning-based techniques.

23 of 25

• BCC generates a 15-20% improvement over other techniques
• ACC outperforms BCC by another 5-10%

BCC: Bitmask-based Code Compression; ACC: Application-aware Code Compression

24 of 25

???

25 of 25