Post-compiler Software Optimization for Reducing Energy

Eric Schulte ∗, Jonathan Dorn †, Stephen Harding ∗, Stephanie Forrest ∗, Westley Weimer †

∗ Department of Computer Science, University of New Mexico, Albuquerque, NM 87131-0001

† Department of Computer Science, University of Virginia, Charlottesville, VA 22904-4740

March 4, 2014

Post-compiler Software Optimization for Reducing Energy Introduction 0

Introduction

NSA Datacenter in Bluffdale, Utah

65 megawatts

year

(nationalgeographic.com)


Introduction

-Oe energy optimization flag

-Oe . . . trying to minimize the amount of energy the program will use

Faster = lower energy


Outline

Introduction

Technical Approach

Experimental Evaluation

Conclusion


Problem Statement

Optimizing complex non-functional properties

properties × hardware × environment

- properties: memory, network, energy, etc.
- hardware: architectures, processors, memory stack, etc.
- environment: variables, load, etc.

Every program transformation requires

- a-priori reasoning
- manual implementation
- guaranteed correctness


Our Solution

Genetic optimization algorithm

- empirically guided (guess and check)
- automated evolutionary search
- relaxed semantics

Applied to PARSEC benchmarks

- reduces energy consumption by 20% on average
- maintains functionality on withheld tests


Related Work

Extends, combines, and leverages

- profile guided optimization
- genetic programming
- superoptimization
- profiling
- testing


Technique: Post-compiler, test-driven, Genetic Optimization Algorithm

Post-compiler

source → .s → GOA → .s → .exe


Test driven

Use test cases to exercise program

- evaluate functionality
- measure runtime properties
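The test-driven idea can be illustrated with a minimal sketch, assuming each test case is an (input, expected output) pair and the candidate is callable as a Python function. The names `TestCase` and `evaluate` are hypothetical and not part of the GOA tooling; wall-clock time stands in for the measured runtime property, and treating any failed test as disqualifying is an assumption of this sketch.

```python
import math
import time
from typing import Callable, List, Tuple

# Hypothetical test-case type: (input value, expected output).
TestCase = Tuple[int, int]

def evaluate(candidate: Callable[[int], int], tests: List[TestCase]) -> float:
    """Return a cost to minimize; math.inf if any test fails."""
    start = time.perf_counter()
    for arg, expected in tests:
        try:
            if candidate(arg) != expected:
                return math.inf  # wrong output: functionality not preserved
        except Exception:
            return math.inf      # a crash counts as a failed test
    # Elapsed wall-clock time stands in for the measured runtime property.
    return time.perf_counter() - start

def square(x: int) -> int:
    return x * x

def broken_square(x: int) -> int:
    return x + x  # agrees with square() on 0 and 2, but not on 3

tests = [(0, 0), (2, 4), (3, 9)]
good_cost = evaluate(square, tests)
bad_cost = evaluate(broken_square, tests)
```

Note that `broken_square` passes two of the three tests, which is why the quality of the test suite matters for this kind of evaluation.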


Genetic

movl $0, -8(%rbp)

.L4:

addl $1, -4(%rbp)

.L3:

movl -4(%rbp), %eax

cmpl -28(%rbp), %eax

jl .L5

cmpl $0, -8(%rbp)

je .L6

movl $0, -8(%rbp)

.L4:

addl $1, -4(%rbp)

.L3:

movl -4(%rbp), %eax

cmpl -28(%rbp), %eax

jl .L5


Optimization algorithm

Iteratively improve performance (energy) over time

[Figure: Swaptions Fitness Over Time. x-axis: Iterations (0 to 30e4); y-axis: Joules (500 to 900).]


Benefits

- environment-specific adaptation
- hardware-specific adaptation
- exploit hidden HW complexities

(Mdf / CC-BY-SA-3.0)

Technical Approach

GOA Overview

[Diagram labels: Assembler, Fitness Function, Workload; Population; Mutate; Profile; Fitness; Search; Minimize; Executable]


Program Mutation

Software Representation

movl $0, -8(%rbp)

.L4:

addl $1, -4(%rbp)

.L3:

movl -4(%rbp), %eax

cmpl -28(%rbp), %eax

jl .L5

cmpl $0, -8(%rbp)

je .L6

movl $0, -8(%rbp)

.L4:

addl $1, -4(%rbp)

.L3:

movl -4(%rbp), %eax

cmpl -28(%rbp), %eax

jl .L5

Mutation Operations: Copy, Delete, Swap, Two Point Crossover
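The four operations can be sketched over a program represented as a list of assembly lines. This is a minimal illustration, not the GOA implementation: the function names, the explicit index arguments, and the uniform random choice in `mutate` are all assumptions made for the example.

```python
import random
from typing import List

Genome = List[str]  # one assembly line per element

def copy_op(g: Genome, src: int, dst: int) -> Genome:
    """Insert a copy of line `src` before position `dst`."""
    return g[:dst] + [g[src]] + g[dst:]

def delete_op(g: Genome, idx: int) -> Genome:
    """Remove the line at `idx`."""
    return g[:idx] + g[idx + 1:]

def swap_op(g: Genome, i: int, j: int) -> Genome:
    """Exchange the lines at `i` and `j`."""
    out = list(g)
    out[i], out[j] = out[j], out[i]
    return out

def two_point_crossover(a: Genome, b: Genome, lo: int, hi: int) -> Genome:
    """Child takes a[:lo], then b[lo:hi], then a[hi:]."""
    return a[:lo] + b[lo:hi] + a[hi:]

def mutate(g: Genome, rng: random.Random) -> Genome:
    """Apply one randomly chosen mutation at random positions (assumed policy)."""
    op = rng.choice(["copy", "delete", "swap"])
    i, j = rng.randrange(len(g)), rng.randrange(len(g))
    if op == "copy":
        return copy_op(g, i, j)
    if op == "delete":
        return delete_op(g, i)
    return swap_op(g, i, j)

parent = [
    "movl $0, -8(%rbp)",
    ".L4:",
    "addl $1, -4(%rbp)",
    ".L3:",
    "movl -4(%rbp), %eax",
]
other = ["nop"] * 5  # placeholder second parent for the crossover example
child = two_point_crossover(parent, other, 1, 3)
```

Because the operators work on whole lines and ignore their meaning, many variants will fail to assemble or will break tests; the search relies on evaluation to filter those out.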


Profiling

Hardware Performance Counters


$ perf stat -- ./blackscholes 1 input /tmp/output

6,864,315,342 cycles

5,062,293,918 instructions

2,944,060,039 r533f00

1,113,084,780 cache-references

1,122,960 cache-misses

3.227585368 seconds time elapsed
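Counter values like these are easy to turn into numbers for later use. The sketch below parses only the simplified "count, event name" lines shown on the slide; real `perf stat` output carries extra columns and annotations that this does not handle. The function name `parse_perf_stat` is hypothetical.

```python
import re
from typing import Dict

# Matches lines such as "6,864,315,342 cycles": a count with thousands
# separators followed by a single event name.
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+([\w.\-]+)\s*$")

def parse_perf_stat(text: str) -> Dict[str, int]:
    """Map each event name to its integer count; other lines are skipped."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1).replace(",", ""))
    return counters

sample = """
6,864,315,342 cycles
5,062,293,918 instructions
2,944,060,039 r533f00
1,113,084,780 cache-references
1,122,960 cache-misses
3.227585368 seconds time elapsed
"""
counters = parse_perf_stat(sample)
# Instructions per cycle, one of the per-cycle rates a profile can provide.
ipc = counters["instructions"] / counters["cycles"]
```

The elapsed-time line is skipped because it does not match the "integer count, single event name" shape.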


Fitness Function

\[
\frac{\text{energy}}{\text{time}} = C_{\text{const}} + C_{\text{ins}}\,\frac{\text{ins}}{\text{cycle}} + C_{\text{flops}}\,\frac{\text{flops}}{\text{cycle}} + C_{\text{tca}}\,\frac{\text{tca}}{\text{cycle}} + C_{\text{mem}}\,\frac{\text{mem}}{\text{cycle}}
\]


Steady State Genetic Algorithm

Details

- population size: 2^10
- 2^18 fitness evaluations
- ∼ 16 hour runtime per optimization
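A steady-state loop differs from a generational one in that each new individual enters the population immediately and displaces an existing one. The sketch below is a toy version under stated assumptions: it uses tournaments of two for both selection and eviction, omits crossover for brevity, and runs on a list of integers with far smaller parameters than those on the slide. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]
Scored = Tuple[float, Genome]

def tournament(pop: List[Scored], rng: random.Random,
               size: int = 2, worst: bool = False) -> int:
    """Return the index of the best (or worst) of `size` random individuals."""
    picks = [rng.randrange(len(pop)) for _ in range(size)]
    by_fitness = lambda i: pop[i][0]
    return max(picks, key=by_fitness) if worst else min(picks, key=by_fitness)

def steady_state_ga(seed_genome: Genome,
                    fitness: Callable[[Genome], float],
                    mutate: Callable[[Genome, random.Random], Genome],
                    pop_size: int, max_evals: int,
                    rng: random.Random) -> Scored:
    # Start from copies of the original program (lower fitness is better).
    f0 = fitness(seed_genome)
    pop = [(f0, list(seed_genome)) for _ in range(pop_size)]
    best = pop[0]
    for _ in range(max_evals):
        parent = pop[tournament(pop, rng)][1]
        child = mutate(parent, rng)
        scored = (fitness(child), child)
        if scored[0] < best[0]:
            best = scored
        # Steady state: insert the child, then evict one individual so the
        # population size stays constant (no generational replacement).
        pop.append(scored)
        pop.pop(tournament(pop, rng, worst=True))
    return best

def toy_mutate(g: Genome, rng: random.Random) -> Genome:
    """Toy mutation: overwrite one position with a random digit."""
    out = list(g)
    out[rng.randrange(len(out))] = rng.randint(0, 9)
    return out

rng = random.Random(0)
best_fitness, best_genome = steady_state_ga(
    [9] * 8, fitness=sum, mutate=toy_mutate,
    pop_size=16, max_evals=2000, rng=rng)
```

In the real setting each fitness evaluation means assembling, linking, and running a program, which is why the evaluation budget dominates the runtime.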


Minimization

Delta Debugging

5358c5358
< .L808:
---
> addl %ebx, %ecx
5416c5416
< addl %ebx, %ecx
---
> .L808:
5463c5463
< .L970:
---
> .byte 0x33
5651d5650
< .loc 1 457 0 is_stmt 0 discriminator 2
5841d5839
< addq %rdx, %r14
6309c6307
< xorpd %xmm1, %xmm7
---
> cmpq %r13, %rdi
6413a6412
> cmpl %ecx, %esi

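Delta debugging can be used to shrink a diff like this one to the edits that actually matter. The following is a simplified sketch of the ddmin idea, assuming the deltas are whole diff hunks and that a caller-supplied predicate (hypothetically named `keeps_benefit`) reports whether a subset of hunks still yields the improvement. Neither the granularity nor the predicate is taken from the GOA source.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def ddmin(deltas: Sequence[T],
          keeps_benefit: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `deltas` to a subset from which no single delta can be removed."""
    current = list(deltas)
    n = 2  # number of chunks to split the current set into
    while len(current) >= 2:
        size = max(1, len(current) // n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False
        # First try each chunk on its own.
        for chunk in chunks:
            if keeps_benefit(chunk):
                current, n, reduced = chunk, 2, True
                break
        # Otherwise try removing one chunk at a time.
        if not reduced:
            for i in range(len(chunks)):
                rest = [d for j, c in enumerate(chunks) if j != i for d in c]
                if keeps_benefit(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break
        # No progress: refine the granularity, or stop at single deltas.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current

# Seven hunks, numbered 0..6; in this made-up example only hunks 2 and 5
# are needed to keep the improvement.
hunks = list(range(7))
needed = {2, 5}
minimal = ddmin(hunks, lambda subset: needed <= set(subset))
```

In practice the predicate would apply the chosen hunks to the original assembly, rebuild, and re-measure, so each call is expensive and the number of predicate calls is the cost to watch.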

Experimental Evaluation

Benchmark Applications

Program        C/C++ LoC    ASM LoC      Description
blackscholes         510      7,932      Finance modeling
bodytrack         14,513    955,888      Human video tracking
facesim        (no alternate inputs)
ferret            15,188    288,981      Image search engine
fluidanimate      11,424     44,681      Fluid dynamics animation
freqmine           2,710    104,722      Frequent itemset mining
raytrace       (no testable output)
swaptions          1,649     61,134      Portfolio pricing
vips             142,019    132,012      Image transformation
x264              37,454    111,718      MPEG-4 video encoder
total            225,467  1,707,068

Table: PARSEC benchmark applications.


Hardware Platforms

AMD Server Intel Desktop


Energy Model

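\[
\frac{\text{energy}}{\text{time}} = C_{\text{const}} + C_{\text{ins}}\,\frac{\text{ins}}{\text{cycle}} + C_{\text{flops}}\,\frac{\text{flops}}{\text{cycle}} + C_{\text{tca}}\,\frac{\text{tca}}{\text{cycle}} + C_{\text{mem}}\,\frac{\text{mem}}{\text{cycle}}
\]

Coefficient  Description          Intel (4-core)  AMD (48-core)
Cconst       constant power draw          31.530         394.74
Cins         instructions                 20.490         -83.68
Cflops       floating point ops.           9.838          60.23
Ctca         cache accesses               -4.102         -16.38
Cmem         cache misses               2962.678       -4209.09

Table: Energy model coefficients.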
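As a worked illustration, the linear model can be evaluated directly from counter values. The coefficients below are copied from the energy model table; the counter values are made up to keep the arithmetic easy to follow, the function names are hypothetical, and the slides do not state the units of the result.

```python
from typing import Dict

# Coefficients copied from the energy model table.
COEFFS = {
    "intel": {"const": 31.530, "ins": 20.490, "flops": 9.838,
              "tca": -4.102, "mem": 2962.678},
    "amd":   {"const": 394.74, "ins": -83.68, "flops": 60.23,
              "tca": -16.38, "mem": -4209.09},
}

def power(c: Dict[str, float], counters: Dict[str, float]) -> float:
    """Evaluate energy/time from per-cycle counter rates."""
    cycles = counters["cycles"]
    return (c["const"]
            + c["ins"] * counters["ins"] / cycles
            + c["flops"] * counters["flops"] / cycles
            + c["tca"] * counters["tca"] / cycles
            + c["mem"] * counters["mem"] / cycles)

def energy(c: Dict[str, float], counters: Dict[str, float],
           seconds: float) -> float:
    """energy = (energy / time) * time."""
    return power(c, counters) * seconds

# Made-up counter values (not measurements).
counters = {"cycles": 1000, "ins": 500, "flops": 100, "tca": 200, "mem": 1}
intel_power = power(COEFFS["intel"], counters)
intel_energy = energy(COEFFS["intel"], counters, seconds=2.0)
```

With these inputs the Intel terms are 31.530 + 10.245 + 0.9838 - 0.8204 + 2.962678, so `intel_power` is about 44.90. Some fitted coefficients are negative, which is a reminder that this is a regression fit rather than a physical decomposition of power draw.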


Results: Energy Reduction

[Figure: Energy Reduction (0% to 100%) on AMD and Intel for blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, x264, and the average.]


Results: Runtime and Energy Reduction

[Figure: AMD Energy and Runtime Reduction (-10% to 100%), with Energy and Runtime series, for blackscholes, bodytrack, ferret, fluidanimate, freqmine, swaptions, vips, x264, and the average.]


Functionality on Withheld Tests

Program        AMD    Intel
blackscholes   100%   100%
bodytrack       92%   100%
ferret         100%   100%
fluidanimate     6%    31%
freqmine       100%   100%
swaptions      100%   100%
vips           100%   100%
x264            27%   100%


Anecdotes

Blackscholes

- 90% less energy
- removed redundant outer loop
- modified semantics

Swaptions

- 42% less energy
- improved branch prediction
- hardware specific

Vips

- 20% less energy
- substitution of memory access for calculation
- resource trade-off


Conclusion

Caveats

Limitations and Generality

- experimental evaluation
  - energy reduction
  - GCC-produced assembler
  - PARSEC benchmarks
- some benchmarks show no improvement
- requires high-quality test cases
- may change program behavior


Conclusion

1. optimizes complex runtime properties (energy)
2. leverages particulars of hardware and environment
3. reveals compiler inefficiencies
4. finds efficiencies, e.g., loop elimination
5. presents transformations as ASM diffs


Resources

Genetic Optimization Algorithm

GOA tooling: https://github.com/eschulte/goa

reproduce results: https://github.com/eschulte/goa/tree/asplos2014

Eric Schulte

email: eschulte@cs.unm.edu

homepage: https://cs.unm.edu/~eschulte


Backup Slides


Genetic Algorithm

Parameters

Parameter        GenProg                GOA
population size  40                     2^10
evaluations      400                    2^18
selection        fitness proportionate  tournament of 2
runtime          minutes                hours


Software Mutational Robustness

[Figure: Software mutational robustness for C and ASM representations. Programs shown: bubble-sort, insertion-sort, merge-sort, quick-sort; printtokens, schedule, sed, space, tcas; bzip2 1.0.2, ccrypt 1.2, imagemagick 6.5.2, jansson 1.3, leukocyte, lighttpd 1.4.15, nullhttpd 0.5.0, oggenc 1.0.1, potion 40b5f03, redis 1.3.4, tiff 3.8.2, vyquon 335426d, grep.]


Program Syntactic Space

[Diagram labels: Specification (acceptable), Original Program, Test Suite, Neutral Mutants, Equivalent Mutant, Killed Mutants]
