The University of Texas at Austin. Lizy Kurian John, LCA, UT Austin: What Programming Language/Compiler Researchers should Know about Computer Architecture

Page 1:

The University of Texas at Austin

What Programming Language/Compiler Researchers should Know about Computer Architecture

Lizy Kurian John

Department of Electrical and Computer Engineering, The University of Texas at Austin

Page 2:

Somebody once said

“Computers are dumb actors and compilers/programmers are the master playwrights.”

Page 3:

Computer Architecture Basics

• ISAs
• RISC vs. CISC
• Assembly language coding
• Datapath (ALU) and controller
• Pipelining
• Caches
• Out-of-order execution

Hennessy and Patterson architecture books

Page 4:

Basics

• ILP (instruction-level parallelism)
• DLP (data-level parallelism)
• TLP (thread-level parallelism)
• Massive parallelism
• SIMD/MIMD
• VLIW
• Performance and power metrics

Hennessy and Patterson architecture books

ASPLOS, ISCA, Micro, HPCA

Page 5:

The Bottom Line

Programming language choice affects performance and power, e.g., Java

Compilers affect performance and power

Page 6:

A Java Hardware Interpreter

Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001). This technique was used by Nazomi Communications and Parthus (Chicory Systems).

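[Figure: processor pipeline with Fetch, Hardware bytecode translator, Decode, and Execute stages. Inputs: Java class file (bytecodes) and native executable. The hardware bytecode translator turns bytecodes into native machine instructions.]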
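To make the translation step concrete, here is a minimal software sketch. It assumes a tiny subset of JVM integer bytecodes (iload, bipush, iadd, isub, imul, istore) and an invented register-style output syntax; it is not the Hard-Int hardware design. The operand stack is tracked symbolically, so pushes and pops become register operands instead of stack memory traffic.

```python
# Hypothetical software model of a bytecode-to-native translation stage.
# Locals map to registers r<n>; intermediate results go to temporaries t<n>.
# The output instruction syntax is invented for illustration.

def translate(bytecodes):
    """Translate straight-line stack bytecodes into register-style
    instructions by tracking the JVM operand stack symbolically."""
    stack = []    # symbolic operand stack: register names or immediates
    native = []   # emitted register-style instructions
    temp = 0      # next free temporary register number
    for insn in bytecodes:
        op = insn[0]
        if op == "iload":        # push local variable n (held in register rn)
            stack.append(f"r{insn[1]}")
        elif op == "bipush":     # push a small integer constant as an immediate
            stack.append(f"#{insn[1]}")
        elif op in ("iadd", "isub", "imul"):
            rhs = stack.pop()
            lhs = stack.pop()
            dst = f"t{temp}"
            temp += 1
            native.append(f"{op[1:]} {dst}, {lhs}, {rhs}")  # iadd -> add, etc.
            stack.append(dst)
        elif op == "istore":     # pop the top of stack into local variable n
            native.append(f"mov r{insn[1]}, {stack.pop()}")
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return native


# c = a + b, with a, b, c in local slots 1, 2, 3
print(translate([("iload", 1), ("iload", 2), ("iadd",), ("istore", 3)]))
# ['add t0, r1, r2', 'mov r3, t0']
```

A real translator would also fold the final move into the arithmetic instruction and handle branches and method calls; the sketch only shows the stack-to-register idea.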

Page 7:

HardInt Performance

4-way performance, execution cycles (millions):

Benchmark  JDK 1.1.6 Interpreter  JDK 1.1.6 JIT  JDK 1.2 Interpreter  JDK 1.2 JIT  Hard-Int
db                          44.8           60.4                 71.0         59.8      16.0
javac                      109.3          135.9                133.7        108.8      27.7
jess                       149.7           85.2                221.5        146.2      28.8
mpeg                       934.1          127.7                989.4        146.1     250.2
mtrt                       911.7          492.2                867.8        321.9     120.0

• Hard-Int performs consistently better than the interpreters.

• Relative to JIT mode, Hard-Int gives a significant performance boost in 4 of 5 applications.
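As a rough check of the two bullets, the sketch below computes Hard-Int's speedup over the faster of the two interpreters and over the faster of the two JITs. It assumes the chart's data labels are grouped per configuration in legend order; that mapping is a reading of the extracted chart, not something the slide states explicitly. The function name is hypothetical.

```python
# Execution cycles (millions) read off the "4-way performance" chart, assuming
# the data labels are grouped per configuration in legend order.
CYCLES = {
    #        JDK1.1.6 int, JDK1.1.6 JIT, JDK1.2 int, JDK1.2 JIT, Hard-Int
    "db":    (44.8,  60.4,  71.0,  59.8,  16.0),
    "javac": (109.3, 135.9, 133.7, 108.8, 27.7),
    "jess":  (149.7, 85.2,  221.5, 146.2, 28.8),
    "mpeg":  (934.1, 127.7, 989.4, 146.1, 250.2),
    "mtrt":  (911.7, 492.2, 867.8, 321.9, 120.0),
}


def hardint_speedups(cycles):
    """Return {benchmark: (speedup vs. best interpreter, speedup vs. best JIT)}.
    Speedup = baseline cycles / Hard-Int cycles, so > 1 means Hard-Int is faster."""
    result = {}
    for bench, (int116, jit116, int12, jit12, hardint) in cycles.items():
        best_interp = min(int116, int12)
        best_jit = min(jit116, jit12)
        result[bench] = (best_interp / hardint, best_jit / hardint)
    return result


for bench, (vs_int, vs_jit) in hardint_speedups(CYCLES).items():
    print(f"{bench:6s} vs interpreter: {vs_int:5.2f}x   vs JIT: {vs_jit:5.2f}x")
```

Under this reading Hard-Int beats both interpreters on every benchmark and beats the JITs everywhere except mpeg, which is consistent with "4 of 5".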

Page 8:

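Compiler and Power

[Figure: a data dependence graph (DDG) of six operations, A to F, and two schedules of it over cycles 1 to 4. First schedule: peak power = 3, energy = 6. Second schedule: peak power = 2, energy = 6.]

Both schedules execute the same six operations and so consume the same energy, but the second spreads them more evenly across cycles, lowering peak power from 3 to 2.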
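The same effect can be reproduced with a small list scheduler. This is a sketch under stated assumptions: the six-operation DDG below is hypothetical (the slide's exact edges are not recoverable), and every operation is assumed to cost one unit of energy in the cycle it issues, so peak power is the busiest cycle and energy is the total. Capping issue at two operations per cycle lowers peak power without changing energy.

```python
from collections import Counter

# Hypothetical six-operation DDG (not necessarily the slide's graph):
# each operation maps to the operations it depends on.
DEPS = {"A": [], "B": [], "C": ["A"], "D": [], "E": ["C"], "F": ["D"]}


def list_schedule(deps, max_per_cycle=None):
    """Greedy list scheduling of a DAG: an operation issues once all its
    predecessors issued in earlier cycles, subject to an optional per-cycle cap."""
    cycle_of = {}
    cycle = 1
    remaining = sorted(deps)  # fixed priority order keeps the result deterministic
    while remaining:
        issued = []
        for op in remaining:
            # unscheduled predecessors default to the current cycle -> not ready
            ready = all(cycle_of.get(p, cycle) < cycle for p in deps[op])
            if ready and (max_per_cycle is None or len(issued) < max_per_cycle):
                issued.append(op)
        for op in issued:
            cycle_of[op] = cycle
        remaining = [op for op in remaining if op not in cycle_of]
        cycle += 1
    return cycle_of


def power_profile(cycle_of, energy_per_op=1):
    """Return (peak power, energy), assuming each op costs energy_per_op
    in the cycle where it issues."""
    per_cycle = Counter(cycle_of.values())
    return max(per_cycle.values()) * energy_per_op, sum(per_cycle.values()) * energy_per_op


print(power_profile(list_schedule(DEPS)))                   # (3, 6)
print(power_profile(list_schedule(DEPS, max_per_cycle=2)))  # (2, 6)
```

The issue cap is the simplest power-aware heuristic; a real scheduler would weigh it against the extra cycles it can cost on longer dependence chains.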

Page 9:

Valluri et al., 2001 HPCA workshop

Quantitative study: the influence of state-of-the-art optimizations on the energy and power of the processor is examined.

Optimizations studied:

• Standard -O1 to -O4 levels of DEC Alpha's cc compiler
• Four individual optimizations: simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global scheduling

Page 10:

Standard Optimizations on Power

Values are normalized to -O0 = 100.

Benchmark  Opt level  Energy  Exec Time   Insts  Avg Power     IPC
compress   -O0           100        100     100        100     100
           -O1         74.48      81.55   81.52      91.33   99.96
           -O2         75.13      81.44   82.04      92.25  100.73
           -O3         75.13      81.44   82.04      92.25  100.73
           -O4         79.01      82.77   86.11      95.45  104.03
go         -O0           100        100     100        100     100
           -O1          66.2      64.13   68.94     103.23   107.5
           -O2         62.62      61.31   63.01     102.14  102.78
           -O3         62.62      61.31   63.01     102.14  102.78
           -O4         63.67      62.19   63.75     102.38  102.51
li         -O0           100        100     100        100     100
           -O1         81.32      83.66   83.18       97.2   99.42
           -O2          79.6      75.97   82.97     104.78  109.21
           -O3          79.6      75.97   82.97     104.78  109.21
           -O4         85.71      77.89   90.96     110.05  116.78
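Two identities tie these columns together: average power × execution time = energy, and, at a fixed clock frequency, instructions ÷ execution time is proportional to IPC. Because every column is normalized to -O0 = 100, both carry a factor of 100. The sketch below copies the optimized rows of the compress/go/li table (the -O3 rows, identical to -O2, are left out) and flags any row that violates either identity; the helper name is hypothetical. It also shows why IPC and average power tend to move together here.

```python
# (benchmark, level, energy, exec_time, insts, avg_power, ipc), normalized to -O0 = 100
ROWS = [
    ("compress", "-O1", 74.48, 81.55, 81.52, 91.33, 99.96),
    ("compress", "-O2", 75.13, 81.44, 82.04, 92.25, 100.73),
    ("compress", "-O4", 79.01, 82.77, 86.11, 95.45, 104.03),
    ("go",       "-O1", 66.2,  64.13, 68.94, 103.23, 107.5),
    ("go",       "-O2", 62.62, 61.31, 63.01, 102.14, 102.78),
    ("go",       "-O4", 63.67, 62.19, 63.75, 102.38, 102.51),
    ("li",       "-O1", 81.32, 83.66, 83.18, 97.2,  99.42),
    ("li",       "-O2", 79.6,  75.97, 82.97, 104.78, 109.21),
    ("li",       "-O4", 85.71, 77.89, 90.96, 110.05, 116.78),
]


def inconsistent_rows(rows, tol=0.05):
    """Flag rows violating Energy = Power x Time or IPC = Insts / Time
    (both scaled by 100 because every column is normalized to -O0 = 100)."""
    bad = []
    for bench, level, energy, time, insts, power, ipc in rows:
        energy_ok = abs(power * time / 100 - energy) <= tol
        ipc_ok = abs(insts / time * 100 - ipc) <= tol
        if not (energy_ok and ipc_ok):
            bad.append((bench, level))
    return bad


print(inconsistent_rows(ROWS))  # []
```

The tolerance only absorbs the two-decimal rounding in the table; every row satisfies both identities.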

Page 11:

Somebody once said

“Computers are dumb actors and compilers/programmers are the master playwrights.”

Page 12:

A large part of modern out-of-order processors is hardware that could have been eliminated if a good compiler existed.

Page 13:

Let me get more arrogant

A large part of modern out-of-order processors was designed because computer architects thought compiler writers could not do a good job.

Page 14:

Value Prediction

Is a slap in your face

Shen and Lipasti

Page 15:

Value Locality

The likelihood that an instruction's computed result, or a similar predictable result, will occur again soon

Observation: a limited set of unique values constitutes the majority of values produced and consumed during execution
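Value locality is commonly measured with a last-value style predictor: each static instruction (identified by its PC) remembers its most recent results, and a dynamic instance counts as a hit if its result is among them. The sketch below uses a hypothetical eight-entry load trace; `history_depth` is an assumed parameter for how many recent values are remembered per PC.

```python
from collections import deque


def value_locality(trace, history_depth=1):
    """Fraction of dynamic instructions whose result matches one of the last
    `history_depth` results produced by the same static instruction (same PC)."""
    history = {}  # pc -> recent values; the deque drops the oldest automatically
    hits = 0
    for pc, value in trace:
        recent = history.setdefault(pc, deque(maxlen=history_depth))
        if value in recent:
            hits += 1
        recent.append(value)
    return hits / len(trace) if trace else 0.0


# Hypothetical load trace: PC 0x40 reloads a jump-table base address
# (a run-time constant); PC 0x44 walks a mostly-zero array.
TRACE = [(0x40, 0x1000), (0x44, 0), (0x40, 0x1000), (0x44, 0),
         (0x40, 0x1000), (0x44, 7), (0x40, 0x1000), (0x44, 0)]

print(value_locality(TRACE, 1))  # 0.5
print(value_locality(TRACE, 2))  # 0.625
```

A deeper history catches values that alternate (the 0 after the 7 above), at the cost of a larger predictor table.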

Page 16:

Load Value Locality

Page 17:

Causes of value locality

• Data redundancy: many 0s, sparse matrices, white space in files, empty cells in spreadsheets
• Program constants
• Computed branches: the base address for jump tables is a run-time constant
• Virtual function calls: involve code to load a function pointer, which can be constant

Page 18:

Causes of value locality

• Memory alias resolution: the compiler conservatively generates code that may contain stores that alias with loads
• Register spill code: stores and subsequent loads
• Convergent algorithms: parts of an algorithm converge before global convergence
• Polling algorithms
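Convergent algorithms can be illustrated with Bellman-Ford shortest paths: nodes near the source settle early, and every later pass recomputes and rewrites the same distances. The sketch below uses a hypothetical five-node path graph whose edges are listed so that each pass settles one more node, and counts the fraction of relaxation results that equal the value already stored. Those repeated results are the kind of values a value predictor can exploit.

```python
INF = float("inf")


def relaxation_redundancy(n_nodes, edges, source=0):
    """Bellman-Ford shortest paths. Returns the distances and the fraction of
    relaxation results that merely rewrite the value already stored."""
    dist = [INF] * n_nodes
    dist[source] = 0
    writes = unchanged = 0
    for _ in range(n_nodes - 1):          # Bellman-Ford's n-1 passes
        for u, v, w in edges:
            new = min(dist[v], dist[u] + w)
            writes += 1
            if new == dist[v]:            # same value produced again
                unchanged += 1
            dist[v] = new
    return dist, unchanged / writes


# Hypothetical path graph 0->1->2->3->4 with unit weights, edges listed far
# end first, so each pass settles exactly one more node.
EDGES = [(3, 4, 1), (2, 3, 1), (1, 2, 1), (0, 1, 1)]
print(relaxation_redundancy(5, EDGES))  # ([0, 1, 2, 3, 4], 0.75)
```

Twelve of the sixteen relaxations write back a value that is already in place, even though the algorithm as a whole keeps running until the last pass.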

Page 19:

2 Extremist Views

Anything that can be done in hardware should be done in hardware.

Anything that can be done in software should be done in software.

Page 20:

What do we need?

The dumb actor

or

The defiant actor, who pays very little attention to the script?

Page 21:

Challenging all compiler writers

The last 15 years were the defiant actor's era.

What about the next 15? With TLP, multithreading, and parallelizing compilers, it's time for a lot more dumb acting from the architect's side.

And it’s time for some good scriptwriting from the compiler writer’s side.

Page 22:

The University of Texas at Austin

BACKUP

Page 23:

Compiler Optimizations

cc: native C compiler on a DEC Alpha 21064 running the OSF/1 operating system

gcc: used to study the effect of individual optimizations

Page 24:

Standard Optimization Levels in cc

• -O0: no optimizations performed
• -O1: local optimizations such as common subexpression elimination (CSE), copy propagation, induction variable elimination (IVE), etc.
• -O2: inline expansion of static procedures, and global optimizations such as loop unrolling and instruction scheduling
• -O3: inline expansion of global procedures
• -O4: software pipelining, loop vectorization, etc.
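The local optimizations listed for -O1 work within a single basic block. As a minimal sketch, assuming a toy three-address IR of `(dest, op, arg1, arg2)` tuples (not cc's or gcc's internal representation, and `local_cse` is a hypothetical name), local common subexpression elimination replaces a recomputation with a copy as long as neither operand has been redefined in between.

```python
def local_cse(block):
    """Local common subexpression elimination over one basic block.
    Each instruction is (dest, op, arg1, arg2); a repeated (op, arg1, arg2)
    whose operands are unchanged becomes a copy of the earlier result."""
    available = {}  # (op, arg1, arg2) -> variable currently holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a, b))
        # dest is overwritten: forget expressions that read it or are held in it
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in k[1:]}
        # remember this expression unless it reads its own destination (x = x + y)
        if dest not in (a, b):
            available.setdefault(key, dest)
    return out


BLOCK = [("t1", "add", "a", "b"),
         ("t2", "add", "a", "b"),   # same as t1 -> becomes a copy
         ("a",  "mul", "t1", "c"),  # redefines a, killing "a + b"
         ("t3", "add", "a", "b")]   # must be recomputed
for insn in local_cse(BLOCK):
    print(insn)
```

Copy propagation, the next -O1 pass on the list, would then replace later uses of `t2` with `t1`, after which the copy itself may become dead code.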

Page 25:

Standard Optimization Levels in gcc

• -O0: no optimizations performed
• -O1: local optimizations such as CSE, copy propagation, dead-code elimination, etc.
• -O2: aggressive instruction scheduling
• -O3: inlining of procedures

NOTE:

• Almost the same optimizations are performed at each level of cc and gcc.
• In both cc and gcc, the optimizations that increase ILP are in levels -O2, -O3, and -O4.
• cc was used wherever possible; gcc was used where specific hooks were required.

Page 26:

Individual Optimizations

Four gcc optimizations, each applied on top of -O1:

• -fschedule-insns: local register allocation followed by basic-block list scheduling
• -fschedule-insns2: post-pass scheduling
• -finline-functions: integrate all simple functions into their callers
• -funroll-loops: loop unrolling

Page 27:

Some observations

Energy consumption drops when the number of instructions drops: when less total work is done, less energy is consumed.

Power dissipation is directly proportional to IPC
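Both observations follow from a back-of-the-envelope model. It assumes the average energy per instruction and the clock frequency stay fixed across optimization levels; the symbols are introduced here for illustration ($N$ dynamic instructions, $\bar{e}$ average energy per instruction, $f$ clock frequency, $T$ execution time). Energy then scales with instruction count, and average power scales with IPC.

```latex
\begin{aligned}
E       &= N \cdot \bar{e} \\
T       &= \frac{N}{\mathrm{IPC} \cdot f} \\
\bar{P} &= \frac{E}{T} = \bar{e} \cdot f \cdot \mathrm{IPC}
\end{aligned}
```

When an optimization changes the instruction mix, and with it $\bar{e}$, the proportionality between power and IPC holds only approximately.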

Page 28:

Observations (contd.)

Function inlining was found to be good for both power and energy

Unrolling was found to be good for energy consumption but bad for power dissipation
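The unrolling trade-off shows up in instruction counts alone. This sketch assumes a hypothetical counted loop with one useful instruction per element and two loop-overhead instructions (index increment, compare-and-branch) per trip, plus a cleanup loop for leftover elements; real compilers and ISAs differ, and `loop_instructions` is a hypothetical name.

```python
def loop_instructions(n, unroll=1, body=1, overhead=2):
    """Dynamic instruction count of a counted loop over n elements, unrolled
    by `unroll`, with a cleanup loop for the leftover n % unroll elements.
    Assumes `body` useful instructions per element and `overhead`
    instructions (increment + compare-and-branch) per loop trip."""
    trips = n // unroll + n % unroll   # main-loop trips + cleanup-loop trips
    return n * body + trips * overhead


for k in (1, 2, 4, 8):
    print(k, loop_instructions(1000, unroll=k))
# 1 3000
# 2 2000
# 4 1500
# 8 1250
```

Fewer executed instructions means less energy. Because the unrolled body also exposes more independent instructions that finish in fewer cycles, IPC, and with it average power, tends to rise, which is the trade-off the observation reports.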

Page 29:

MMX/SIMD

Automatic use of the SIMD ISA is still difficult 10+ years after the introduction of MMX.

Page 30:

Standard Optimizations on Power (contd.)

Values are normalized to -O0 = 100.

Benchmark  Opt level  Energy  Exec Time   Insts  Avg Power     IPC
su2cor     -O0           100        100     100        100     100
           -O1         97.38     100.24   92.49      97.15   92.27
           -O2         97.69      99.38   92.49       98.3   93.07
           -O3         97.69      99.38   92.49       98.3   93.07
           -O4         98.31      99.27   92.84      99.02   93.51
swim       -O0           100        100     100        100     100
           -O1         42.09      51.04   33.21      82.46   65.06
           -O2         40.99      47.52    33.1      86.28   69.67
           -O3         40.99      46.37    33.1      87.65   71.38
saxpy      -O0           100        100     100        100     100
           -O1          30.1      36.64   20.01      82.15   54.63
           -O2         28.93      34.01   19.05      85.06   56.01
           -O3         28.93      34.01   19.05      85.06   56.01