
University of Maryland

Towards Automated Tuning of Parallel Programs

Jeffrey K. Hollingsworth

[email protected]

Department of Computer Science

University of Maryland, College Park, MD 20742

Why Automated Performance Tuning?

Software commonly has parameters that affect its performance.

Optimal parameter values are usually variable and unpredictable.

Automated parameter tuning can be used for adaptive parameter tuning in complex software.

Automated Performance Tuning

Goal: Maximize achieved performance

Problems:
– Large number of parameters to tune
– Shape of objective function unknown
– Multiple libraries and coupled applications
– Analytical model may not be available

Requirements:
– Runtime tuning for long-running programs
– Don’t try too many configurations
– Avoid gradients

Active Harmony

Runtime performance optimization
– Can also support training runs

Automatic library selection (code)
– Monitor library performance
– Switch library if necessary

Automatic performance tuning (parameter)
– Monitor system performance
– Adjust runtime parameters

Hooks for compiler frameworks
– Working to integrate USC/ISI CHiLL
– Looking at others too
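The slides do not show how library selection works internally. As a minimal sketch, assuming per-call cost is the quantity being monitored, a selector can keep a moving average of each library's observed cost and switch to whichever currently looks cheapest. `LibrarySelector` and its methods are hypothetical names, not Active Harmony's interface.

```python
import time
from typing import Callable, Dict, Optional


class LibrarySelector:
    """Hypothetical sketch (not the Active Harmony API): choose among
    interchangeable library implementations by monitoring their cost."""

    def __init__(self, libraries: Dict[str, Callable], alpha: float = 0.3):
        self.libraries = libraries
        self.alpha = alpha                     # weight of the newest measurement
        self.avg_cost: Dict[str, float] = {}   # library name -> smoothed cost per call
        self.current: Optional[str] = None

    def choose(self) -> str:
        # Measure every library at least once before trusting the averages.
        for name in self.libraries:
            if name not in self.avg_cost:
                self.current = name
                return name
        # Otherwise switch to the library that currently looks cheapest.
        self.current = min(self.avg_cost, key=self.avg_cost.__getitem__)
        return self.current

    def report(self, name: str, cost: float) -> None:
        # Exponentially weighted moving average of the observed cost.
        old = self.avg_cost.get(name)
        self.avg_cost[name] = cost if old is None else self.alpha * cost + (1 - self.alpha) * old

    def call(self, *args, **kwargs):
        # Run the chosen library and record its wall-clock time.
        name = self.choose()
        start = time.perf_counter()
        result = self.libraries[name](*args, **kwargs)
        self.report(name, time.perf_counter() - start)
        return result


# Usage with made-up, deterministic costs instead of real timings.
selector = LibrarySelector({"libA": sorted, "libB": sorted})
fake_cost = {"libA": 2.0, "libB": 1.0}
for _ in range(5):
    chosen = selector.choose()
    selector.report(chosen, fake_cost[chosen])
print(selector.current)  # libB
```

This sketch never re-measures a library once it looks slower, so it would miss a change in conditions; a runtime system would presumably need to re-probe the alternatives from time to time.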

Example: Cluster Based Web Server

3-tier system

Harmony provides:
– Parameter updates for the DB and App Servers

TPC-W Benchmark
– Transactional web benchmark
– Mimics the operations of an e-commerce site
– Uses the Java implementation from Univ. of Wisconsin
– Performance metric:
• Web Interactions Per Second (WIPS)

Cluster-Based Web Service Tuning

Best configuration after 200 iterations

Improvements compared to the default configuration:
– Browsing: 15%
– Shopping: 16%
– Ordering: 5%

[Figure: Performance (WIPS) by workload applied (Browsing, Shopping, Ordering) for the original configuration and for the best configurations for Browsing, Shopping, and Ordering]
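Assuming the percentages on this slide are relative gains in WIPS over the default configuration (the slide does not define them), the improvement for a workload $w$ would be:

```latex
\[
  \text{Improvement}_w
  = \frac{\mathrm{WIPS}_w^{\text{best}} - \mathrm{WIPS}_w^{\text{default}}}
         {\mathrm{WIPS}_w^{\text{default}}} \times 100\%
\]
```

For example, a hypothetical workload going from 40 to 46 WIPS would count as a 15% improvement, since (46 - 40)/40 = 0.15.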

Importance of various parameters

[Figure: Parameter sensitivity under the Shopping and Ordering workloads for PROXY Cache Mem, AJP Accept Count, MYSQL Net Buffer Length, AJP Max Processors, PROXY Max Object in Memory, HTTP Buffer Size, HTTP Accept Count, MYSQL Delayed Queue, PROXY Min Object, and MYSQL Max Connections]

A Bit More About Harmony Search

Pre-execution
– Sensitivity discovery phase
– Used to order, not eliminate, search dimensions

Online
– Use Parallel Rank Order (PRO) search
• Different configurations on different nodes
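The slides do not say how sensitivity is measured. One simple scheme, assumed here, varies one parameter at a time while the others stay at their defaults and uses the spread of the measured performance as that parameter's sensitivity; the result only orders the search dimensions, so nothing is dropped. The function, the performance model and the parameter names in the usage are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def sensitivity_order(
    defaults: Dict[str, float],
    candidates: Dict[str, Sequence[float]],
    measure: Callable[[Dict[str, float]], float],
) -> List[str]:
    """Return parameter names ordered from most to least sensitive.

    Assumed scheme: vary one parameter at a time over its candidate values,
    holding the others at their defaults, and use the spread (max - min) of
    the measured performance as that parameter's sensitivity. Nothing is
    discarded; the order only decides which dimensions to search first.
    """
    sensitivity: Dict[str, float] = {}
    for name, values in candidates.items():
        results = []
        for value in values:
            config = dict(defaults)
            config[name] = value
            results.append(measure(config))
        sensitivity[name] = max(results) - min(results)
    return sorted(sensitivity, key=sensitivity.__getitem__, reverse=True)


# Usage with a made-up performance model and hypothetical parameter names.
def fake_performance(c: Dict[str, float]) -> float:
    return 10 * c["cache_mb"] + 0.1 * c["buffer_kb"] - abs(c["threads"] - 8)


order = sensitivity_order(
    defaults={"cache_mb": 1, "buffer_kb": 8, "threads": 4},
    candidates={"cache_mb": [1, 2, 4], "buffer_kb": [8, 16, 32], "threads": [2, 4, 8, 16]},
    measure=fake_performance,
)
print(order)  # ['cache_mb', 'threads', 'buffer_kb']
```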

Benefits of Searching in Parallel

[Figure: PRO vs. Nelder-Mead (HPL): % of peak performance over iterations 1 to 34; curves PRO-min and Nelder-Mead-min]

[Figure: PRO vs. Nelder-Mead for POP: time (sec) over iterations 1 to 34; curves Nelder-Mead, PRO-min, and PRO-max]

Performance for two programs:
– High Performance Linpack (HPL)
– POP Ocean Code
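The slides do not spell out the search algorithm. As a simplified, hypothetical sketch of the rank-order idea: keep a simplex of configurations, move every non-best vertex about the current best (reflect; if that helps, also try expanding; otherwise shrink toward the best), and evaluate all new vertices at once. A thread pool stands in for running different configurations on different nodes. It assumes minimization over continuous, unbounded parameters and omits details such as parameter bounds and integer-valued parameters, which a real tuner has to handle; `pro_minimize` and `_move` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Point = List[float]


def _move(best: Point, others: List[Point], scale: float) -> List[Point]:
    """Return best + scale * (best - v) for every non-best vertex v."""
    return [[x + scale * (x - y) for x, y in zip(best, v)] for v in others]


def pro_minimize(
    f: Callable[[Sequence[float]], float],
    simplex: List[Point],
    iterations: int = 50,
    workers: int = 4,
) -> Point:
    """Simplified sketch of a Parallel Rank Order style search (minimization)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(f, simplex))        # evaluate initial simplex in parallel
        for _ in range(iterations):
            b = values.index(min(values))
            best, f_best = simplex[b], values[b]
            others = [p for i, p in enumerate(simplex) if i != b]

            reflected = _move(best, others, 1.0)
            f_refl = list(pool.map(f, reflected))  # all reflections evaluated at once
            if min(f_refl) < f_best:
                expanded = _move(best, others, 2.0)
                f_exp = list(pool.map(f, expanded))
                if min(f_exp) < min(f_refl):
                    new, f_new = expanded, f_exp
                else:
                    new, f_new = reflected, f_refl
            else:
                new = _move(best, others, -0.5)    # shrink halfway toward best
                f_new = list(pool.map(f, new))
            simplex, values = [best] + new, [f_best] + f_new
    return simplex[values.index(min(values))]


# Usage on a toy objective with its minimum at (3, -1).
def quadratic(p: Sequence[float]) -> float:
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2


start = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = pro_minimize(quadratic, start, iterations=60)
print(result)  # [3.0, -1.0]
```

Because every iteration submits several vertices together, the number of iterations (not the number of evaluations) is what a parallel search like this tries to keep small.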

Integrating Compiler Transformations

Performance of Matrix Multiply: Active Harmony + CHiLL vs. ATLAS and MKL

[Figure: MFlops vs. matrix size for Native, Simple Search, Harmony + CHiLL, ATLAS, and MKL]
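As a rough, hypothetical illustration of the kind of parameter a compiler transformation exposes, the sketch below tiles a matrix multiply and treats the tile size as the tuning knob, then picks the fastest candidate with a plain exhaustive search. It does not use CHiLL or Active Harmony, and in pure Python the timings say little about cache behavior; it only shows the structure of the search. `tiled_matmul` and `simple_search` are illustrative names.

```python
import time
from typing import List, Sequence

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """C = A * B with the i, k and j loops blocked into tile-sized chunks."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c


def simple_search(a: Matrix, b: Matrix, tiles: Sequence[int]) -> int:
    """Time every candidate tile size once and return the fastest."""
    best_tile, best_time = tiles[0], float("inf")
    for t in tiles:
        start = time.perf_counter()
        tiled_matmul(a, b, t)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_tile, best_time = t, elapsed
    return best_tile


# Usage on small made-up matrices; the chosen tile depends on the machine.
n = 48
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
print(simple_search(a, b, [4, 8, 16, 48]))
```

An exhaustive search is only practical for a handful of candidates; with several interacting transformation parameters, a guided search such as the one Harmony provides becomes the more realistic option.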

Must Coordinate Auto Tuners

Problem: Warring auto tuning systems

• Multiple components “auto tuning” at once

• Tuning based on multiple changes at once

Solution:
– Need some level of coordination
– Possible answer:
• Exposing different tuning systems
– Part of PERI Auto-tuning Effort

Conclusion

Active Harmony
– An infrastructure for runtime tuning
– Automatic tuning and environment adaptation
– It works!

Auto-tuning integration
– Need to integrate multiple tools
• Compilers, runtime, application

Acknowledgements

Coding and Experiments
– I-Hsin Chung (IBM Watson)
– Vahid Tabatabaee
– Ananta Tiwari

CHiLL Integration Effort
– Mary Hall (Utah)
– Chun Chen (USC/ISI)
– Jacqueline Chame (USC/ISI)

Funding
– DOE – PERC/PERI
– NSF
– LTS (DoD)