Case studies in Optimizing High Performance Computing Software Jan Westerholm High performance computing Department of Information Technologies Faculty of Technology / Åbo Akademi

Case studies in Optimizing

High Performance Computing Software

Jan Westerholm

High performance computing

Department of Information Technologies

Faculty of Technology / Åbo Akademi University

FINHPC / Åbo Akademi Objectives

• Sub-project in FINHPC
• Three year duration 01.07.2005-30.06.2008
• Objective: to improve code individuals and research groups have written and are running on CSC machines
– faster code, with in many cases exactly the same numerical results as before
– ability to run bigger problems
• Work approach: apply well known techniques from computer science
• Faster programs may imply better quality for results
• Better throughput for everybody

FINHPC / Åbo Akademi Limitations

• We will use:
– parallelization techniques
– code optimization
  • cache utilization (particularly L2-cache)
  • microprocessor pipeline continuity
  • data blocking: grid scan order
– introduction of new data structures
– replacement of very simple algorithms
  • sorting (quicksort instead of bubble sort)
– open source libraries
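
As a hedged illustration of "replacement of very simple algorithms", the sketch below contrasts a textbook bubble sort with a library sort. It is written in Python for brevity and is not taken from any of the optimized programs; the name `bubble_sort` and the sample data are assumptions made for this example. Python's built-in `sorted` uses Timsort rather than quicksort, but the point is the same: an O(n^2) routine is replaced by an O(n log n) library routine that returns identical results.

```python
import random


def bubble_sort(values):
    """Textbook bubble sort: O(n^2) comparisons in the worst case."""
    data = list(values)
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:  # no swaps in this pass: already sorted, stop early
            break
    return data


# Hypothetical input: 500 pseudo-random integers with a fixed seed.
rng = random.Random(0)
sample = [rng.randint(0, 10_000) for _ in range(500)]

slow_result = bubble_sort(sample)
fast_result = sorted(sample)  # library routine, O(n log n)
```

Both calls produce the same ordering, which matches the objective of faster code with exactly the same results as before.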

FINHPC / Åbo Akademi Limitations

• We will not:
– introduce better physics, chemistry, etc.
– replace chosen basic numerical technique
– replace individual algorithms unless they are clearly modularized (matrix inversion as library routine)

3 case studies

• Lattice-Boltzmann fluid simulation: 3DQ19

• Protein covariance analysis: Covana

• Fusion reactor simulation: Elmfire

3DQ19: Lattice Boltzmann fluid mechanics

• Jyväskylä University / Jussi Timonen, Keijo Mattila; ÅA / Anders Gustafsson
• Physical background:
– phase space distribution simulated in time
– Boltzmann's equation: drift term and collision term
– physical quantities = moments of distribution
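
To make "physical quantities = moments of distribution" concrete, here is a minimal sketch. It assumes that the "Q19" in the program name refers to the standard 19-velocity three-dimensional lattice with the usual weights 1/3, 1/18 and 1/36; the slides do not spell this out. The names `VELOCITIES`, `WEIGHTS` and `moments` are hypothetical.

```python
from itertools import product

# Assumed standard 19-velocity set: 1 rest vector, 6 axis vectors and
# 12 face-diagonal vectors (all vectors with at most two non-zero components).
VELOCITIES = [c for c in product((-1, 0, 1), repeat=3)
              if sum(abs(x) for x in c) <= 2]

# Assumed standard weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal).
WEIGHTS = [{0: 1 / 3, 1: 1 / 18, 2: 1 / 36}[sum(abs(x) for x in c)]
           for c in VELOCITIES]


def moments(f):
    """Return (density, momentum): the zeroth and first moments of f."""
    density = sum(f)
    momentum = tuple(sum(fi * c[k] for fi, c in zip(f, VELOCITIES))
                     for k in range(3))
    return density, momentum


# A fluid at rest with unit density has f_i equal to the weight w_i.
rho, mom = moments(WEIGHTS)
```

For this rest state the density comes out as 1 and the momentum vanishes up to floating-point rounding.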

3DQ19: Program Profiling

Flat profile:

  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 33.96     43.65    43.65        50   873.00  1230.10  everything2to1()
 30.79     83.22    39.57        50   791.40  1148.50  everything1to2()
 27.79    118.93    35.71  49000000     0.00     0.00  relaxation_BGK()
  2.30    121.89     2.96                              shmem_msgs_available
  1.19    123.42     1.53       100    15.30    15.30  send_west()
  1.11    124.85     1.43       100    14.30    14.30  send_east()
  0.82    125.91     1.06                              recv_message
  0.45    126.49     0.58                              sock_msg_avail_on_fd
  0.37    126.97     0.48       100     4.80     4.80  per_bound_xslice()
  0.33    127.40     0.43         1   430.00   430.00  init_fluid()
  0.31    127.80     0.40         1   400.00   400.00  local_profile_y()
  0.23    128.10     0.30                              socket_msgs_available
  0.19    128.34     0.24         1   240.00   240.00  calc_mass()
  0.04    128.39     0.05                              net_recv
  0.03    128.43     0.04         1    40.00    40.00  allocation()
  0.02    128.46     0.03                              main

3DQ19: Optimizations

• Parallelization: well done already!
• Code optimization
– blocking: grid scan order
– anti-dependency: make blocks of code independent
– deep fluid: mark those grid points which do not have solids as neighbours
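
The "deep fluid" idea can be sketched as a precomputed mask, so that the main loop can skip solid-boundary checks for marked cells. This is only an illustration under stated assumptions: a two-dimensional grid, a 3x3 neighbourhood, and the hypothetical name `mark_deep_fluid`. The actual 3DQ19 code works in three dimensions and its data layout is not shown in the slides.

```python
def mark_deep_fluid(solid):
    """Return a grid of booleans: True where a fluid cell has no solid neighbour.

    `solid[y][x]` is True for solid cells. Cells on the domain edge are
    conservatively treated as not deep, since their neighbourhood is incomplete.
    """
    ny, nx = len(solid), len(solid[0])
    deep = [[False] * nx for _ in range(ny)]
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            neighbourhood = (solid[y + dy][x + dx]
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            # The cell itself is part of the neighbourhood, so a solid cell
            # is never marked deep.
            deep[y][x] = not any(neighbourhood)
    return deep


# Hypothetical 5x5 domain with a single solid cell in one corner.
solid = [[False] * 5 for _ in range(5)]
solid[0][0] = True
deep = mark_deep_fluid(solid)
```

In this toy domain only the interior cell next to the solid corner loses its deep status; the other interior cells can take the fast path.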

3DQ19: Blocking
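
A blocked grid scan order can be sketched as follows. The grid is visited tile by tile so that the points of one tile are reused while they are still in cache. The two-dimensional grid, the block size and the name `blocked_scan` are assumptions for illustration; in Python the sketch shows only the visiting order, not the cache effect.

```python
def blocked_scan(ny, nx, block):
    """Yield (y, x) indices tile by tile instead of row by row."""
    for y0 in range(0, ny, block):
        for x0 in range(0, nx, block):
            # Visit every point of the current tile before moving on.
            for y in range(y0, min(y0 + block, ny)):
                for x in range(x0, min(x0 + block, nx)):
                    yield y, x


# Hypothetical 4x4 grid scanned in 2x2 tiles.
order = list(blocked_scan(4, 4, 2))
```

Every grid point is still visited exactly once; only the order changes, so the numerical results stay the same.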

3DQ19: Results on three parallel systems

                     Athlon 1800    IBMSC    AMD64
everything1to2():          18.80    19.48    10.06
everything2to1():          19.34    18.78    10.52
send_west():                8.40     0.68     1.96
send_east():                8.31     1.17     3.14
Total time (s):            55.15    40.28    25.76
Time gained (s):           27.48    14.13    14.76
Speed up (%):                33%      26%      36%
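
The percentages in the results are consistent with reading "speed up" as the time gained divided by the original run time (total time after optimization plus time gained). That definition is inferred from the numbers, not stated on the slide. A short check, with the hypothetical helper `speed_up_percent`:

```python
# Values from the results table, in seconds:
# (total time after optimization, time gained).
results = {
    "Athlon 1800": (55.15, 27.48),
    "IBMSC": (40.28, 14.13),
    "AMD64": (25.76, 14.76),
}


def speed_up_percent(total_after, gained):
    """Time gained as a percentage of the original run time."""
    original = total_after + gained
    return 100.0 * gained / original


percentages = {name: round(speed_up_percent(*times))
               for name, times in results.items()}
```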

2nd case study: Covana Protein Covariance analysis

• Institute of Medical Technology, University of Tampere / Mauno Vihinen, Bairong Chen; ÅA / André Norrgård
• Biological background
– physico-chemical groups of amino acids
– protein function from structure
• pair and triple correlations between amino acids
• web server for covariance analysis

Covana: Protein covariance analysis

• Protein sequences: calculate correlations between columns of amino acids
• Typical size
– 50-150 sequences (rows)
– 300-1500 amino acids in a sequence (columns)

>Q9XW32_CAEEL/9-307
IDVTKPTFLLTFYSIHGTFALVFNILGIFLIMK-NPKIVKMYKGFMINMQ-ILSLLADAQTTLLMQPVYILPIIGGYTNGLLWQVFR----LSSHIQMAMF---LLLLY---------LQVASIVCAIVTKYHVVSNIGKLSDRSI-LFWIF---VIVYHGCAFVITGFFSVS-CLARQ--EEENLIK------T-KFPNAISVFTLEN--VAIYDLQVN---KWMMITTILFAFMLTSSIVISFY--FSVRLLKTLPSKRNTISARSFRGHQIAVTSLM-AQAT-VPFLVL---IIP--IGTIVYLFVHVLP------NAQ-----EISNIMMAV--YSFHASLST---FVMIISTPQY
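
A minimal sketch of statistics between two alignment columns follows. The covariance measure Covana actually uses is not given in the slides, so plain co-occurrence counts of residue pairs stand in for it here; the toy alignment and the name `pair_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical toy alignment: rows are sequences, columns are positions.
alignment = [
    "AKL-",
    "AKLV",
    "GRLV",
    "GRI-",
]


def pair_counts(rows, col_a, col_b):
    """Count how often each residue pair occurs in two alignment columns."""
    counts = Counter()
    for row in rows:
        a, b = row[col_a], row[col_b]
        if a != "-" and b != "-":  # skip alignment gaps
            counts[(a, b)] += 1
    return counts


counts_01 = pair_counts(alignment, 0, 1)
```

In the toy data, columns 0 and 1 vary together (A with K, G with R), which is the kind of pattern a pair correlation is meant to expose.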

Covana: Code optimization

• Effective data structures: dynamic memory allocation
• Effective generic algorithms: sorting
• Avoid recalculations
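
"Avoid recalculations" can be illustrated by computing per-column statistics once and reusing them for every column pair, instead of rescanning a column each time it appears in a pair. This is a sketch of the general idea only; the toy data and the names `column_counts` and `most_common_residue` are hypothetical and do not come from Covana.

```python
from collections import Counter
from itertools import combinations

alignment = ["AKL", "AKL", "AKL", "GRI"]  # hypothetical toy data
n_cols = len(alignment[0])

# Computed once per column and reused for every pair that involves the
# column, instead of being recomputed inside the pair loop.
column_counts = [Counter(row[c] for row in alignment) for c in range(n_cols)]


def most_common_residue(col):
    """Look up the cached counts instead of rescanning the column."""
    return column_counts[col].most_common(1)[0][0]


pair_summary = {(a, b): (most_common_residue(a), most_common_residue(b))
                for a, b in combinations(range(n_cols), 2)}
```

With n columns there are n(n-1)/2 pairs, so caching turns a per-pair column scan into a single scan per column.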

Covana: Run time

[Figure: Runtime. Time (s), axis from 0 to 250, plotted against program version (1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 24, 31).]

Covana: Results

– Runtime:
  • Original: 227.8 s
  • Final version: 2.0 s
  • Improvement: 112 times faster
– Computer memory usage:
  • Original: 3250 MB
  • Final version: 37 MB
  • Improvement: 88 times less
– Disk space usage:
  • Original: 277 MB
  • Final version: 21 MB
  • Improvement: 13 times less

3rd case study: ELMFIRE Tokamak fusion reactor simulation

• Jukka Heikkinen, Salomon Janhunen, Timo Kiviniemi / Advanced Energy Systems / HUT; ÅA / Artur Signell
• Physical background:
– particle simulation with averaged gyrokinetic Larmor orbits
– turbulence and plasma modes

Elmfire: Tokamak fusion reactor simulation

• Goal 1: Computer platform independence
– replacing proprietary library routines for random number generation with open source routines
– replacing proprietary library routines for distributed solution of sparse linear systems with open source library routines
• Goal 2: Scalability
– Elmfire ran on at most 8 processors
– new data structures for sparse matrices were invented, which make element updates efficient
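
The slides say only that the new sparse matrix data structures make element updates efficient; the chart labels elsewhere in the talk mention "AVL" and "AVL + hash". The sketch below is not Elmfire's structure. It is a hypothetical hash-based accumulator (class name `HashSparseMatrix` assumed) that shows why a keyed lookup makes repeated updates of the same element cheap.

```python
class HashSparseMatrix:
    """Sparse matrix accumulator backed by a hash map keyed by (row, col)."""

    def __init__(self):
        self._entries = {}

    def add(self, row, col, value):
        """Add `value` to element (row, col) in expected O(1) time."""
        key = (row, col)
        self._entries[key] = self._entries.get(key, 0.0) + value

    def to_sorted_triplets(self):
        """Return (row, col, value) triplets sorted by row, then column."""
        return [(r, c, v) for (r, c), v in sorted(self._entries.items())]


# Hypothetical particle contributions deposited into matrix elements.
matrix = HashSparseMatrix()
for row, col, value in [(2, 1, 0.5), (0, 0, 1.0), (2, 1, 0.25), (0, 3, 2.0)]:
    matrix.add(row, col, value)

triplets = matrix.to_sorted_triplets()
```

The sorted triplets are the form a sparse linear solver library would typically expect as input, so accumulation and solution can use different layouts.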

Elmfire

Small problem: 12M particles, 8 processors

[Figure: bar chart of time (s), axis from 0 to 700, per program version: IBMSC: Orig; Sepeli: Orig; Sepeli: AVL; Sepeli: AVL + hash.]

Elmfire

Big problem (60 times bigger than the small problem): 166M particles, 64 processors

[Figure: bar chart of time (s), axis from 0 to 3500, per program version: IBMSC: Orig; Sepeli: Orig; Sepeli: AVL; Sepeli: AVL + hash.]

Conclusions

• Software can be improved!
– modern microprocessor architecture is taken into account:
  • cache utilization
  • pipeline
– use of well-established computer science methods

Conclusions

• In 1 case out of 3, a clear impact on run time was made
• In 2 cases out of 3, previously intractable results can now be obtained
• Are these three cases representative of code running on CSC machines?
– the next two cases are under study!

What have we learnt?

• Computer scientists with minimal prior knowledge of e.g. physical sciences can contribute to HPC

• Are supercomputers needed to the extent they are used today at CSC?

• Interprocess communication often a bottleneck
– Parallel computing with 1000 processors may become routine in the future for certain types of problems
• Who should do the coding?
– Code for production use (intensive cycles of use, maintainability) should be outsourced?

Co-workers:

• Mats Aspnäs, Ph.D.

• Anders Gustafsson, M.Sc.

• Artur Signell, M.Sc.

• André Norrgård

THANK YOU!