Performance Analysis of HPC with Lmbench Didem Unat Supervisor: Nahil Sobh July 22 nd 2005 netfiles.uiuc.edu/dunat2/www


Performance Analysis of HPC with Lmbench

Didem Unat Supervisor: Nahil Sobh

July 22nd 2005

netfiles.uiuc.edu/dunat2/www


Lmbench: Micro-Benchmark Suite

• Simple, portable benchmarks
• Compares the performance of different Unix systems
• Measures latency and bandwidth
• Only analyzes the performance of the processor, memory, network, file system and disk

• Free software


Compiler & optimization issues

• The GNU C compiler was used on all resources except copper
• The IBM xlc compiler was used on copper
• All benchmarks were compiled with optimization -O, except the benchmarks that calculate clock speed and context switch times


Metrics in the Benchmark

Bandwidth
• Pipe/TCP
• Cached file read
• Memory copy
• Memory read/write

Latency
• System call
• Signal handling
• Process creation
• Basic CPU operations
• Context switching
• Inter process communication
• File and VM system
• Memory read latencies


Inter Process Communication Bandwidth

• Transfers 64 MB of data in 64 KB chunks through:
• Unix pipe
• Unix sockets
• TCP/IP sockets

[Figure: bandwidth in MB/sec (axis 0 to 3000) for Pipe, AF Unix and TCP on W, Co, Cu and Hg]
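
The pipe case can be sketched in a few lines. This is a minimal illustration in Python rather than the Lmbench C source; the function name `pipe_bandwidth` and its parameters are made up for this sketch, and the defaults simply mirror the 64 MB / 64 KB figures from the slide.

```python
import os
import time


def pipe_bandwidth(total_bytes=64 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Stream total_bytes through a Unix pipe; return (bytes_received, MB/s)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the writer. Close the unused read end, then stream the data.
        os.close(read_fd)
        chunk = b"\0" * chunk_bytes
        sent = 0
        while sent < total_bytes:
            view = chunk[: min(chunk_bytes, total_bytes - sent)]
            while view:
                written = os.write(write_fd, view)  # may be a partial write
                view = view[written:]
                sent += written
        os.close(write_fd)
        os._exit(0)

    # Parent: the reader. Close the unused write end so EOF can be seen.
    os.close(write_fd)
    received = 0
    start = time.perf_counter()
    while True:
        data = os.read(read_fd, chunk_bytes)
        if not data:  # writer closed its end
            break
        received += len(data)
    elapsed = max(time.perf_counter() - start, 1e-9)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return received, received / elapsed / 1e6  # MB taken as 10**6 bytes here
```

Calling `pipe_bandwidth()` with the defaults moves the full 64 MB; the Unix-socket and TCP variants would replace the pipe with a connected socket pair but keep the same read loop.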


Cached file read
• A reread benchmark, intended to be used on a file that is in memory
• File reread: copies data from the kernel’s file system page cache into the process’s buffer
• Mmap reread: maps the entire file (8 MB) into the process’s address space
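
The two reread paths can be compared with a short sketch. It is not the Lmbench code: `reread_bandwidth` is a hypothetical helper, the file is a temporary file written just before the measurement (so it is assumed to still be in the page cache), and the mmap timing here includes the cost of the mapping call itself.

```python
import mmap
import os
import tempfile
import time


def reread_bandwidth(size_bytes=8 * 1024 * 1024, chunk_bytes=64 * 1024):
    """Reread a freshly written file via read() and via mmap; return a result dict."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * size_bytes)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            # File reread: read() copies from the page cache into our buffer.
            start = time.perf_counter()
            read_total = 0
            while True:
                data = os.read(fd, chunk_bytes)
                if not data:
                    break
                read_total += len(data)
            read_elapsed = max(time.perf_counter() - start, 1e-9)

            # Mmap reread: map the whole file, then slice it chunk by chunk.
            # Slicing copies bytes out of the mapping, so every page is touched.
            start = time.perf_counter()
            mapped_total = 0
            with mmap.mmap(fd, size_bytes, access=mmap.ACCESS_READ) as mapped:
                for offset in range(0, size_bytes, chunk_bytes):
                    mapped_total += len(mapped[offset:offset + chunk_bytes])
            mmap_elapsed = max(time.perf_counter() - start, 1e-9)
        finally:
            os.close(fd)
    return {
        "read_bytes": read_total,
        "mmap_bytes": mapped_total,
        "read_mb_per_s": read_total / read_elapsed / 1e6,
        "mmap_mb_per_s": mapped_total / mmap_elapsed / 1e6,
    }
```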


Memory copy
• Measures how fast the system can bcopy data
• bcopy copies n bytes from the source string to the destination string
• An 8 MB to 8 MB copy, which does not fit in the cache
• Kernel bcopy and C library bcopy
• C library bcopy is shown in the next slide
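
A rough user-level analogue of the copy measurement is sketched below. Python has no `bcopy`, so this sketch assumes that a `bytearray` slice assignment (a bulk copy inside CPython) is an acceptable stand-in; `copy_bandwidth` and its `repeats` parameter are hypothetical names.

```python
import time


def copy_bandwidth(size_bytes=8 * 1024 * 1024, repeats=5):
    """Copy size_bytes from one buffer to another; return (copy_ok, best MB/s)."""
    src = bytearray(b"\xab" * size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # stand-in for bcopy(src, dst, n)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading on tiny buffers
    return dst == src, size_bytes / best / 1e6
```

Taking the best of several runs is a choice made for this sketch to reduce noise from other activity on the machine.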


Memory read/write

Read
• Measures the time to read data into the processor
• An unrolled loop that sums up a series of integers

Write
• Measures the time to write data to memory
• An unrolled loop that stores a value into an integer
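
The shape of the two loops can be illustrated as follows. This only shows the unrolling pattern: in Python the interpreter overhead dominates, so the timings say little about the memory system. The unroll factor of 4 and all function names are assumptions of this sketch.

```python
import array
import time


def read_sum_unrolled(values):
    """Sum the integers, four per loop iteration."""
    total = 0
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    for i in range(limit, len(values)):  # leftover elements
        total += values[i]
    return total


def write_unrolled(values, value):
    """Store the same value into every integer, four per loop iteration."""
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        values[i] = value
        values[i + 1] = value
        values[i + 2] = value
        values[i + 3] = value
    for i in range(limit, len(values)):  # leftover elements
        values[i] = value


def mb_per_s(func, nbytes, *args):
    """Time one call of func(*args) and convert nbytes to MB/s."""
    start = time.perf_counter()
    func(*args)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return nbytes / elapsed / 1e6
```

For example, `data = array.array("i", range(1_000_000))` followed by `mb_per_s(read_sum_unrolled, len(data) * data.itemsize, data)` gives a read figure for that buffer.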


Operating System Entry / Signal Handling / Process Creation Costs

• Process-related latencies
• System call: null call, null I/O, stat, open/close
• Signal handling: signal installation, signal handling
• Process creation: fork + exit, fork + execve, fork + /bin/sh -c
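
Two of these latencies are easy to approximate from user space. The sketch below assumes `os.getppid()` as a cheap "null" system call and times fork + exit by waiting for each child; both function names are hypothetical, and the results include Python call overhead that a C benchmark would not have.

```python
import os
import time


def null_call_latency_us(iterations=100_000):
    """Average time of a trivial system call, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getppid()
    return (time.perf_counter() - start) / iterations * 1e6


def fork_exit_latency_us(iterations=200):
    """Average time to fork a child that exits immediately, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit without running cleanup handlers
        os.waitpid(pid, 0)  # parent: reap the child before the next fork
    return (time.perf_counter() - start) / iterations * 1e6
```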


Context Switching

• The time to save the state of one process and restore the state of another process

• The processes are connected in a ring of Unix pipes

• A token is passed from process to process

• Each process allocates an array and sums the array

• Context-switch time doesn't include the overhead of doing the work.

• Two parameters: number and size of processes
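
The ring of pipes can be sketched as below. This is a simplified illustration: the per-process array work (the "size" parameter) is left out, and the reported time per hop still includes the pipe read/write cost, which the benchmark described above excludes. `ring_token_time_us` and its parameters are names invented for this sketch.

```python
import os
import time


def ring_token_time_us(num_procs=4, laps=1000):
    """Pass a one-byte token around a ring of processes; return microseconds per hop."""
    if num_procs < 2:
        raise ValueError("a ring needs at least two processes")
    # pipes[i] carries the token from process i to process (i + 1) % num_procs.
    pipes = [os.pipe() for _ in range(num_procs)]
    children = []
    for rank in range(1, num_procs):
        pid = os.fork()
        if pid == 0:
            read_fd = pipes[rank - 1][0]
            write_fd = pipes[rank][1]
            # Close every descriptor this process does not use, so that EOF
            # can travel around the ring when the parent is done.
            for r, w in pipes:
                if r != read_fd:
                    os.close(r)
                if w != write_fd:
                    os.close(w)
            while True:
                token = os.read(read_fd, 1)
                if not token:
                    break
                os.write(write_fd, token)
            os._exit(0)
        children.append(pid)

    # The parent is process 0: it writes to pipes[0] and reads from the last pipe.
    write_fd = pipes[0][1]
    read_fd = pipes[num_procs - 1][0]
    for r, w in pipes:
        if r != read_fd:
            os.close(r)
        if w != write_fd:
            os.close(w)

    start = time.perf_counter()
    for _ in range(laps):
        os.write(write_fd, b"t")
        os.read(read_fd, 1)  # token has visited every process once
    elapsed = time.perf_counter() - start

    os.close(write_fd)  # EOF cascades through the children
    for pid in children:
        os.waitpid(pid, 0)
    os.close(read_fd)
    return elapsed / (laps * num_procs) * 1e6
```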


Interprocess Communication Latencies

• Passing a small message back and forth between two processes

• The time reported is one round trip

• Message size: a byte or a word

• Metrics: pipe, Unix socket, UDP, TCP, RPC over UDP and TCP, TCP connection latency
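
For the Unix socket case, a round-trip measurement might look like the following. It is a sketch under assumptions: a connected pair from `socket.socketpair()` stands in for the benchmark's socket setup, the message is one byte, and `socket_round_trip_us` is a hypothetical name.

```python
import os
import socket
import time


def socket_round_trip_us(round_trips=10_000):
    """Bounce a one-byte message between two processes; return (last_reply, us per round trip)."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its end.
        parent_sock.close()
        while True:
            msg = child_sock.recv(1)
            if not msg:
                break
            child_sock.sendall(msg)
        os._exit(0)

    child_sock.close()
    reply = b""
    start = time.perf_counter()
    for _ in range(round_trips):
        parent_sock.sendall(b"x")
        reply = parent_sock.recv(1)  # one full round trip completed
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)
    return reply, elapsed / round_trips * 1e6
```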


File & VM System

• File create/delete: creates a number of small files in the current working directory and then removes the files
• Mmap latency: the cost of mmapping and unmapping varying file sizes
• Prot fault: the time to catch a protection fault
• Page fault: the cost of page faulting pages from a file
• 100 fd select: the time to do a select on n file descriptors (here n = 100)
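
The file create/delete case can be sketched as below. Unlike the description above, this sketch works in a temporary directory instead of the current working directory, purely to avoid leaving files behind; `create_delete_latency_us` is a hypothetical helper and the files are empty.

```python
import os
import tempfile
import time


def create_delete_latency_us(num_files=1000):
    """Create then remove num_files empty files; return counts and per-file times."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = [os.path.join(workdir, f"file{i}") for i in range(num_files)]

        start = time.perf_counter()
        for path in paths:
            os.close(os.open(path, os.O_CREAT | os.O_WRONLY, 0o644))
        create_elapsed = time.perf_counter() - start
        created = len(os.listdir(workdir))

        start = time.perf_counter()
        for path in paths:
            os.unlink(path)
        delete_elapsed = time.perf_counter() - start
        remaining = len(os.listdir(workdir))

    return {
        "created": created,
        "remaining": remaining,
        "create_us": create_elapsed / num_files * 1e6,
        "delete_us": delete_elapsed / num_files * 1e6,
    }
```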


Memory Latencies

• Measures memory read latency for varying memory sizes and strides

• The size of the array starts from 512 bytes

• The stride varies from 16 to 1024 bytes

• Does not include the instruction execution time
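
One common way to measure read latency for a given size and stride is to chase a chain of dependent loads, so that each load must finish before the next can start. The sketch below shows that idea; it is an assumption that this matches the benchmark's internals, the helper names are invented, and in Python the interpreter time per load is included, so only the relative change across sizes and strides is meaningful.

```python
import array
import time


def build_chain(size_bytes, stride_bytes):
    """Build an array where each visited slot holds the index one stride ahead."""
    item = array.array("q").itemsize
    count = max(1, size_bytes // item)
    step = max(1, stride_bytes // item)
    chain = array.array("q", bytes(count * item))  # all zeros
    for i in range(0, count, step):
        nxt = i + step
        chain[i] = nxt if nxt < count else 0  # wrap back to the start
    return chain


def chase_latency_ns(chain, loads=1_000_000):
    """Follow the chain for a number of loads; return (final_index, ns per load)."""
    idx = 0
    start = time.perf_counter()
    for _ in range(loads):
        idx = chain[idx]  # each load depends on the previous one
    elapsed = max(time.perf_counter() - start, 1e-9)
    return idx, elapsed / loads * 1e9
```

Sweeping `size_bytes` upward from 512 and `stride_bytes` over 16 to 1024 reproduces the parameter ranges listed above.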


Conclusion

Metric                   The best      Has problems
IPC bandwidth            Co            W, Cu
Cached I/O bandwidth     W             Co, Hg
Memory R/W bandwidth     W             Co, Hg
Process creation         Cu            Co
CPU ops                  W, Co, Hg     Cu
Network latency          W             Co, Cu
Memory latency           W, Co         Cu


THANK YOU !

Have a nice weekend !


References

• “Lmbench – Tools for Performance Analysis”, http://www.bitmover.com/lmbench/
• Larry McVoy and Carl Staelin, “Lmbench: Portable tools for performance analysis”, http://www.usenix.org/publications/library/proceedings/sd96/full_papers/mcvoy.pdf
• Carl Staelin, “Lmbench: an extensible micro-benchmark suite”, http://www.hpl.hp.com/techreports/2004/HPL-2004-213.html