File System Benchmarking Advanced Research Computing.


Transcript of File System Benchmarking Advanced Research Computing.

Page 1: File System Benchmarking Advanced Research Computing.

File System Benchmarking
Advanced Research Computing

Page 2: File System Benchmarking Advanced Research Computing.

Outline
• IO benchmarks
  – What is benchmarked
  – Micro-benchmarks
  – Synthetic benchmarks
• Benchmark results for
  – Shelter NFS server, client on hokiespeed
  – NetApp FAS 3240 server, clients on hokiespeed and blueridge

Page 3: File System Benchmarking Advanced Research Computing.


IO BENCHMARKING

Page 4: File System Benchmarking Advanced Research Computing.

IO Benchmarks
• Micro-benchmarks
  – Measure one basic operation in isolation
    • Read and write throughput: dd, IOzone, IOR
    • Metadata operations (file create, stat, remove): mdtest
  – Good for: tuning an operation, system acceptance
• Synthetic benchmarks
  – Mix of operations that model real applications
  – Useful if they are good models of real applications
  – Examples:
    • Kernel build, kernel tar and untar
    • NAS BT-IO

Page 5: File System Benchmarking Advanced Research Computing.

IO Benchmark pitfalls
• Not measuring what you want to measure
  – masking of the results by various caching and buffering mechanisms
• Examples of different behaviors
  – Sequential bandwidth vs. random IO bandwidth
  – Direct IO bandwidth vs. bandwidth in the presence of the page cache (in the latter case an fsync is needed; see the dd sketch below)
  – Caching of file attributes: stat-ing a file on the same node on which the file has been written
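As an illustration of the page-cache pitfall, a minimal dd comparison (the output path is a placeholder) makes the difference explicit:

  # Buffered write: without flushing, dd can report page-cache speed rather
  # than storage speed, so force the data out before trusting the number.
  dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 conv=fsync

  # Direct IO write: bypasses the page cache entirely.
  dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 oflag=direct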

Page 6: File System Benchmarking Advanced Research Computing.

What is benchmarked
• What we measure is the combined effect of:
  – the native file system on the NFS server (shelter)
  – NFS server performance, which depends on factors such as enabling/disabling write delay and the number of server threads
    • Too few threads: the client retries several times
    • Too many threads: server thrashing
  – the network between the compute cluster and the NFS server
  – NFS client and mount options
    • Synchronous or asynchronous
    • Enable/disable attribute caching
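As a sketch of where these knobs live (export path, thread count, and mount options are illustrative, not the settings used in these runs):

  # Server side: /etc/exports controls write delay; rpc.nfsd sets the thread count.
  /export  *(rw,sync,wdelay)    # no_wdelay would disable write gathering
  rpc.nfsd 64                   # too few -> client retries; too many -> thrashing

  # Client side: synchronous vs. asynchronous writes, attribute caching on or off.
  mount -t nfs -o sync,noac shelter:/export /mnt/shelter
  mount -t nfs -o async,actimeo=60 shelter:/export /mnt/shelter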

Page 7: File System Benchmarking Advanced Research Computing.

Micro-benchmarks
• IOzone – measures read/write bandwidth
  – Long-established benchmark with the ability to test multiple readers/writers
• dd – measures read/write bandwidth
  – Tests file write/read
• mdtest – measures metadata operations per second
  – file/directory create/stat/remove
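Typical invocations might look like the following (paths, sizes, and thread counts are illustrative; the mdtest parameters actually used are given on the next slide):

  # dd: single-stream write bandwidth
  dd if=/dev/zero of=/mnt/nfs/ddtest bs=1M count=1024 oflag=direct

  # IOzone: write (-i 0) and read (-i 1) tests, 4 concurrent threads,
  # one file per thread
  iozone -i 0 -i 1 -r 1m -s 1g -t 4 -F /mnt/nfs/f1 /mnt/nfs/f2 /mnt/nfs/f3 /mnt/nfs/f4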

Page 8: File System Benchmarking Advanced Research Computing.

Mdtest – metadata test
• Measures the rate of file/directory operations
  – create, stat, remove
• Mdtest creates a tree of files and directories
• Parameters used
  – tree depth: z = 1
  – branching factor: b = 3
  – number of files/directories per tree node: I = 256
  – stat run by another node than the create node: N = 1
  – number of repeats of the run: i = 5
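These parameters correspond to an invocation along the following lines (the process count and target directory are placeholders):

  # Depth-1 tree, branching factor 3, 256 items per tree node; stat performed
  # by a neighboring task (-N 1); repeat the whole run 5 times (-i 5).
  mpirun -np 4 mdtest -z 1 -b 3 -I 256 -N 1 -i 5 -d /mnt/nfs/mdtest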

Page 9: File System Benchmarking Advanced Research Computing.

Synthetic benchmarks
• tar-untar-rm – measures time
  – Tests creation/deletion of a large number of small files
  – Tests filesystem metadata creation/deletion
• NAS BT-IO – measures bandwidth and time spent doing IO
  – Solves a block-tridiagonal linear system arising from the discretization of the Navier-Stokes equations
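A minimal sketch of the tar-untar-rm cycle (tarball name and paths are placeholders):

  cd /mnt/nfs/ktest
  time tar xf /tmp/linux.tar        # untar: creates many small files
  time tar cf /tmp/out.tar linux    # tar: reads many small files back
  time rm -rf linux                 # rm: pure metadata deletion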

Page 10: File System Benchmarking Advanced Research Computing.

Kernel source tar-untar-rm
• Run on 1 to 32 nodes
• Tarball size: 890 MB
• Total directories: 4732
  – Max directory depth: 10
• Total files: 75984
  – Max file size: 919 kB
  – <= 1 kB: 14490
  – <= 10 kB: 40190
  – <= 100 kB: 20518
  – <= 1 MB: 786
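Counts like these can be reproduced over the extracted tree (a sketch; assumes GNU find):

  find . -type d | wc -l                     # total directories
  find . -type f | wc -l                     # total files
  find . -type f -size -10k | wc -l          # files of at most ~10 kB
  find . -type d -printf '%d\n' | sort -n | tail -1   # max directory depth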

Page 11: File System Benchmarking Advanced Research Computing.

NAS BT-I/O
• Test mechanism
  – BT is a simulated CFD application that uses an implicit algorithm to solve the 3-dimensional compressible Navier-Stokes equations. The finite-differences solution is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y and z dimensions. The resulting systems are block-tridiagonal with 5x5 blocks and are solved sequentially along each dimension.
  – BT-I/O tests different parallel I/O techniques in BT
  – Reference: http://www.nas.nasa.gov/assets/pdf/techreports/1999/nas-99-011.pdf
• What it measures
  – Multiple cores doing I/O to a single large file (blocking collective MPI calls mpi_file_write_at_all and mpi_file_read_at_all)
  – I/O timing percentage, total data written, I/O data rate
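For reference, the MPI-IO "full" variant of BT is typically built and run along these lines in an NPB source tree (class and process count here match the Class C column on a later slide):

  make bt CLASS=C NPROCS=4 SUBTYPE=full   # builds bin/bt.C.4.mpi_io_full
  mpirun -np 4 bin/bt.C.4.mpi_io_full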

Page 12: File System Benchmarking Advanced Research Computing.


SHELTER NFS RESULTS

Page 13: File System Benchmarking Advanced Research Computing.

dd throughput (MB/sec)
• Run on 1 to 32 nodes
• Two block sizes – 1MB and 4MB
• Three file sizes – 1GB, 5GB, 15GB (4GB, 20GB, 60GB with 4MB blocks)

Block size   File size   Average   Median   Stdev
1M           1G          8.01      6.10     4.58
1M           5G          7.75      5.95     4.52
1M           15G         5.74      5.60     0.34
4M           4G          11.17     11.80    2.87
4M           20G         15.71     12.70    10.68
4M           60G         14.60     10.50    9.22
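A sketch of the per-node sweep behind this table (output path is a placeholder; the same counts give 1G/5G/15G at 1M blocks and 4G/20G/60G at 4M blocks):

  for bs in 1M 4M; do
    for count in 1024 5120 15360; do
      # conv=fsync flushes at the end so the page cache does not mask the result
      dd if=/dev/zero of=/mnt/shelter/ddfile bs=$bs count=$count conv=fsync
    done
  done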

Page 14: File System Benchmarking Advanced Research Computing.

dd throughput (MB/sec)

[Figure: bar chart of dd bandwidth (MB/s, roughly 0-18) for 1M and 4M block sizes, for the file-size pairs 1G/4G, 5G/20G, and 15G/60G]

Page 15: File System Benchmarking Advanced Research Computing.

IOZone write throughput

[Figure: average IOzone write throughput (KB/s, roughly 85,000-120,000) vs. number of threads; "Write Child" and "Write Parent" series]

Page 16: File System Benchmarking Advanced Research Computing.

IOZone write vs read (single thread)

IOzone throughput (KB/s), 32 GB file, 1 thread:

          Write        Read
Child     112033.19    9973.44
Parent    112032.34    9973.43

Page 17: File System Benchmarking Advanced Research Computing.

Mdtest file/directory create rate

[Figure: IO operations per second for directory and file creation vs. number of nodes (1 to 32), mdtest -z 1 -b 3 -I 256; "directory create" and "file create" series, roughly 0-2500 ops/sec]

Page 18: File System Benchmarking Advanced Research Computing.

Mdtest file/directory remove rate

[Figure: IO operations per second for directory and file removal vs. number of nodes (1 to 32), mdtest -z 1 -b 3 -I 256; "directory remove" and "file remove" series, roughly 0-3000 ops/sec]

Page 19: File System Benchmarking Advanced Research Computing.

Mdtest file/directory stat rate

[Figure: IO operations per second for directory and file stat vs. number of nodes (1 to 32), mdtest -z 1 -b 3 -I 256; "directory stat" and "file stat" series, roughly 0-700,000 ops/sec]

Page 20: File System Benchmarking Advanced Research Computing.

Tar-untar-rm time (sec)

tar                  Real      User   Sys
Average              781.27    1.35   10.41
Median               1341.72   1.66   13.08
Standard deviation   644.16    0.44   3.39

untar                Real      User   Sys
Average              1214.82   1.51   18.02
Median               1200.13   1.51   17.90
Standard deviation   99.03     0.06   0.62

rm                   Real      User   Sys
Average              227.48    0.22   3.91
Median               216.28    0.22   3.87
Standard deviation   64.21     0.02   0.16

Page 21: File System Benchmarking Advanced Research Computing.

BT-IO Results

Attribute                                            Class C           Class D
Problem size                                         162 x 162 x 162   408 x 408 x 408
Iterations                                           200               250
Number of processes                                  4                 361
I/O timing percentage                                13.44             91.66
Total data written in a single file (MB)             6802.44           135834.62
I/O data rate (MB/sec)                               94.99             73.45
Data written or read at every I/O instance into a
single file, per processor (MB/core)                 42.5              7.5

Page 22: File System Benchmarking Advanced Research Computing.


NETAPP FAS 3240 RESULTS

Page 23: File System Benchmarking Advanced Research Computing.

Server and Clients
• NAS server: NetApp FAS 3240
• Clients running on two clusters
  – Hokiespeed
  – Blueridge
• Hokiespeed: Linux kernel compile, tar-untar, and rm tests have been run with:
  – nodes spread uniformly over racks, and
  – consecutive nodes (rack-packed)
• Blueridge: Linux kernel compile, tar-untar, and rm tests have been run on consecutive nodes

Page 24: File System Benchmarking Advanced Research Computing.

IOzone read and write throughput (KB/s), Hokiespeed

[Figure, left: IOZone write throughput (roughly 101,000-104,500 KB/s) vs. number of threads (1, 3, 6, 12); "Write Child" and "Write Parent" series]
[Figure, right: IOZone read throughput (roughly 113,800-115,600 KB/s) vs. number of threads (1, 3, 6, 12); "Read Child" and "Read Parent" series]

Page 25: File System Benchmarking Advanced Research Computing.

dd bandwidth (MB/sec)
• Two node placement policies
  – packed on a rack
  – spread across racks
• Direct IO was used
• Two operations: read and write
• Two block sizes – 1MB and 4MB
• Three file sizes – 1GB, 5GB, 15GB
• Results show throughput in MB/s
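Scaling this to many clients might be sketched with pdsh (hostlist and mount point are placeholders); each node writes its own file so per-node throughputs can be summed:

  # Launch one direct-IO dd writer per node; 5120 x 1M = 5G per node
  pdsh -w node[001-032] \
    'dd if=/dev/zero of=/mnt/netapp/dd.$(hostname) bs=1M count=5120 oflag=direct'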

Page 26: File System Benchmarking Advanced Research Computing.

dd read throughput (MB/sec), 1MB blocks

[Figures: dd read throughput vs. number of nodes (1, 2, 4, 8, 16, 32, 64) for 1G, 5G, and 15G files; panels for Hokiespeed (nodes spread and nodes packed) and BlueRidge (nodes packed), up to roughly 700 MB/s]

Page 27: File System Benchmarking Advanced Research Computing.

dd read throughput (MB/sec), 4 MB blocks

[Figures: dd read throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Hokiespeed (nodes spread and nodes packed) and BlueRidge (nodes packed), up to roughly 800 MB/s]

Page 28: File System Benchmarking Advanced Research Computing.

dd write throughput (MB/sec), 1MB blocks

[Figures: dd write throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Hokiespeed (nodes spread and nodes packed) and BlueRidge (nodes packed), up to roughly 400 MB/s]

Page 29: File System Benchmarking Advanced Research Computing.

dd write throughput (MB/sec), 4MB blocks

[Figures: dd write throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Hokiespeed (nodes spread and nodes packed) and BlueRidge (nodes packed), up to roughly 400 MB/s]

Page 30: File System Benchmarking Advanced Research Computing.

Linux Kernel tests
• Two node placement policies
  – packed on a rack
  – spread across racks
• Operations
  – Compile: make -j 12 (timed as sketched below)
  – Tar creation and extraction
  – Remove directory tree
• Results show execution times in seconds
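The per-node compile test might be a script of roughly this shape (the tree location is a placeholder; make -j 12 matches the slide):

  cd /mnt/netapp/$(hostname)/linux
  make mrproper && make defconfig   # reset and configure the tree
  time make -j 12                   # the timed step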

Page 31: File System Benchmarking Advanced Research Computing.

Linux Kernel compile time (sec)

Hokiespeed, nodes packed:
nodes  real   user  sys
1      733    5001  1116
2      1546   5086  1233
4      3189   5146  1273
8      6343   5219  1317
16     9476   5251  1366
32     10012  5255  1339

Hokiespeed, nodes spread:
nodes  real  user  sys
1      817   4968  1096
2      990   5014  1138
4      993   5223  1171
8      939   5143  1167
16     1318  5112  1198
32     2561  5087  1183
64     4985  5111  1209

BlueRidge, nodes packed:
nodes  real  user  sys
1      694   4589  951
2      1092  4572  993
4      2212  4631  1038
8      4451  4691  1073
16     5636  4716  1098
32     5999  4702  1111
64     6609  4699  1089

Page 32: File System Benchmarking Advanced Research Computing.

Tar extraction time (sec)

Hokiespeed, nodes packed:
nodes  real  user  sys
1      167   1.0   9.5
2      172   0.98  9.5
4      177   1.06  9.6
8      202   1.03  9.7
16     312   1.09  10.2
32     421   1.18  11.9

Hokiespeed, nodes spread:
nodes  real  user  sys
1      143   1.05  9.5
2      125   0.98  9.4
4      144   1.04  9.8
8      149   1.04  9.8
16     216   1.08  10.4
32     399   1.23  12.5
64     809   1.42  15.0

BlueRidge, nodes packed:
nodes  real  user  sys
1      98    0.6   6.6
2      103   0.6   6.6
4      106   0.6   6.5
8      130   0.7   7.1
16     217   0.8   9.1
32     406   1.2   13
64     818   1.1   14

Page 33: File System Benchmarking Advanced Research Computing.

Rm execution time (sec)

Hokiespeed, nodes packed:
nodes  real  user  sys
1      20    0.12  2.5
2      21    0.15  2.7
4      25    0.16  2.8
8      33    0.17  2.8
16     123   0.22  3.7
32     284   0.24  4.0
64     650   0.27  4.4

Hokiespeed, nodes spread:
nodes  real  user  sys
1      21    0.14  2.84
2      22    0.14  2.82
4      22    0.15  2.80
8      47    0.18  3.30
16     135   0.21  3.85
32     248   0.23  4.01
64     811   0.27  4.54

BlueRidge, nodes packed:
nodes  real    user  sys
1      19.21   0.07  1.69
2      19.14   0.10  1.69
4      26.68   0.11  1.98
8      63.75   0.16  3.16
16     152.59  0.22  4.24
32     324.90  0.26  4.98
64     699.04  0.25  5.06

Page 34: File System Benchmarking Advanced Research Computing.

Uplink switch traffic, runs on hokiespeed

[Figures: uplink switch traffic, nodes packed vs. nodes spread]

Page 35: File System Benchmarking Advanced Research Computing.

Mdtest file/directory create rate

IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1

[Figure, Hokiespeed: directory create and file create rates vs. number of nodes (1 to 64), roughly 0-14,000 ops/sec]
[Figure, BlueRidge: directory create and file create rates vs. number of nodes (1 to 64), roughly 0-4,500 ops/sec]

Page 36: File System Benchmarking Advanced Research Computing.

Mdtest file/directory remove rate

IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1

[Figures, Hokiespeed and BlueRidge: directory remove and file remove rates vs. number of nodes (1 to 64), roughly 0-10,000 ops/sec in both panels]

Page 37: File System Benchmarking Advanced Research Computing.

Mdtest file/directory stat rate

IO ops/sec for mdtest -z 1 -b 3 -I 256 -i 10 -N 1

[Figures, Hokiespeed and BlueRidge: directory stat and file stat rates vs. number of nodes (1 to 64), roughly 0-700,000 ops/sec in both panels]

Page 38: File System Benchmarking Advanced Research Computing.

NAS BT-IO results

Class D
Iterations: 250 (I/O after every 5 steps)
Number of jobs: 50
Total data size (written/read): 6.5 TB (50 files of 135 GB each)

System                                      HokieSpeed        BlueRidge
Nodes per job                               3                 4
Total number of cores                       1800              3200
Average I/O timing (hours)                  5.175    5.85     5.3     5.5
Average I/O timing (% of total time)        92.6     93.4     92.7    96.6
Average Mop/s/process                       80.6     72       79.6    44.5
Average I/O rate per node (MB/s)            2.44     2.15     2.34    1.71
Total I/O rate (MB/s)                       357.64   323.02   359.8   343.42

Page 39: File System Benchmarking Advanced Research Computing.

Uplink switch traffic for BT-IO on hokiespeed

• The boxes indicate the three NAS BT-IO runs
• Red is write
• Green is read

[Figure: switch traffic trace; boxes 1-3 mark the three runs]

Page 40: File System Benchmarking Advanced Research Computing.


EMC ISILON X400 RESULTS

Page 41: File System Benchmarking Advanced Research Computing.

dd bandwidth (MB/sec)
• Runs on BlueRidge
  – no special node placement policy
• Direct IO was used
• Two operations: read and write
• Two block sizes – 1MB and 4MB
• Three file sizes – 1GB, 5GB, 15GB
• Results show throughput in MB/s

Page 42: File System Benchmarking Advanced Research Computing.

dd read throughput (MB/sec), 1MB blocks

[Figures: dd read throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for EMC Isilon and NetApp, up to roughly 1400 MB/s]

Page 43: File System Benchmarking Advanced Research Computing.

dd read throughput (MB/sec), 4 MB blocks

[Figures: dd read throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Isilon and NetApp, up to roughly 1400 MB/s]

Page 44: File System Benchmarking Advanced Research Computing.

dd write throughput (MB/sec), 1MB blocks

[Figures: dd write throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Isilon and NetApp, up to roughly 700 MB/s]

Page 45: File System Benchmarking Advanced Research Computing.

dd write throughput (MB/sec), 4MB blocks

[Figures: dd write throughput vs. number of nodes (1 to 64) for 1G, 5G, and 15G files; panels for Isilon and NetApp, up to roughly 900 MB/s]

Page 46: File System Benchmarking Advanced Research Computing.

Linux Kernel tests
• Runs on BlueRidge
  – no special node placement policy
• Direct IO was used
• Operations
  – Compile: make -j 12
  – Tar creation and extraction
  – Remove directory tree
• Results show execution times in seconds

Page 47: File System Benchmarking Advanced Research Computing.

Linux Kernel compile time (sec)

NetApp:
nodes  real  user  sys
1      694   4589  951
2      1092  4572  993
4      2212  4631  1038
8      4451  4691  1073
16     5636  4716  1098
32     5999  4702  1111
64     6609  4699  1089

Isilon:
nodes  real  user  sys
1      701   4584  957
2      1094  4558  989
4      2228  4631  1038
8      4642  4713  1084
16     5860  4723  1107
32     6655  4754  1120
64     7181  4760  1113

Page 48: File System Benchmarking Advanced Research Computing.

Tar creation time (sec)

Isilon:
nodes  real  user  sys
1      32    0.50  4.45
2      32    0.51  4.54
4      32    0.47  4.39
8      32    0.48  4.38
16     33    0.49  4.28
32     35    0.49  4.19
64     57    0.51  4.20

NetApp:
nodes  real  user  sys
1      30    0.51  4.50
2      30    0.49  4.46
4      34    0.50  4.51
8      41    0.51  4.45
16     62    0.54  4.51
32     116   0.60  4.83
64     238   0.89  7.10

Page 49: File System Benchmarking Advanced Research Computing.

Tar extraction time (sec)

NetApp:
nodes  real  user  sys
1      98    0.6   6.6
2      103   0.6   6.6
4      106   0.6   6.5
8      130   0.7   7.1
16     217   0.8   9.1
32     406   1.2   13
64     818   1.1   14

Isilon:
nodes  real  user  sys
1      230   0.65  10.1
2      234   0.62  10.3
4      237   0.63  10.4
8      255   0.64  10.5
16     300   0.67  10.9
32     431   0.74  11.8
64     754   0.87  14.1

Page 50: File System Benchmarking Advanced Research Computing.

Rm execution time (sec)

NetApp:
nodes  real  user  sys
1      19.2  0.07  1.69
2      19.1  0.10  1.69
4      26.7  0.11  1.98
8      63.7  0.16  3.16
16     152   0.22  4.24
32     324   0.26  4.98
64     699   0.25  5.06

Isilon:
nodes  real  user  sys
1      110   0.23  4.76
2      113   0.24  4.80
4      124   0.24  4.82
8      158   0.24  4.85
16     234   0.25  4.93
32     340   0.26  4.99
64     655   0.26  5.27

Page 51: File System Benchmarking Advanced Research Computing.

IOZone write throughput (KB/s), Isilon

[Figure, Direct IO/BlueRidge: Write Child and Write Parent throughput vs. number of threads (1, 3, 6, 12), roughly 0-140,000 KB/s]
[Figure, Buffered IO/BlueRidge: Write Child and Write Parent throughput vs. number of threads (1, 3, 6, 12), roughly 85,000-120,000 KB/s]

Page 52: File System Benchmarking Advanced Research Computing.

IOZone read throughput (KB/s), Isilon

[Figure, Direct IO/BlueRidge: Read Child and Read Parent throughput vs. number of threads (1, 3, 6, 12), roughly 0-120,000 KB/s]
[Figure, Buffered IO/BlueRidge: Read Child and Read Parent throughput vs. number of threads (1, 3, 6, 12), roughly 0-8,000,000 KB/s, reflecting page-cache effects]

Page 53: File System Benchmarking Advanced Research Computing.

IOzone write throughput (KB/s)

[Figure, NetApp/HokieSpeed: Write Child and Write Parent throughput vs. number of threads (1, 3, 6, 12), roughly 101,000-104,500 KB/s]
[Figure, Isilon/BlueRidge: Write Child and Write Parent throughput vs. number of threads (1, 3, 6, 12), roughly 85,000-120,000 KB/s]

Page 54: File System Benchmarking Advanced Research Computing.

IOzone read throughput (KB/s)

[Figure, NetApp/HokieSpeed: Read Child and Read Parent throughput vs. number of threads (1, 3, 6, 12), roughly 113,800-115,600 KB/s]
[Figure, Isilon/BlueRidge: Read Child and Read Parent throughput vs. number of threads (1, 3, 6, 12), roughly 0-8,000,000 KB/s]

Page 55: File System Benchmarking Advanced Research Computing.


Thank you.