BENCHMARK ON DELL 2950+MD1000
ATLAS Tier2/Tier3 workshop
Wenjing Wu, AGLT2 / University of Michigan
2008/05/27
DELL 2950+4 MD1000
CURRENT SETUP
2950 HARDWARE EQUIPMENT
Chassis model: PowerEdge 2950
CPUs: 2 quad-core Intel Xeon (family 15, stepping 11; model string garbled in source)
Memory: 16GB DDR2 SDRAM, memory speed 667 MHz
NICs: Broadcom NetXtreme II BCM5708 Gigabit Ethernet; Myricom 10G-PCIE-8A-C
RAID controllers:
• PERC 5/E Adapter, version 5.1.1-0040 (Slot 1, PCIe x8)
• PERC 5/E Adapter, version 5.1.1-0040 (Slot 2, PCIe x4)
• PERC 6/E Adapter, firmware version 6.0.2-0002 (Slot 1, PCIe x8) (extra $700)
• PERC 6/E Adapter, firmware version 6.0.2-0002 (Slot 2, PCIe x4) (extra $700)
Storage enclosures: 4 MD1000 (each has 15 SATA-II 750GB disks)
2950 SOFTWARE EQUIPMENT
OS: Scientific Linux CERN SLC release 4.5 (Beryllium)
Kernel version: 2.6.20-20UL3smp (current: 2.6.20-20UL5smp)
Version report: BIOS 1.5.1 (current 2.2.6), BMC 1.33 (current 2.0.5), DRAC 5 1.14 (current 1.33)
BENCHMARK TOOLS
Benchmark tool: iozone (iozone-3.279-1.el4.rf.x86_64)
RAID configuration tool: omconfig (srvadmin-omacore-5.2.0-460.i386)
Soft RAID: mdadm (mdadm-2.6.1-4.x86_64)
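As a sketch of how such iozone runs might be invoked (the exact flags are an assumption, not taken from the slides), a multi-threaded sequential write/read test with a 32GB total file set and 512KB records could look like:

```shell
# Hypothetical iozone invocation matching the slide parameters:
# -t 4 enables throughput mode with 4 threads; -s is per-thread file
# size, so 4 x 8g = 32GB total (larger than the 16GB RAM, to defeat
# the page cache); -r 512k is the record size; -i 0/-i 1 select
# sequential write and read; -F names one test file per thread.
iozone -i 0 -i 1 -r 512k -s 8g -t 4 \
    -F /mnt/raid/f1 /mnt/raid/f2 /mnt/raid/f3 /mnt/raid/f4
```

The file paths are placeholders for files on the array under test.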
METRICS OF BENCHMARK
Controller level (both PERC 5 / PERC 6):
• RAID setup (R0, R5, R50, R6, R60)
• Read and write policy (ra, ara, nra, wb, wt, fwb)
• Threshold of both controllers
• Stripe size (8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB); PERC 5 supports a maximum stripe size of 128KB, PERC 6 up to 1024KB
Kernel tuning (2.6.20-20UL3smp):
• readAhead size
• request queue length
• IO scheduler
File system tuning (xfs):
• inode size
• su/sw size
• internal/external log device
GENERAL PRINCIPLE FOR BENCHMARK
Many factors affect the benchmark result; to measure one factor, we fix all the other factors at the best values we have found so far or anticipate.
We need to benchmark different IO patterns (sequential read/write, random read/write, mixed workload).
In all, we need to benchmark all the options to find the best setup for our Dell 2950.
CONTROLLER LEVEL
• RAID setup (R5, R50, R6, R60)
• Read and write policy (ra, ara, nra, wb, wt, fwb)
• Threshold of controller (PERC 5 / PERC 6)
• Stripe size (8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1024KB); PERC 5 supports a maximum stripe size of 128KB, PERC 6 up to 1024KB
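These controller-level settings are applied through omconfig. As a rough sketch (the pdisk IDs and span layout are examples, not the exact commands used here), creating a RAID-50 virtual disk with a 128KB stripe, read-ahead, and write-back cache might look like:

```shell
# Hypothetical omconfig call: RAID-50 virtual disk on controller 0,
# built from 14 physical disks as two 7-disk RAID-5 spans,
# 128KB stripe, read-ahead (ra) and write-back (wb) cache policies.
omconfig storage controller action=createvdisk controller=0 \
    raid=r50 size=max spanlength=7 \
    pdisk=0:0:0,0:0:1,0:0:2,0:0:3,0:0:4,0:0:5,0:0:6,0:0:7,0:0:8,0:0:9,0:0:10,0:0:11,0:0:12,0:0:13 \
    stripesize=128kb readpolicy=ra writepolicy=wb
```

Cache policies can also be changed on an existing virtual disk without rebuilding it.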
PERC5 VS PERC6
System setup:
• Controller = perc6/perc5
• PCI slots = both PCIe x4 and x8
• raid = r60/r6/r50
• stripe size = 128KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB, multiple threads
Measure: PERC 5 vs PERC 6
READ
[Chart: perc5E vs perc6E read — performance (MB/s) vs. number of threads (1-12); series: p52r50-stripe128, p62r50-stripe128, p62r60-stripe128, p62r60-stripe512]
WRITE
[Chart: perc5E vs perc6E write — performance (MB/s) vs. number of threads (1-12); series: p52r50-stripe128, p62r50-stripe128, p62r60-stripe128, p62r60-stripe512]
RAID SETUP
System setup:
• Controller = perc5/perc6
• PCI slots = both PCIe x4 and x8
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB, multiple threads
Measure: different RAID levels (r5, r50, r6, r60)
WRITE
[Chart: write performance (MB/s) vs. number of threads (1-12); series: p5-4r5, p5-2r5, p5-2r50, p6-2r50, p6-r6, p6-2r60]
SOFT RAID ON PERC5
Soft RAID 0 over 2 R5: the soft RAID stripe size should be the same as the hardware RAID 5 stripe size (128KB).
Soft RAID 0 over 2 R50: the soft RAID stripe size should be the same as the hardware RAID 5 stripe size (128KB).
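Such a soft RAID 0 over two hardware virtual disks can be built with mdadm; the --chunk value is the soft-RAID stripe size, set here to match the 128KB hardware stripe (the device names are examples):

```shell
# Soft RAID 0 striped across two PERC virtual disks (here assumed to
# appear as /dev/sdb and /dev/sdc); chunk size in KB is matched to
# the 128KB hardware RAID-5 stripe size, per the rule above.
mdadm --create /dev/md0 --level=0 --chunk=128 \
    --raid-devices=2 /dev/sdb /dev/sdc
```

The resulting /dev/md0 is then formatted and benchmarked like any single device.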
WRITE
[Chart: write performance (MB/s) vs. number of threads (1-12); series: p5-2r5, p5-sr02r5, p5-2r50, p5-sr02r50]
READ
[Chart: read performance (MB/s) vs. number of parallel threads (1-12); series: p5-4r5, p5-2r5, p5-2r50, p6-2r50, p6-r6, p6-2r60]
READ
[Chart: read performance (MB/s) vs. number of parallel threads (1-12); series: p5-2r5, p5-sr02r5, p5-2r50, p5-sr02r50]
READ AND WRITE POLICY
System setup:
• Controller = perc5
• PCI slots = both PCIe x4 and x8
• raid = r50
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• different record sizes
Measure: different policies (ra, nra, ara, wb, wt, fwb)
WRITE
[Chart: write and read policies, write — performance vs. record size (32-8192 kB); series: nra, fwb, ra, wt, ara, wb]
READ
[Chart: write and read policies, read — performance vs. record size (32-8192 kB); series: nra, fwb, ra, wt, ara, wb]
PERC5 THRESHOLD
System setup:
• Controller = perc5
• PCI slots = PCIe x8
• raid = r0
• stripe size = 128KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB
Measure: a single controller with different numbers of disks (4-30 disks)
PERC5 THRESHOLD
[Chart: raid0 with varying numbers of disks — read and write performance vs. number of disks (4, 8, 10, 15, 18, 20, 24, 30)]
PERC 6 THRESHOLD
System setup:
• Controller = perc6
• PCI slots = PCIe x8
• raid = r60
• stripe size = 512KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB
Measure: a single controller with different numbers of disks (8, 12, 24, 30, 45)
PERC 6 THRESHOLD
[Chart: performance (MB/s) vs. number of disks (8, 12, 18, 24, 30, 45); series: write, re-write, read, re-read]
STRIPE SIZE
System setup:
• Controller = perc6
• PCI slots = both PCIe x4 and x8
• raid = r60
• stripe size = (64, 128, 256, 512, 1024)KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB, multiple threads
Measure: different stripe sizes (64, 128, 256, 512, 1024)KB
R60 – STRIPE SIZE
[Chart: read performance vs. number of threads (1-12); series: r6-128, r60-64, r60-128, r60-256, r60-512, r60-1024]
R60 – STRIPE SIZE
[Chart: write performance vs. number of threads (1-12); series: r6-128, r60-64, r60-128, r60-256, r60-512, r60-1024]
KERNEL TUNING
• readAhead size
• request queue length
• IO scheduler
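These three knobs map to standard Linux block-layer interfaces. As a sketch (the device name is an example), the values used throughout these tests can be applied like this:

```shell
# readAhead size is counted in 512-byte blocks:
# 10240 blocks x 512 B = 5 MB, the value used in most runs here.
blockdev --setra 10240 /dev/sdb

# request queue length (nr_queue in the slides is nr_requests)
echo 512 > /sys/block/sdb/queue/nr_requests

# IO scheduler, selectable per device at runtime
echo deadline > /sys/block/sdb/queue/scheduler
```

These settings do not persist across reboots, so they are typically reapplied from an init script.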
READAHEAD SIZE
System setup:
• Controller = perc5
• PCI slots = both PCIe x4 and x8
• raid = r50
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB
Measure: different readAhead sizes
READ
[Chart: read performance (MB/s) with different readAhead sizes, from 128 blocks up to ~18432 blocks]
REQUEST QUEUE LENGTH
System setup:
• Controller = perc6
• PCI slots = both PCIe x4 and x8
• raid = r60
• stripe size = 128KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB, multiple threads
Measure: different request queue lengths
READ
[Chart: read performance with different nr_queue values vs. number of threads (1-12); series: nr_queue = 32, 64, 128, 256, 512, 1024]
WRITE
[Chart: write performance with different nr_queue values vs. number of threads (1-12); series: nr_queue = 32, 64, 128, 256, 512, 1024]
IO SCHEDULER
System setup:
• Controller = perc6
• PCI slots = both PCIe x4 and x8
• raid = r50
• stripe size = 128KB
• read = ra, write = wb
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 512, queue_depth = 128
File system options:
• su=0, sw=0
• isize=256, bsize=4096
• log=internal, bsize=4096
iozone options:
• filesize = 32GB, ram size = 16GB
• record size = 512KB, multiple threads
Measure: different IO schedulers
READ
[Chart: read performance with different IO schedulers vs. number of threads (1-12); series: anticipatory, cfq, deadline, noop]
WRITE
[Chart: write performance with different IO schedulers vs. number of threads (1-12); series: anticipatory, cfq, deadline, noop]
RANDOM READ
[Chart: random read performance with different IO schedulers vs. number of threads (1-12); series: anticipatory, cfq, deadline, noop]
FILESYSTEM TUNING
• inode size
• su/sw size
• internal/external log device
FILE SYSTEM
System setup:
• Controller = perc5
• raid = r50
• PCI slots = both PCIe x4 and x8
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• isize=256, bsize=4096
dd options:
• filesize = 10GB, ram size = 320MB
• record size = 1MB
Measure: internal or external log device for xfs
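The internal-vs-external log comparison corresponds to the mkfs.xfs log options. As a sketch (the device names and log size are examples, not the exact ones used here), the two cases look like:

```shell
# xfs with an external log on a separate device, which keeps
# metadata journal traffic off the data array being benchmarked
mkfs.xfs -l logdev=/dev/sdc1,size=128m /dev/sdb
mount -o logdev=/dev/sdc1 /dev/sdb /mnt/raid

# xfs with the default internal log, for comparison
mkfs.xfs -l internal,size=128m /dev/sdb
mount /dev/sdb /mnt/raid
```

An external log only helps metadata-heavy workloads; for large sequential IO, the difference is usually small.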
WRITE
[Chart: xfs external vs. internal log device, write performance (MB/s) vs. number of threads (1-12); series: ex-log-10240, in-log-10240]
READ
[Chart: xfs external vs. internal log device, read performance (MB/s) vs. number of threads (1-12); series: ex-log-10240, in-log-10240]
XFS INODE SIZE
System setup:
• Controller = perc5
• raid = r50
• PCI slots = both PCIe x4 and x8
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• su=0, sw=0
• bsize=4096
• internal log, isize=256, bsize=4096
dd options:
• filesize = 10GB, ram size = 320MB
• record size = 1MB
Measure: xfs inode size
XFS INODE SIZE
[Chart: write and read performance (MB/s) vs. xfs inode size (256, 512, 1024, 2048, 3072, 4096)]
XFS SU/SW SIZE
System setup:
• Controller = perc5
• raid = r50
• PCI slots = both PCIe x4 and x8
• stripe size = 128KB
• OS kernel = 2.6.20-20UL3smp
• readAhead size = 10240 blocks = 5MB
• nr_queue = 128, queue_depth = 128
• IO_scheduler = deadline
File system options:
• isize=256, bsize=4096
• internal log, isize=256, bsize=4096
iozone options:
• filesize = 10GB, ram size = 320MB
• record size = 1MB
Measure: xfs su/sw size
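For reference, su is the hardware stripe unit and sw the number of data-bearing disks per stripe, so a 128KB stripe over a 15-disk RAID 5 gives sw = 14 (one disk's worth of each stripe holds parity); the sw = 28 entries in the chart below correspond to a soft RAID 0 spanning two such R5 sets. A minimal sketch (device name is an example):

```shell
# Derive xfs su/sw for one 15-disk RAID-5 set with a 128KB stripe:
# su = controller stripe size, sw = data disks = total disks - 1 parity.
STRIPE_KB=128
DISKS=15
SW=$((DISKS - 1))
echo "mkfs.xfs -d su=${STRIPE_KB}k,sw=${SW} /dev/sdb"
```

Running this prints the mkfs.xfs command with su=128k,sw=14, which aligns filesystem allocation with the array geometry.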
SU/SW SIZE
[Chart: write and read performance (MB/s) vs. xfs su/sw size (128k/28, 128k/14, 64k/28, 64k/14, 32k/28, 32k/14, 0/0)]
OUR SETUP NOW
System setup:
• Controller = perc6
• raid = r60
• PCI slots = both PCIe x4 and x8
• stripe size = 512KB
• OS kernel = 2.6.20-20UL5smp
Kernel options:
• readAhead size = 10240 blocks = 5MB
• nr_queue = 512, queue_depth = 128
• IO_scheduler = deadline
File system options:
• isize=256, bsize=4096
• internal log, isize=256, bsize=4096
OUR PERFORMANCE NOW
Single read = 670MB/s; aggregate read = 1500MB/s (threads >= 2). Even with 40 concurrent readers, it can still achieve 1200MB/s.
Single write = 320MB/s; aggregate write = 680MB/s (threads >= 2).
This is not the best single-stream IO: r60 with a 128KB stripe size can achieve 760MB/s single read, and single write performs almost the same. For a production system, we focus more on aggregate performance.
ONGOING PROJECT
CITI people at UM are doing: disk-to-disk transfer over 10GbE
Deliverables:
• Monthly report on performance tests, server configurations, kernel tuning, and kernel bottlenecks
• Final report on performance tests, server configurations, kernel tuning, and kernel bottlenecks

UltraLight kernel
Deliverables:
• Tuned and tested UltraLight kernel with full feature set
• Current 10GbE NIC drivers
• Current storage drivers
• Tuned for WAN data movement
• Web100 patches
• Other patches for performance, security, and stability
• Release document and web page updates for the UltraLight kernel: http://www.ultralight.org/web-site/ultralight/workgroups/network/Kernel/kernel.html
• Recommend sustainable options for the UltraLight kernel in the near and intermediate term
ONGOING PROJECT (CONT.)
QoS experiments
Deliverable:
• Document throughput performance with and without QoS in the face of competing traffic
MORE INFORMATION
AGLT2 IO benchmark page: https://hep.pa.msu.edu/twiki/bin/view/AGLT2/IOTestOnRaidSystems
References:
• http://www.makarevitch.com/rant/3ware/
• http://insights.oetiker.ch/linux/raidoptimization.html