
Throughput performance evaluation of the Intel® SSD DC P3700 for NVMe on the SGI® UV™ 300

Nikos Trikoupis, Paul Muzio

CUNY High Performance Computing Center, City University of New York

{nikolaos.trikoupis, paul.muzio}@csi.cuny.edu

Abstract
We focus on measuring the aggregate throughput delivered by 12 Intel® SSD DC P3700 for NVMe cards installed on the SGI UV 300 scale-up system in the City University of New York (CUNY) High Performance Computing Center (HPCC). We establish a performance baseline for a single SSD. The 12 SSDs are assembled into a single RAID-0 volume using Linux Software RAID and the XVM Volume Manager. The aggregate read and write throughput is measured against different configurations that include the XFS and GPFS file systems. We show that for some configurations throughput scales almost linearly when compared to the single DC P3700 baseline.

1. Introduction
Flash storage is enjoying growing popularity. Typically, PCIe NVMe flash storage is used to bridge the latency gap between DRAM-based main memory and disk-based non-volatile storage, particularly for applications with random access patterns. Here, the random IOPS performance of SSDs is generally the characteristic attracting the most attention. This is understandable, since designing a file system with high IOPS performance is a major cost driver. There are, however, still many workloads in high-performance computing (HPC) that can benefit from the improved sequential throughput provided by SSDs.

For sequential reads and writes, a single Intel P3700 2 TB add-in card can deliver 2.8 GB/s for reads and 2 GB/s for writes [1]. As SGI demonstrated at the Supercomputing 2014 conference [2], the UV 300 scale-up x86 system is capable of hosting up to 64 Intel P3700 cards with an aggregate performance of 180 GB/s of sequential throughput. Other than SGI's Supercomputing announcements, to the best of our knowledge there have been no other benchmarks demonstrating this kind of scaling on a single x86 Linux-based system using different volume managers or file systems.

2. Motivation and Background
Transferring large datasets in and out of a server node with a 12 TB DRAM configuration is part of a Proof of Concept (POC) project at the CUNY HPCC investigating the potential performance benefits of PCIe NVMe flash storage for kdb+ by Kx Systems [3], a popular commercial database used globally for high-performance, data-intensive applications. The datasets the database typically operates against can grow anywhere from a few terabytes to multiple petabytes. The majority of the workloads are highly sequential, read-intensive I/O. The results described on the following pages are the initial results from this POC project.


2.1. Hardware Architecture
The CUNY HPCC SGI UV 300 is named "APPEL", in honor of Dr. Kenneth Appel, an alumnus of Queens College/CUNY, known for his work in topology and, in particular, for proving the four color map theorem with Wolfgang Haken in 1976. APPEL is a multiprocessor distributed shared memory (DSM) system with 32 Intel Xeon E7-8857 v2 3.00 GHz processors (384 Ivy Bridge CPU cores), 12 TB of DDR3 memory, 12 NVIDIA K20m GPGPUs, and 2 Intel Xeon Phi KNC 2255 co-processors. A single Linux kernel manages all devices and shares the memory of the system. The operating system is a standard SuSE Linux Enterprise Server 11 distribution.

"APPEL" is a Cache Coherent Non-Uniform Memory Architecture (ccNUMA) system. Memory is physically located at various distances from the processors. As a result, memory access times, or latencies, are different, or non-uniform. For example, in the diagram below, it takes less time for a processor located in the left unit to reference its locally installed memory than to reference remote memory in the right unit. SGI's interconnect, NUMAlink v7, allows all of the processors to share the single 12 TB memory space with a guaranteed latency of less than 500 ns for any memory reference from any processor [4].

Diagram 1: APPEL's system architecture
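As a minimal illustration (not part of the original test procedure), the ccNUMA topology and the relative memory distances reported by the kernel can be inspected from a shell; the commands below assume the numactl utility is installed and that sysfs exposes the standard NUMA node entries:

# Show NUMA nodes, the CPUs and memory attached to each, and the node distance matrix.
numactl --hardware

# Relative access cost from node 0 to every node (larger numbers mean farther memory).
cat /sys/devices/system/node/node0/distance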

A single-rack SGI UV 300 comes with a total of 96 PCIe 3.0 slots. For the purposes of this testing, we distributed twelve 2 TB Intel P3700 SSD cards in the available x16 slots across the system units. It should be noted, however, that the P3700 only requires x4 width, allowing for the addition, if required, of many more P3700s (or other devices) in the UV 300 configuration. The layout is shown in Diagram 2.

Diagram 2: APPEL's PCIe layout
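A hedged way to confirm that all twelve cards are visible and to check the PCIe link width each one has negotiated (the grep pattern and the need for root privileges are assumptions; device addresses are discovered at run time):

# List the NVMe controllers and report the negotiated link state of each (LnkSta shows speed and width).
for addr in $(lspci -D | grep -i "Non-Volatile memory controller" | awk '{print $1}'); do
    printf '%s ' "$addr"
    lspci -s "$addr" -vv | grep -m1 "LnkSta:"
done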


2.2. Drivers and Software
NVM Express (NVMe) driver. The Intel P3700 is the first generation of Intel SSDs based on the innovative NVMe protocol. The traditional Linux block layer had already become a bottleneck to storage performance as the advent of NAND flash storage allowed systems to reach 800,000 IOPS [5]. NVMe allows for increased parallelism by providing multiple I/O dispatch queues, distributed so that there is a local queue per NUMA node or processor. The Intel P3700 cards can support up to 31 NVMe I/O queues and one admin queue. SGI modified the NVMe driver to optimize it for its UV systems so that the queues are distributed evenly across the multiple CPU sockets in the system. Each process may now have its own I/O submission and completion queue and an interrupt for each queue to the storage device, eliminating the need for remote memory accesses [6][7]. The versions of the driver are sgi-nvme-kmp-default-1.0.0_3.0.76_0.11-sgi713a1.sles11sp3.x86_64.rpm and sgi-nvme-1.0.0-sgi713a1.sles11sp3.x86_64.rpm.

Linux MD driver with Intel's RSTe extensions. We use the Multiple Device driver, known as Linux Software RAID, with Intel RSTe (IMSM metadata container) to create a striped RAID-0 device from all twelve SSDs [8]. Besides its own formats for RAID volume metadata, Linux Software RAID also supports external metadata formats, such as Intel's Rapid Storage Technology Enterprise (RSTe) extensions (IMSM metadata container). The latest version of the mdadm userspace utility was downloaded from http://git.neil.brown.name/?p=mdadm.git.

XVM Volume Manager. SGI's XVM is a NUMA-aware volume manager optimized for the UV 300 system. Similar to Linux MD-RAID, we use XVM to create a logical volume that is striped across all twelve SSDs.

XFS file system. XFS is a popular high-performance journaling file system, capable of handling file systems as large as 9 million terabytes [9], and is used at our HPC center.

GPFS file system. The General Parallel File System, recently rebranded as Spectrum Scale, and also in use at the CUNY HPC Center, is a high-performance parallel file system developed by IBM. Rather than relying on striping in a separate volume manager layer, GPFS implements striping at the file system level [10].
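A small sketch for sanity-checking this software stack on a running system; the interrupt naming convention (one vector per queue, named after the card) is an assumption that may vary across driver versions:

# Confirm that the SGI NVMe driver packages listed above are installed.
rpm -qa | grep sgi-nvme

# Count the interrupt vectors registered for the first card; with the modified driver
# these should be spread across the CPU sockets rather than concentrated on one node.
grep -c nvme0 /proc/interrupts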

3. Methodology
Because the database in our POC project, Kx Systems' kdb+, demonstrates highly sequential read I/O for historic and real-time data, we optimize and measure the throughput performance of the Intel P3700 SSDs under a variety of volume manager and file system configurations. For this purpose, we use the well-known IOR benchmark [11] developed by Lawrence Livermore National Laboratory.

IOR is an MPI-coordinated benchmark. We compile and execute it using the Message Passing Toolkit (MPT), which is SGI's MPI implementation. The most significant IOR options used in our tests are:

• -a POSIX: use the POSIX API for I/O.
• -B: use direct I/O, bypassing buffers.
• -t transferSize: size in bytes of a single I/O transaction transferring data from memory to the data file.
• -s segmentCount: controls the total size of the data file.
• -N numTasks: number of MPI tasks participating in the test.
• -F filePerProc: each task writes to its own data file.
• -o directories: distribute the data files evenly across the specified directories.

Before all the tests, all SSD cards were securely erased using blkdiscard -v /dev/${DEV}. Each test was run twice. The first set of numbers was discarded in an effort to approach the SSDs' steady state. The second set of numbers was recorded and is presented here; a sketch of this run protocol follows.
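A minimal sketch of this run protocol, assuming the IOR binary path used elsewhere in this paper and a hypothetical two-device list; the secure-erase, warm-up, and recorded-run steps mirror the procedure described above:

#!/bin/bash
# Test protocol sketch: secure-erase the SSDs, then run IOR twice and keep only the second result.
set -e
IOR=/scratch/nikos/IOR/src/C/IOR            # path taken from the command lines in this paper
DEVICES="/dev/nvme0n1 /dev/nvme1n1"         # illustrative subset; the real runs used all twelve cards

for DEV in $DEVICES; do
    blkdiscard -v "$DEV"                    # securely erase before the test series
done
# ...re-create and mount the file systems here, as described in each test's configuration...

mpiexec_mpt $IOR -a POSIX -B -e -t 1m -b 1m -s 10000 -N 33 -C -F -k > /dev/null   # first run, discarded
mpiexec_mpt $IOR -a POSIX -B -e -t 1m -b 1m -s 10000 -N 33 -C -F -k               # second run, recorded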


Direct I/O (the O_DIRECT flag) is commonly used with SSDs to boost bulk I/O operations by allowing applications to write directly to the storage device, bypassing the kernel page cache. The use of the flag in production systems is complicated, because it requires asynchronous writes with a large queue depth, and is also controversial [12]. However, considering the 12 TB of memory available in our system, we made a conscious decision to use it during these tests to keep the throughput results from being obscured by caching effects (a small dd-based illustration of this effect is sketched at the end of this section).

To establish a baseline, we first do a number of IOR test runs with a varying number of IOR threads against a single P3700 formatted with XFS and mounted with the noatime and nobarrier options to remove some unnecessary file system overhead. Common practice in measuring throughput is to use large transfer sizes, typically starting from 1 MB and moving toward 4, 8, or 16 MB. We decided that our baseline would be the best read and write throughput observed for a transfer size of 1 MB, and these are the numbers we record.

Next, we format each of the twelve installed P3700 SSDs with XFS and mount them on the system. We complete a series of read and write tests using transfer sizes of 1, 4, 8, and 16 MB. We repeat the test using 33, 66, 99, 132, and 264 threads to find the minimum number of threads required to saturate the SSDs.

Using the P3700 cards as individual file systems, however, has generally limited value. It is more interesting to assemble them into a RAID device and aggregate their storage capacity under one file system. For this, we use two different methods to create the single volume. One is SGI's XVM Volume Manager. The other is Linux Software RAID (MD-RAID) with the Intel Rapid Storage Technology Enterprise extension (IMSM metadata container). In both cases we create a striped RAID-0 volume, without being concerned about redundancy or mirroring. We format the volume with the XFS file system. As before, we complete a series of read and write tests with transfer sizes of 1 to 16 MB and repeat using 33, 66, 99, 132, and 264 threads.

Finally, we format a single GPFS file system, configuring each of the raw /dev/nvme* devices as NSD drives. An important consideration when creating a GPFS file system is the selection of the appropriate block size, a variable set at format time. In the GPFS version 3.5.0.26 that we use for this test, allowed block sizes are between 64 KB and 8 MB. When optimizing a GPFS file system for throughput, larger block sizes are preferred, although this tends to waste storage space, particularly when small files are stored on the file system. We repeated the same collection of tests as before for two GPFS instances, one with a block size of 1 MB and one with 4 MB.
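As a small illustration of the caching effect mentioned above (not part of the IOR methodology; device name and sizes are placeholders), a buffered read can be compared against a direct read with dd:

# Buffered read: on a machine with 12 TB of DRAM, much of this may be served from the page cache.
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096

# Direct read: iflag=direct sets O_DIRECT, bypassing the page cache, so the rate reflects the device.
dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4096 iflag=direct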

3.1 System Tuning
NVMe queue distribution. The SGI-modified NVMe driver calculates the optimal queue distribution across all available processor sockets, so that each socket gets at least one queue, and publishes CPU interrupt affinity hints [6]. SGI's IRQ balancer, sgi_irqbalance, currently has no awareness of this driver, so it was disabled. After rebooting the system, the following script was used to set the affinity in accordance with the driver hints for all Intel NVMe devices in the system:

# find /proc/irq/*/nvme* | cut -d/ -f4 | xargs -I '{}' \
  sh -c "cat /proc/irq/{}/affinity_hint > /proc/irq/{}/smp_affinity"
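A hedged verification loop (our addition) that each NVMe interrupt now matches the driver's hint:

# Report any NVMe interrupt whose affinity mask still differs from the driver's hint.
for irq in $(find /proc/irq/*/nvme* | cut -d/ -f4); do
    cmp -s /proc/irq/$irq/affinity_hint /proc/irq/$irq/smp_affinity \
        || echo "IRQ $irq: smp_affinity does not match affinity_hint"
done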

C1E power settings. To prevent inactive sockets from entering a power saving mode and impacting measured performance, we disabled the C1E state on all CPUs in the system using the following script:

# for p in $(seq $(sed 's/-/ /' /sys/devices/system/cpu/online)); \
  do wrmsr -p $p 0x1fc 0x35040041; done

Hyperthreading is disabled on this UV 300 system.
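The MSR write can be read back to confirm it took effect; a sketch assuming msr-tools and the msr kernel module are available:

# Read MSR 0x1fc back on every online CPU; each value should reflect the write above.
modprobe msr 2>/dev/null || true
for p in $(seq $(sed 's/-/ /' /sys/devices/system/cpu/online)); do
    echo "cpu $p: $(rdmsr -p $p 0x1fc)"
done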

4. Test results
The configuration of the tests and their results are presented below.

4.1 Test 1: Baseline - One Intel P3700 card formatted with XFS

Configuration
A single Intel P3700 card is formatted and mounted as follows:

# mkfs.xfs -f -K -d su=128k,sw=1 /dev/nvme0n1
# mount -t xfs /dev/nvme0n1 /p3700/0

The command line used is:

$ /scratch/nikos/IOR/src/C/IOR -a POSIX -B -e -t 1m -b 1m -s 10000 -N 6 -C -F

Results
• Write result: 2.05 GB/s, achieved using 6 threads and an IOR transfer size of 1 MB.
• Read result: 2.78 GB/s, achieved using 6 threads and an IOR transfer size of 1 MB.

During this test, iostat reports close to 100% utilization for the /dev/nvme* device. However, it should be stressed that IOR is a synthetic, best-case-scenario benchmark; real-world application performance may be lower.

4.2 Test 2: Twelve Intel P3700 cards, one XFS file system per card

Configuration
12 Intel P3700 cards are formatted as in Test 1 and mounted under 12 separate mount points below /p3700 (a loop-based sketch for preparing all twelve mounts follows the results below):

# mount -t xfs /dev/nvme0n1 /p3700/0
...
# mount -t xfs /dev/nvme11n1 /p3700/11

IOR is run using 33 to 264 threads. We test with IOR transfer sizes from 1 MB to 16 MB. During each run, IOR is writing on all Intel cards simultaneously. Example IOR command line used:

$ mpiexec_mpt /scratch/nikos/IOR/src/C/IOR -a POSIX -B -e -t 1m -b 1m \
  -s 10000 -N 132 -C -F -k -o 0/0@1/1@2/2@3/3@4/4@5/5@6/6@7/7@8/8@9/9@10/10@11/11

Results
• Best write result: 24.1 GB/s, using at least 132 threads and a transfer size of 1 MB.
• Best read result: 32.9 GB/s, using at least 132 threads and a transfer size of 1 MB.

During the test, iostat reported close to 100% utilization for the NVMe devices.
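The twelve per-card file systems referenced above can be prepared with a simple loop; a sketch in which the mkfs options follow Test 1 and the mount options follow the baseline description, while the loop itself is our illustration:

# Format and mount each P3700 card as its own XFS file system under /p3700/<n>.
for i in $(seq 0 11); do
    mkfs.xfs -f -K -d su=128k,sw=1 /dev/nvme${i}n1
    mkdir -p /p3700/$i
    mount -t xfs -o noatime,nobarrier /dev/nvme${i}n1 /p3700/$i
done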


Test 2. With 33 IOR threads, 1 MB reads show 81% scalability and 1 MB writes are at 90% scalability. With 132 IOR threads, the comparable percentages are each close to 100%. This is due to the fact that there is no additional overhead on the P3700 cards over the single-card, single-file-system setup. This is important in the context of our POC project, since our application can be configured with multiple I/O processes, each using a separate physical directory for its own dataset. This setup gave the best overall performance among all our tests. Note that the results of all four write tests are shown in the chart and overlap so closely in performance that the distinct lines are not easily visible.

4.3 Test 3: Twelve Intel P3700 cards, one GPFS file system with 1M block size

Configuration
For this test, GPFS version 3.5.0.26 is set up as a single file system with each of the raw /dev/nvme* devices as NSD drives. It is created using a 1M block size and mounted as follows:

# mmcrfs p3700 -F p3700.lst -B 1M -v no -n 32 -j scatter -T /global/p3700 -A no
# mmmount /global/p3700 -o dio,nomtime,noatime

Tests are run inside the NSD server, the UV 300 itself. No remote clients are involved in any test. IOR is run using 33 to 264 threads with IOR transfer sizes from 1 MB to 16 MB. Example IOR command line used:

$ mpiexec_mpt /scratch/nikos/IOR/src/C/IOR -a POSIX -B -e -t 1m -b 1m \
  -s 1000 -N 33 -C -F -k

Results
• Best read result: 31.6 GB/s, with 66 threads and a transfer size of 16 MB, although very similar numbers were observed using a 1 MB transfer size.
• Best write result: 18.8 GB/s, using 33 threads and a transfer size of 16 MB.
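A hedged way to confirm the block size and the NSD layout of the resulting file system (command names as documented for GPFS 3.5; the device name p3700 follows the mmcrfs call above):

# Report the block size chosen at format time and list the NSDs backing the file system.
mmlsfs p3700 -B
mmlsnsd -f p3700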

Test 3. Although read performance using GPFS was comparable to the results from Test 2 using XFS, with 95% scalability at 66 IOR threads, write performance was decidedly not, with only 31% scalability. We expect that newer versions of GPFS with tuning for NVMe drives will show better write performance.


4.4 Test 4: Twelve Intel DC P3700 cards, one GPFS file system with 4M block size

Configuration
GPFS version 3.5.0.26 is set up with each of the raw /dev/nvme* devices as NSD drives. It is created using a 4M block size and mounted as follows:

# mmcrfs p3700 -F p3700.lst -B 4M -v no -n 32 -j scatter -T /global/p3700 -A no
# mmmount /global/p3700 -o dio,nomtime,noatime

No remote clients were involved. IOR is run using 33 to 264 threads and tested with IOR transfer sizes from 1 MB to 16 MB. Example command line used:

$ mpiexec_mpt /scratch/nikos/IOR/src/C/IOR -a POSIX -B -e -t 1m -b 1m -s 1000 -N 66 -C -F -k

Results
• Best read result: 31.3 GB/s, using 99 threads and an IOR transfer size of 16 MB (although very similar numbers were achieved using a 1 MB transfer size).
• Best write result: 22.4 GB/s, using 66 threads and an IOR transfer size of 16 MB.

Test 4. Formatting with a larger 4 MB GPFS block size improved performance for large sequential writes compared to Test 3, but the results still lagged compared to Test 2 with XFS. Again, we expect that newer versions of GPFS with tuning for NVMe drives will show better write performance. Read performance remained excellent for a single file system spread over 12 NVMe cards.

4.5 Test 5a: Twelve Intel P3700 cards, MD-RAID, one XFS file system

Configuration
In this test, all /dev/nvme* devices are assembled into a RAID-0 array using Linux Software RAID (MD-RAID). On top of the array, an XFS file system is laid out using the default formatting options:

# mdadm --create /dev/md0 --chunk=128 --level=0 --raid-devices=12 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1
# mkfs.xfs -K /dev/md0
# mount -o noatime,nodiratime,nobarrier /dev/md0 /scratch_ssd/

IOR is run using 33 to 264 threads, and on each round we test with IOR transfer sizes from 1 MB to 16 MB. We use IOR with the POSIX API, one file per process, doing sequential writes. Example command line:

$ mpiexec_mpt /scratch/nikos/IOR/src/C/IOR -a POSIX -B -e -t 1m -b 1m -s 10000 -N 66 -C -F -k
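Before running IOR, the array geometry can be sanity-checked; a small sketch (our addition):

# md0 should appear as an active raid0 with twelve nvme members and the 128k chunk requested above.
cat /proc/mdstat
mdadm --detail /dev/md0 | grep -E "Raid Devices|Chunk Size|State"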


Results
• Best read result: 29.8 GB/s, using 66 threads and a transfer size of 4 MB.
• Best write result: 23.9 GB/s, using 66 threads and a transfer size of 8 MB.

During the test, iostat reports close to 100% utilization for all NVMe devices.

Test 5a. This setup, using one XFS file system over MD-RAID, gave excellent performance and was the most consistent compared with the three prior scenarios.

4.6 Test 5b: Twelve Intel P3700 cards, MD-RAID with RSTe, one XFS file system

Configuration
This test is very similar to Test 5a, with the difference that we create the volume using the Intel Rapid Storage Technology Enterprise (RSTe) RAID metadata format, which enables Intel RSTe features [13], such as creating multiple volumes within the same array. Also, instead of formatting the raw /dev/md0 device, we partition it first. We use the following commands to set up the array, then partition, format, and mount it:

# mdadm -C /dev/md/imsm /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 -n 12 -e imsm -f
# mdadm -C /dev/md0 /dev/md/imsm -n 12 -l 0 -f
# parted /dev/md0 mklabel gpt
# parted /dev/md0 mkpart primary 2097152B 100%
# parted /dev/md0 align-check opt 1
# mkfs.xfs -f -K /dev/md0p1
# mount -o noatime,nodiratime,nobarrier /dev/md0p1 /scratch_ssd/

As before, IOR is run using 33 to 264 threads, and on each round we test with IOR transfer sizes between 1 MB and 16 MB.

Results
• Best read result: 29.9 GB/s, using 66 threads and a transfer size of 4 MB.
• Best write result: 24.2 GB/s, using 66 threads and a transfer size of 8 MB.

During the test, iostat reports close to 100% utilization for all NVMe devices.
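For completeness, a hedged way to confirm that the platform and the container really use IMSM metadata:

# Report the RSTe/IMSM capabilities mdadm detects on this platform.
mdadm --detail-platform

# The container should report imsm metadata; -E shows the per-member view.
mdadm --detail /dev/md/imsm
mdadm -E /dev/nvme0n1 | head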


Test 5b. This setup, using one XFS file system over MD-RAID with IMSM metadata, gave excellent performance, marginally better than that of plain MD-RAID (Test 5a), but slightly less consistent across the number of IOR threads.

4.7 Test 6: Twelve Intel DC P3700 cards, XVM volume manager, one XFS file system

Configuration
For this test we install the latest XVM 3.4 binaries from SGI. All /dev/nvme* devices are assembled into a basic striped XVM volume using SGI's xvmgr. On top of the volume, an XFS file system is laid out using the default formatting options. IOR is run using 33 to 264 threads, and on each round we test with IOR transfer sizes between 1 MB and 16 MB.

Results
• Best read result: 28.9 GB/s, using 66 threads and a transfer size of 16 MB.
• Best write result: 23.8 GB/s, using 66 threads and a transfer size of 8 MB.

Test 6. The results for this setup were similar to Tests 5a and 5b (XFS over MD-RAID), with the latter two giving marginally better and more consistent performance.


6. Conclusion
We evaluated the sequential throughput performance of a collection of 12 Intel P3700 NVMe flash cards installed on a single SGI UV 300 system. Although the UV 300 can support many more SSD cards, our system has only 12 installed. These cards are typically installed in HPC environments aiming for high IOPS. We found that they offer excellent throughput performance for sequential reads and writes, slightly better than the published specifications. We tested the DC P3700 SSDs using different file systems and volume managers. Throughput displayed excellent scaling compared to the single-card baseline numbers, with very little overhead.

The configuration that gave the best overall performance among all our tests, 32.9 GB/s read and 24.1 GB/s write, is Test 2, where we do not put a single volume on top of all the SSDs, but instead use them formatted and mounted as individual file systems. This is acceptable for our POC project, since our application can be configured with multiple I/O processes, each using a separate physical directory with its own dataset.

Following closely are the results of Test 5b, where we stripe the SSDs using Linux Software RAID with Intel's IMSM extensions, achieving a throughput of 29.9 GB/s read and 24.2 GB/s write. Although write throughput is essentially the same as in the best-case scenario where the SSDs are individually formatted and mounted, there seems to be a 9.1% tax on read throughput. We believe the most probable reason for this is that the /dev/md0 virtual device in MD-RAID still suffers from the classic architectural limitation in the Linux block layer of a single submission/completion queue per block device, even if the underlying NVMe devices support the new block multi-queue architecture.

7. Acknowledgements
The authors want to thank the following people for their support and their help in this project. From Intel: Chris Allison, Melanie Fekete, Andrey Kudryavtsev, Cyndi Peach. From Silicon Graphics International: James Hooks, John Kichury, Kirill Malkin.


8. References
[1] Intel Solid State Drive DC P3700 Series - Product Specifications. http://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-p3700-spec.html
[2] "HPC bod SGI racks UV brains, reaches 30 MEEELLION IOPS." http://www.theregister.co.uk/2014/11/17/sgi_uv_reaching_30_million_iops_with_nvme_flashers/
[3] kdb+ database. https://kx.com
[4] SGI® UV™ 300H for SAP HANA. https://www.sgi.com/pdfs/4554.pdf
[5] Bjørling, Axboe, Nellans, Bonnet: "Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems." http://kernel.dk/systor13-final18.pdf
[6] Malkin, Patel, Higdon: "Deploying Intel DC P3700 Flash on SGI UV Systems - Best Practices for Early Access" (SGI Proprietary).
[7] Malkin: "Delivering Performance of Modern Storage Hardware to Applications." http://blog.sgi.com/delivering-performance-of-modern-storage-hardware-to-applications
[8] Kudryavtsev, Bybin: "Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express Based PCI Express Solid-State Drives," Intel Developer Forum 2015. http://www.slideshare.net/LarryCover/handson-lab-how-to-unleash-your-storage-performance-by-using-nvm-express-based-pci-express-solidstate-drives
[9] XFS FAQ. http://xfs.org/index.php/XFS_FAQ
[10] GPFS 3.5 Concepts, Planning, and Installation Guide. http://www-01.ibm.com/support/docview.wss?uid=pub1ga76041305
[11] IOR benchmark. https://github.com/chaos/ior
[12] Torvalds, email exchanges on O_DIRECT. https://lkml.org/lkml/2007/1/10/233 and http://yarchive.net/comp/linux/o_direct.html
[13] Intel NVMe SSDs and Intel RSTe for Linux. http://www.intel.com/content/dam/support/us/en/documents/solid-state-drives/Quick_Start_RSTe_NVMe_for%20Linux.pdf