Post on 04-Jul-2020
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Achieving Memory Level Performance: Secrets Beyond Shared Flash
Kothanda (Kodi) Umamageswaran, Vice President, Exadata Development
Gurmeet Goindi, Exadata Product Management
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Did You Miss the Storage Revolution?
Good Chance Your Storage Vendor Did Too
• Incumbent storage vendors have decades-old investment in legacy protocols, keeping them from adopting new technologies
• PCIe Flash with NVMe interface is a new interface that realizes full flash potential
• PCIe/NVMe storage architectures are orders of magnitude faster than what you probably use today
• Available now with Oracle Exadata and SuperCluster
[Figure: storage performance over time. Conventional Storage Era (SCSI, SAS; storage vendors) until 2014, followed by the Modern Flash Era (PCIe, NVMe)]
Solid State Media is Very Different Than Spinning Disk
• Compared to Spinning Disk, Flash
– Is many orders of magnitude faster
– Has many orders of magnitude higher bandwidth
– Has extremely low latency
– Has wearing issues as it ages, but technology is catching up
– Is expensive, but the price gap is shrinking
• Every storage vendor has some flash-based solution for your Database
Q: Will my database realize the full benefit of flash technology?
A: It will depend on how fast you can move the data from the flash to the database
SCSI Access Model
• SCSI was designed for tapes and HDDs
• HDDs are sequential whereas Flash devices are massively parallel
• Traditional IO stack is optimized for spinning media
– 512 Byte block size transfers
– Flash and databases do 4 KB/8 KB IOs
• Using legacy interfaces like SCSI fundamentally bottlenecks flash drives
[Figure: CPU to HBA to SCSI path; an 8 KB IO and a 4 KB IO are each broken into 512 B transfers]
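The mismatch between 512 Byte transfers and 4 KB/8 KB database IOs can be made concrete with a little arithmetic. The following is a minimal sketch, not a model of any real SCSI driver; the function name `transfers_needed` is hypothetical and only the sizes come from the slide.

```python
SCSI_TRANSFER_BYTES = 512  # block size named on the slide


def transfers_needed(io_bytes: int, transfer_bytes: int = SCSI_TRANSFER_BYTES) -> int:
    """Number of fixed-size transfers needed to move one IO (ceiling division)."""
    return -(-io_bytes // transfer_bytes)


for io_kb in (4, 8):
    count = transfers_needed(io_kb * 1024)
    print(f"{io_kb} KB IO -> {count} transfers of {SCSI_TRANSFER_BYTES} B")
```

Under this simplification a single 8 KB database IO turns into 16 separate 512 B units, which is the overhead the slide attributes to the legacy stack.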
PCI Express vs. SAS Connectivity
• PCI Express is orders of magnitude faster than SAS, and is getting faster
• PCI Express has the same characteristics as Flash
– High Throughput
– Low Latency
• Using legacy interconnects like SAS fundamentally bottlenecks flash drives
Throughput (GB/s):
SAS 6 Gbps     0.6
SAS 12 Gbps    1.2
PCIe 3.0 x4    4
PCIe 3.0 x8    8
PCIe has 13x throughput of SAS
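The 13x figure appears to compare the fastest and slowest bars of the chart. A small sketch, using only the four chart values (the dictionary and the `speedup` helper are illustrative names, not part of any tool), shows how the ratios work out:

```python
# Throughput values in GB/s, as read from the slide's chart.
throughput_gb_s = {
    "SAS 6 Gbps": 0.6,
    "SAS 12 Gbps": 1.2,
    "PCIe 3.0 x4": 4.0,
    "PCIe 3.0 x8": 8.0,
}


def speedup(fast: str, slow: str) -> float:
    """Ratio of two interconnect throughputs from the table above."""
    return throughput_gb_s[fast] / throughput_gb_s[slow]


for slow in ("SAS 6 Gbps", "SAS 12 Gbps"):
    ratio = speedup("PCIe 3.0 x8", slow)
    print(f"PCIe 3.0 x8 vs {slow}: {ratio:.1f}x")
```

Against SAS 12 Gbps the same PCIe 3.0 x8 link comes out at a smaller multiple, so the headline number depends on which SAS generation is taken as the baseline.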
PCI Express Flash with NVMe Interface
• Non-Volatile Memory Express is a brand-new, ground-up interface designed for flash
• NVMe is inherently parallel
• NVMe provides native atomic IO size affinity for databases
• NVMe IO stack massively reduces CPU utilization and latency
PCI Express Flash with NVMe Interface is the right choice for your Database
[Figure: CPU to PCIe to NVMe path; NVMe is 2.2x faster than SCSI]
Exadata is Leading NVMe Adoption
Thousands of Exadata and SuperCluster systems shipped with NVMe Flash since 2014
[Timeline figure, 2014 to 2017]
• 1st NVMe Drive by Samsung
• 1st NVMe Drive by Intel
• Exadata X5-2: Industry’s First Enterprise System with NVMe
• EMC Announces DSSD D5 with NVMe
• Exadata Cloud Service uses NVMe in Public Cloud
• Exadata X6-2: 2nd Generation NVMe with 3D V-NAND
• Facebook launches Lightning based on NVMe
• EMC Closes DSSD division
• Exadata X7: 3rd Generation NVMe
Shared Storage Has Many Advantages over Local Storage
• Much better space utilization
• Much better security, management, reliability
• Enables DB consolidation, DB high availability, RAC scale-out
• Shares storage performance
– Aggregate performance of shared storage can be dynamically used by any server that needs it
[Figure: Servers connected through SAN/LAN to Shared Storage]
NVMe PCIe Flash Disrupts the Storage Array Model
• Latest PCIe Flash: 5.5 GB/sec
• SAN Link = 40 Gb: 5 GB/sec (less than 1 flash card)
• Leading All-Flash Array: 38 GB/sec (less than 5 flash cards)
New improvements are causing 100X bottlenecks across the shared storage stack
All-Flash Storage Array IO Path: many steps, each adds latency and creates bottlenecks
[Figure: IO path components: Flash Chips, SSD Ctrl, SAS/SATA, PCIe, Array Heads (CPU), SAN/LAN, Switches, SAN/LAN, Host HBA]
Exadata Achieves Memory Performance with Shared Flash
• Exadata X7 delivers 350 GB/sec flash bandwidth to any server
– Approaches 800 GB/sec aggregate DRAM bandwidth of DB servers
• Must move compute to data to achieve full flash potential
– Requires owning full stack, can’t be solved in storage alone
• Fundamentally, Storage Arrays can share flash capacity but not flash performance
– Even with next gen scale-out, PCIe networks, or NVMe over fabric
• Shared storage with memory level bandwidth is a paradigm change in the industry
– Get near DRAM throughput, with the capacity of shared flash
[Figure: Exadata DB Servers connected over InfiniBand to Exadata Smart Storage (CPU, PCIe, NVMe Flash Chips) with Query Offload]
Getting Memory Performance with Shared Flash using Smart Software
Oracle’s Infrastructure Innovations in Flash
• Oracle Exadata V2: First to bring flash storage to the database market
• Oracle Exadata X3: Doubled flash capacity
• Oracle Exadata X4: 100 GB/s throughput scans in a single rack
• Oracle Exadata X5: Lowest latency NVMe and increases scans to 263 GB/s
• Oracle Exadata X5: Hot-pluggable NVMe server for the database
• Oracle Linux: First Linux vendor with production NVMe drivers
• Oracle Exadata X6: Highest throughput over 300 GB/s, over 5 Million IOPS
• Oracle Exadata X7: Highest throughput over 350 GB/s, nearly 6 Million IOPS
Oracle’s Software Innovations in Flash
• Exadata Smart Flash Cache
• Exadata Smart Flash Log
• Exadata Smart Flash Cache Scan Awareness
• Exadata Smart File Initialization
• Exadata Columnar Flash Cache
• Exadata Flash Cache Space Resource Management
• Smart write burst and temp IO in Flash Cache
Exadata Smart Flash Cache – Completely Automatic
• Understands different types of I/Os from database
– Skips caching I/Os to backups, data pump I/O, archive logs, tablespace formatting
– Caches Control File Reads and Writes, file headers, data and index blocks
– Enables more space for relevant user data
• Immediately adapts to changing workloads
• Write-back flash cache
– Caches writes from the database, not just reads
• Doesn’t need to mirror in flash for read intensive workloads
– Flash arrays always store both mirror copies in flash, increasing your cost
• Smart Scans can run at the throughput of flash drives
– Flash arrays need lots of servers with lots of processes and still cannot match Smart Scan throughput of a single query
• Provides performance of flash at cost of disk
[Figure: storage tiers: 1.7 PB disk (cold data), 360 TB PCI flash (active data), 12 TB DRAM (hottest data)]
Exadata Smart Flash Log
• Outliers in log IO slow down lots of clients
• Outliers from any one copy of mirror slow down all the foregrounds
– Database wait time goes up by #foregrounds * Stall time
– Backlog doesn’t clear immediately, like an accident on the freeway, and increases “log file sync” waits
• Performance critical algorithms like space management and index splits are sensitive to log write latency
• Legacy storage IO cannot differentiate redo log IO from others
• UPS protected cache in traditional storage seems to work initially until the cache is overwhelmed by other writes
– Measure log file latency with full backup or a data load running
[Figure: clients and their foregrounds write into the Log Buffer, which is flushed by the Log Writer]
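The "#foregrounds * Stall time" relationship is easy to underestimate, so a worked example may help. This is a minimal sketch of that formula only; the function name and the sample numbers (200 foregrounds, a 50 ms stall) are hypothetical and not taken from the slides.

```python
def added_wait_ms(foregrounds: int, stall_ms: int) -> int:
    """Aggregate database wait time added by one log write stall.

    Every foreground waiting on the log write absorbs the full stall,
    so the total grows linearly with the number of foregrounds.
    """
    return foregrounds * stall_ms


# Hypothetical workload: 200 foregrounds committing during a 50 ms outlier.
print(added_wait_ms(200, 50), "ms of aggregate wait")  # 10000 ms
```

A single 50 ms outlier therefore costs 10 seconds of aggregate wait in this example, before accounting for the backlog effect the slide describes.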
Exadata Smart Flash Log – Completely Automatic
• Smart Flash Log uses flash as a parallel write cache to disk controller cache
• Whichever write completes first wins (disk or flash)
• Reduces response time and outliers
– “log file parallel write” histogram improves
– Greatly improves “log file sync”
• Uses almost no flash capacity (<0.1%)
• Network resource management provides priority for redo log I/Os across the network
• OLTP workloads transparently accelerated and provide predictable response times
[Figure: log write latency with Smart Logging Off vs. Smart Logging On; no outliers with Smart Logging On]
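The "whichever write completes first wins" idea can be illustrated with two concurrent tasks and a wait on the first completion. This is a toy sketch, assuming simulated latencies in place of real devices; it is not how the Exadata storage software is implemented, and `simulated_write` and `parallel_log_write` are hypothetical names.

```python
import concurrent.futures as cf
import time


def simulated_write(target: str, latency_s: float) -> str:
    """Stand-in for a redo write; sleeps for the device latency, then reports the target."""
    time.sleep(latency_s)
    return target


def parallel_log_write(disk_latency_s: float, flash_latency_s: float) -> str:
    """Issue the same write to disk and flash, return whichever finishes first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = [
        pool.submit(simulated_write, "disk", disk_latency_s),
        pool.submit(simulated_write, "flash", flash_latency_s),
    ]
    done, _pending = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    winner = next(iter(done)).result()
    pool.shutdown(wait=False)  # the slower write finishes in the background
    return winner


# A disk outlier (200 ms) is hidden by the flash copy (10 ms), and vice versa.
print(parallel_log_write(disk_latency_s=0.2, flash_latency_s=0.01))
print(parallel_log_write(disk_latency_s=0.01, flash_latency_s=0.2))
```

The caller's latency is the minimum of the two paths, which is why an outlier on either device no longer shows up in the response time.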
Exadata Columnar Flash Cache – Completely Automatic
• Hybrid Columnar Compression balances need for OLTP and Analytics
• As CPUs get faster, want even faster scans
• Smart Flash Cache automatically transforms blocks from hybrid columnar to pure columnar for analytics during flash cache population
• Dual format representation for single row lookups
• Only selected columns read from flash during a query
• Up to 5x query speedup
[Figure: during Flash Cache Population, Compression Units are transformed into Columns; example query: select columnA from table where …]
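Why a pure columnar layout lets a query read only the selected columns can be shown with a toy transformation. The sketch below uses plain Python dictionaries and lists as stand-ins; it says nothing about the actual on-flash format, and `to_columnar`, `scan_column`, and the sample rows are hypothetical.

```python
def to_columnar(rows: list) -> dict:
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def scan_column(columns: dict, name: str, predicate) -> list:
    """Scan a single column; the other columns are never touched."""
    return [value for value in columns[name] if predicate(value)]


rows = [
    {"columnA": 10, "columnB": "x", "columnC": 1.5},
    {"columnA": 25, "columnB": "y", "columnC": 2.5},
    {"columnA": 40, "columnB": "z", "columnC": 3.5},
]
columns = to_columnar(rows)
print(scan_column(columns, "columnA", lambda v: v > 20))
```

In the row layout every record has to be visited in full to evaluate a filter on `columnA`; after the pivot the scan touches one contiguous list, which is the effect the slide credits for the query speedup.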
Flash Cache Space Resource Management
• Flash Cache is a shared resource
• Database as a Service creates need for efficient resource sharing
• Specify minimum (flashCacheMin) and maximum (flashCacheLimit) sizes, or fixed allocations (flashCacheSize), a database can use in the flash cache
ALTER IORMPLAN -
dbplan=((name=sales, flashCacheSize=100G), -
(name=finance,flashCacheLimit=100G, flashCacheMin=20G), -
(name=schain, flashCacheSize=200G))
• Container database resource specified at the storage
• Pluggable database container resource limits expressed as percentages in the container database
• Database and Pluggable database I/O resource management is unique to Exadata
• Predictable performance for database queries – no more noisy neighbor
[Figure: flash cache partitioned among FINANCE, SUPPLY CHAIN and SALES]
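Before applying a plan like the one on this slide, it can be useful to sanity-check the numbers offline. The sketch below encodes the slide's three directives as Python dictionaries and applies two plausible consistency rules. The rules themselves (a fixed size is not mixed with min/limit, and guaranteed space must fit in the cache) are assumptions for illustration, not a statement of what CellCLI enforces, and `check_plan` is a hypothetical helper.

```python
def guaranteed_gb(directive: dict) -> int:
    """Space a database is always entitled to: its fixed size, else its minimum."""
    if "flashCacheSize" in directive:
        return directive["flashCacheSize"]
    return directive.get("flashCacheMin", 0)


def check_plan(plan: list, total_flash_cache_gb: int) -> list:
    """Return a list of human-readable problems; empty means the plan looks consistent."""
    problems = []
    for d in plan:
        name = d["name"]
        if "flashCacheSize" in d and ("flashCacheMin" in d or "flashCacheLimit" in d):
            problems.append(f"{name}: fixed size combined with min/limit")
        if d.get("flashCacheMin", 0) > d.get("flashCacheLimit", float("inf")):
            problems.append(f"{name}: flashCacheMin exceeds flashCacheLimit")
    reserved = sum(guaranteed_gb(d) for d in plan)
    if reserved > total_flash_cache_gb:
        problems.append(
            f"guaranteed space {reserved}G exceeds cache size {total_flash_cache_gb}G"
        )
    return problems


# The directives from the slide, with sizes in GB.
slide_plan = [
    {"name": "sales", "flashCacheSize": 100},
    {"name": "finance", "flashCacheLimit": 100, "flashCacheMin": 20},
    {"name": "schain", "flashCacheSize": 200},
]
print(check_plan(slide_plan, total_flash_cache_gb=1000))
print(check_plan(slide_plan, total_flash_cache_gb=300))
```

The cache sizes passed in (1000 GB and 300 GB) are made-up values chosen to show one passing and one failing case.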
Write bursts and temp IO in flash cache – Completely Automatic
• Write throughput of four flash cards has become greater than the write throughput of 12 disks
• When database write throughput exceeds the throughput of disks, smart flash cache intelligently caches writes
– Schema changes during application upgrades rewrite entire tables in some packaged applications
– Large database consolidations can have write bursts at the same time
• When queries write a lot of temp IO and it is bottlenecked on disk, smart flash cache intelligently caches temp IO
– Writes to flash for temp spill reduce elapsed time
– Reads from flash for temp reduce elapsed time further
• Smart to prioritize OLTP data and does not remove hot OLTP lines from the cache
• Smart flash wear management for large writes
• Much faster scans and disk writes
[Figure: Write Bursts and Temp IO in Flash Cache]
Exadata Smart Flash Benefits
• Automatic Database Aware Flash Cache
• Smart Flash Logging avoids redo log outliers automatically
• Smart Flash Cache Scan provides subset scanning and is table scan resistant
• Smart File Initialization creates a file by writing meta-data to flash cache
• Smart Columnar Flash Cache extends columnar benefit to storage automatically
• Smart Flash Cache Space Resource Management provides granular control
• Smart write burst and temp IO in Flash Cache
The Next Big Thing: In-Memory Performance in Storage
Analytics: Exadata Brings In-Memory Analytics to Storage
• With Exadata Flash throughput approaching memory throughput, SQL bottleneck moves from I/O to CPU
• Exadata automatically transforms table data into In-Memory DB columnar formats in Exadata Flash cache
– Enables fast vector processing for storage server queries
• Uniquely optimizes next generation Flash as memory
– Works for both row format OLTP databases, and Hybrid Columnar Compressed Analytics databases
[Figure: In-Memory Columnar scans and In-Flash Columnar scans]
In-Memory Columnar Formats in Flash Cache (12.2.1.1.0)
3-4x Overall Analytics Performance Improvement
[Figure: Database Server with up to 1.5 TB DRAM (SGA, IMC) running In-Memory Columnar scans; Storage Servers running In-Flash Columnar scans over HCC / OLTP compressed / uncompressed data. Labels: 12.8 TB Flash; 25.6 TB Flash x 3 = 76.8 TB (or more) IMC (In-Memory Columnar) data]
OLTP: Exadata Brings In-Memory OLTP to Storage
• Exadata Storage Servers add a memory cache in front of Flash memory
– Similar to current Flash cache in front of disk
• Cache is additive with cache at Database Server
– Only possible because of tight integration with Database
• 2.5x Lower latency for OLTP IO
– 100 usec
• Up to 21 TB of DRAM for OLTP acceleration with Memory Upgrade Kit
– Compare to 5 TB of flash in V2 Exadata
[Figure: Compute Server and Storage Server tiers: hot data in DRAM, warm data in Flash, cold data on Disk]
In-Memory OLTP Acceleration – Journey of a Database Block
[Figure, repeated at each step: Database Server (DB Buffer Cache) and Storage Server (In-Memory OLTP Cache, Flash Cache, Hard Disk Drive)]
1. DB reads a block
Exadata serves the block from storage. Data initially resides on hard disk.
2. Flash Cache gets populated
3. Database evicts the block
Exadata caches the block in the In-Memory OLTP Cache
4. Database reads the same block again
Exadata serves the block from the In-Memory OLTP Cache with 100 us latency
In-Memory OLTP Acceleration
Data is never in both the DB Buffer Cache and the In-Memory OLTP Cache at the same time
Cache is Additive with Cache at Database Server
• Blocks in buffer cache will not be cached in Storage Server In-Memory OLTP Cache
– Client read hits in Storage Server In-Memory OLTP Cache will evict the blocks from the cache
– Client read misses in Storage Server In-Memory OLTP Cache will populate flash cache, but not Storage Server In-Memory OLTP Cache
• Blocks evicted from buffer cache globally will be populated into Storage Server In-Memory OLTP Cache
– Storage Server will read the blocks from flash cache and populate into In-Memory OLTP Cache
• Elimination of Context Switches Further Reduces Latency
– 100 usec read latency
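The exclusive-caching rules above can be traced with a small simulation. This is a toy model under simplifying assumptions (no capacity limits, no writes, dictionaries standing in for each tier); the class and method names are hypothetical and do not correspond to any Exadata interface.

```python
class StorageServer:
    """Toy model of the three storage tiers described on the slide."""

    def __init__(self, disk: dict):
        self.disk = dict(disk)
        self.flash_cache = {}
        self.memory_cache = {}  # stands in for the In-Memory OLTP Cache

    def read(self, block_id: str):
        """Serve a client read and return (data, tier that served it)."""
        if block_id in self.memory_cache:
            # A hit hands the block to the DB buffer cache, so it is evicted here.
            return self.memory_cache.pop(block_id), "memory"
        if block_id in self.flash_cache:
            return self.flash_cache[block_id], "flash"
        data = self.disk[block_id]
        # A miss populates the flash cache only, never the memory cache.
        self.flash_cache[block_id] = data
        return data, "disk"

    def on_buffer_cache_eviction(self, block_id: str) -> None:
        """The database evicted the block: copy it from flash into the memory cache."""
        if block_id in self.flash_cache:
            self.memory_cache[block_id] = self.flash_cache[block_id]


server = StorageServer({"b1": "payload"})
first = server.read("b1")              # served from disk, flash cache populated
server.on_buffer_cache_eviction("b1")  # block moves into the memory cache
second = server.read("b1")             # served from memory, then evicted from it
third = server.read("b1")              # falls back to the flash cache
print(first[1], second[1], third[1])
```

Because a memory-cache hit removes the block, the same block is held either in the database buffer cache or in the storage server memory cache, which is what makes the two caches additive rather than duplicative.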
Identifying Workloads for In-Memory OLTP Acceleration
A Choice of Exadata Deployment Models
• On-Premises: Customer Data Center, Purchased, Customer Managed
• Cloud at Customer: Customer Data Center, Subscription, Oracle Managed
• Public Cloud Service: Oracle Cloud, Subscription, Oracle Managed
Systems shown: X7-2, X7-8
Exadata Advantages Increase Every Year
Dramatically Better Performance and Cost
• Smart Scan
• InfiniBand Scale-Out
• Database Aware Flash Cache
• Storage Indexes
• Columnar Compression
• IO Priorities
• Data Mining Offload
• Offload Decrypt on Scans
• In-Memory Fault Tolerance
• Direct-to-wire Protocol
• JSON and XML offload
• Instant failure detection
• Network Resource Management
• Multitenant Aware Resource Mgmt
• Prioritized File Recovery
• Unified InfiniBand
• Scale-Out Servers
• Scale-Out Storage
• DB Processors in Storage
• PCIe NVMe Flash
• Tiered Disk/Flash
• Software-in-Silicon
• 3D V-NAND Flash
• In-Memory Columnar in Flash
• Exadata Cloud Service
• Smart Fusion Block Transfer
• Exadata Cloud at Customer
• In-Memory OLTP Acceleration
• Hot Swappable Flash
• 25 GigE Client Network