Lightning Talk: Building Big Data Analytics Data Lake with ...
QCT CONFIDENTIAL www.QCT.io
Building Big Data Analytics Data Lake with All-Flash Ceph
QCT Marco Huang
QCT Amy Chang
Agenda
• Introduction of QCT
• Why data lake architecture
• Brief on Data Lake with All-Flash Ceph architecture
– Architecture design
– Hardware selection
– Testing result
• Conclusion
A leading cloud datacenter solution provider that delivers Server, Storage, Networking, Rack System and Cloud Solution under a single, proven roof
Paradigm Shift for Enterprise: Need for data analytics increases
Data-Generating Center
Data-Powered Company
• A data-powered company needs a flexible data analytics framework
• The popular Hadoop framework is the top choice for executing analytical tasks, yet it can't scale out on demand
Hyper-converged Hadoop Framework vs. Disaggregated SDS Architecture
Prefer Disaggregated Architecture: Target to scale out on demand
Powered by Intel® Xeon® processors
Data Lake with All-Flash Ceph Architecture
Disaggregate data analytics cluster and backend storage to provide higher flexibility
[Architecture diagram. Labels: Data Analytics Cluster 1 (Query Engines: Hive; Hadoop HDFS TMP), Data Analytics Cluster 2 (Presto), S3 REST API, 80G, Load Balancer, two RADOS Gateway (RGW) nodes, 40G links, Ceph-Monitor Node (M), Ceph-OSD Node, ten "x 16" labels]
Allows multiple data analytics clusters to run concurrently
Compatibility with S3 allows users to connect an on-premise cloud to a public cloud
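As an illustration of what S3 compatibility can mean in practice, the sketch below builds the Hadoop S3A connector properties that point an analytics engine at an S3-compatible endpoint such as RGW. The slides do not say which connector or settings were used; the S3A connector, the endpoint name `rgw-lb.example.internal`, and the helper names are assumptions made for this example.

```python
def s3a_settings(endpoint, access_key, secret_key, use_tls=False):
    """Return Hadoop S3A properties for an S3-compatible endpoint.

    The property names are standard S3A keys; the values passed in
    (endpoint, keys) are hypothetical and not taken from the slides.
    """
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style requests avoid per-bucket DNS names on a private endpoint.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_tls else "false",
    }


def render_properties(settings):
    """Render the settings as sorted key=value lines for inspection."""
    return [f"{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    # Hypothetical load-balanced RGW address and placeholder credentials.
    cfg = s3a_settings("rgw-lb.example.internal:80", "ACCESS_KEY", "SECRET_KEY")
    for line in render_properties(cfg):
        print(line)
```

Because only the endpoint and credentials change, the same properties could in principle point at a public S3 service instead of the on-premise gateway.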
Load balancer to distribute workloads equally
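The slides do not name the balancing algorithm. As a minimal sketch, assuming plain round-robin over the two RGW nodes in the diagram, equal distribution could look like this. The class name and the backend names `rgw-1` and `rgw-2` are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation (assumed policy, not from the slides)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self):
        """Return the next backend in the rotation."""
        return next(self._rotation)


def distribute(balancer, n_requests):
    """Count how many of n_requests each backend would receive."""
    return Counter(balancer.pick() for _ in range(n_requests))


if __name__ == "__main__":
    lb = RoundRobinBalancer(["rgw-1", "rgw-2"])  # hypothetical gateway names
    print(distribute(lb, 10))
```

Round-robin equalizes request counts, not bytes transferred; a production balancer may weigh connections or load instead.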
Ceph as backend storage to scale out on demand
QuantaGrid D52BQ-2U – Scale Along with Your Business
Intel Purley platform with up to 24 2.5” bays with SATA/SAS/NVMe support
• Top-shelf Xeon® P processor¹
• Up to 10x PCIe expansion slots
• Up to 26x hot-swap drive bays
• Up to 3TB memory capacity²
• As many as 24x SFF + optional extra 2x rear SSD bays (SATA/SAS/NVMe support)
• 12x LFF + optional extra 2x rear SSD bays (SATA/SAS/NVMe support)
• All screw-less, hot-swappable!
1. With limited conditions
2. With specific CPU
QxStor Ceph – Know your Demand, Easy to Configure
QCT QxStor Big Data Analytics Data Lake with All-Flash Ceph Solution
• Throughput Optimized: QxStor RCT-400, QxStor RCT-200 (For Streaming Media)
• Cost/Capacity Optimized: QxStor RCC-400 (For Archiving)
• IOPS Optimized: QxStor RCI-300 (For Mission Critical App)
Platforms shown: D52BQ-2U, D51PH-1ULH, T21P-4U, T21P-4U, D51BP-1U
Purley Available!
• +25% in total storage capacity available¹
• -33% in sequential write latency testing²
• +50% in sequential write throughput²
• Up to 560TB per chassis
• Up to 63% cost down
• +100% improvement in IOPS performance³
• 1.6M/s highest IOPS³
• -50% reduction in latency³
1. SKU statistics of RCT-200
2. Test result of RCT-400
3. Test result of RCI-300
All-Flash Ceph is preferred for data analytic workloads
NVMe is preferred from both the business and performance perspectives
• High Performance-Optimized Storage
• Suitable for Mission Critical App
• Cost Efficient Compared to HDD
[Figure: NVMe compared with conventional disks on CPU Utilization, Network Traffic, Disk Read Throughput, and Disk Read Latency. Multipliers shown: x9.24; Incoming: x3.9; x2.81; x9.77; Outgoing: x16.1]
Performance Perspective: NVMe exhibits exceptional results on system metrics compared with conventional disks
Business Perspective: NVMe is no longer a luxury device for enterprises with IO-intensive workloads
Test Result – Hive and Presto
Assured performance for data lake architecture when compared to HDFS hyper-converged architecture
• Minor changes are observed in the total runtime comparison between the HDFS hyper-converged and Ceph disaggregated architectures using Hive.
• Up to 22.91% faster in total runtime for the Ceph disaggregated architecture using Presto; the effect is especially notable for large data sizes.
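The slide does not define how "faster" was computed. One common reading is the relative reduction in total runtime against the baseline, sketched below. The function name and the sample runtimes are hypothetical and are not measurements from the talk.

```python
def runtime_reduction_pct(baseline_s, candidate_s):
    """Percentage by which candidate_s is shorter than baseline_s.

    Assumes 'X% faster' means (baseline - candidate) / baseline * 100;
    the slides do not state the formula that was actually used.
    """
    if baseline_s <= 0:
        raise ValueError("baseline runtime must be positive")
    return (baseline_s - candidate_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, chosen only to illustrate the formula.
    hdfs_runtime = 100.0
    ceph_runtime = 77.09
    print(f"{runtime_reduction_pct(hdfs_runtime, ceph_runtime):.2f}% shorter runtime")
```

Under this reading, a negative result would mean the candidate architecture ran slower than the baseline.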
Disaggregated architecture is suitable for data analytics
Meet the demand for big data frameworks while providing higher flexibility
• Assured Performance Level: Comparable test results to hyper-converged architecture for data analytics
• Cost-Efficient Architecture: Lower storage required for data durability than HDFS or RAID based systems
• Scale-Out According to Need: Scaling components independently reduces cost & management complexity
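The storage-cost point can be illustrated with simple arithmetic. The sketch below compares the raw capacity needed under three-way replication (the HDFS default) with an erasure-coded layout, which Ceph supports. The slides do not say which protection scheme was configured; the k=4, m=2 profile and the 100 TB figure are assumptions for illustration only.

```python
def replication_overhead(copies):
    """Raw bytes stored per usable byte when keeping `copies` full replicas."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return float(copies)


def erasure_coding_overhead(k, m):
    """Raw bytes stored per usable byte with k data chunks and m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("k must be >= 1 and m must be >= 0")
    return (k + m) / k


def raw_tb_needed(usable_tb, overhead):
    """Raw capacity required to hold usable_tb of data at the given overhead."""
    return usable_tb * overhead


if __name__ == "__main__":
    usable = 100  # hypothetical dataset size in TB
    print("3x replication:", raw_tb_needed(usable, replication_overhead(3)), "TB raw")
    print("EC k=4, m=2:   ", raw_tb_needed(usable, erasure_coding_overhead(4, 2)), "TB raw")
```

With these assumed parameters, both layouts tolerate the loss of two copies or chunks, while the erasure-coded layout needs half the raw capacity.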