Introduction | Pivotal Greenplum Database Docs
Table of Contents

- Introduction
- Key Points for Review
- Characteristics of a Supported Pivotal Hardware Platform
- Pivotal Approved Recommended Architecture
- Pivotal Cluster Examples
- Example Rack Layout
- Using gpcheckperf to Validate Disk and Network Performance
- Pivotal Greenplum Segment Instances per Server
- Pivotal Greenplum on Virtualized Systems
- Additional Helpful Tools
© Copyright Pivotal Software Inc, 2013-2016
Introduction

The EMC Data Computing Appliance provides a ready-made platform that strives to accommodate the majority of customer workloads. One of Pivotal Greenplum’s strongest value propositions is its ability to run on practically any modern-day hardware platform. More and more, Pivotal Engineering is seeing cases where customers elect to build a cluster that satisfies a specific requirement or purpose.

Pivotal Platform Engineering publishes this framework as a resource for assisting customers in this effort.
Objectives

This guide can be used for:

- A clear understanding of what characterizes a recommended platform for running Pivotal Greenplum Database
- A review of the two most common topologies with supporting recommended architecture diagrams
- A Pivotal recommended reference architecture that includes hardware recommendations, configuration, hard disk guidelines, network layout, installation, data loading, and verification
- Extra guidance with real-world Greenplum cluster examples (see Pivotal Cluster Examples)
This document does:

- provide recommendations for building a well-performing Pivotal cluster using the hardware guidelines presented
- provide general concepts without specific tuning suggestions

This document does not:

- promise Pivotal support for the use of third-party hardware
- assume that the information herein applies to every site; it is subject to modification depending on a customer’s specific local requirements
- provide all-inclusive procedures for configuring Pivotal Greenplum. A subset of information is included as it pertains to deploying a Pivotal cluster.
Greenplum Terms to Know

master
A server that provides entry to the Greenplum Database system, accepts client connections and SQL queries, and distributes work to the segment instances.

segment instances
Independent PostgreSQL databases that each store a portion of the data and perform the majority of query processing.

segment host
A server that typically executes multiple Greenplum segment instances.

interconnect
Networking layer of the Greenplum Database architecture that facilitates inter-process communication between segments.
Feedback and Updates

Please send feedback and/or updates for this document to [email protected].
Key Points for Review

What is Pivotal Engineering Recommended Architecture?

This Pivotal Recommended Architecture comprises generic recommendations for third-party hardware for use with Pivotal software products. Pivotal maintains examples of various implementations internally to aid in assisting customers with cluster diagnostics and configuration. Pivotal does not perform hardware replacement, nor is Pivotal a substitute for OEM vendor support for these configurations.
Why Install on an OEM Vendor Platform?

The EMC DCA strives to achieve the best balance between performance and cost while meeting a broad range of customer needs. There are some very valid reasons customers may opt to design their own clusters. Some possibilities are:

- Varying workload profiles that may require more memory or higher processor capacity
- Specific functional needs like public/private clouds, increased density, or disaster recovery (DR)
- Support for radically different network topologies
- Deeper, more direct access for hardware and OS management
- Existing relationships with OEM hardware partners

If customers opt out of using the appliance, Pivotal Engineering highly recommends following Pivotal architecture guidelines and discussing the implementation with a Pivotal Engineer. Customers achieve much greater reliability when following these recommendations.
Characteristics of a Supported Pivotal Hardware Platform

Commodity Hardware

Pivotal believes that customers should take advantage of inexpensive yet powerful commodity hardware, including x86_64 platform commodity servers, storage, and Ethernet switches.

Pivotal recommends:

- Chipsets or hardware used across many platforms, such as NIC chipsets (like some of the Intel series) and RAID controllers (like LSI or StorageWorks)
- Reference motherboards/designs. Machines that use reference motherboard implementations are preferred. Although DIMM count is important, if a manufacturer integrates more DIMM slots than the CPU manufacturer specifies, more risk is placed on the platform.
- Ethernet-based interconnects (10Gb), which are highly preferred to proprietary interconnects and to storage fabrics
Manageability

Pivotal recommends:

- Remote, out-of-band management capability with support for ssh connectivity as well as web-based console access and virtual media
- Diagnostic LEDs that convey failure information. Amber lights are a minimum, but an LED that displays the exact failure is more useful.
- Tool-free maintenance (the cover can be opened without tools, parts are hot-swappable without tools, etc.)
- Labeling: components such as DIMMs are labeled so it is easy to determine which part needs to be replaced
- Command-line, script-based interfaces for configuring the server BIOS and options like RAID cards and NICs
Redundancy

Pivotal recommends:

- Redundant hot-swappable power supplies
- Redundant hot-swappable fans
- Redundant network connectivity
- Hot-swappable drives
- Hot-spare drives when immediate replacement of failed hardware is unavailable
Determining the Best Topology

Traditional Topology

This configuration requires the least specialized networking skills and is the simplest possible configuration. In a traditional network topology, every server in the cluster is directly connected to every switch in the cluster, typically over 10Gb Ethernet. This topology limits the cluster size to the number of ports on the selected interconnect switches. The 10Gb ports on the servers are bonded into an active/active pair and routed directly to a set of switches configured using MLAG (or comparable technology) to provide a redundant high-speed network fabric.
Figure: Recommended Architecture Example 1 (Typical Topology)
Scalable Topology

Scalable networks implement a network core that allows the cluster to grow beyond the number of ports in the interconnect switches. Care must be taken to ensure that the number of links from the in-rack switches is adequate to service the core.
How to Determine the Maximum Number of Servers

For example, suppose each rack can hold 16 servers and you determine that the core switches each have 48 ports. Of these ports, 4 are used to create the MLAG between the two core switches. Of the remaining 44 ports, networking from a single set of interconnect switches in a rack uses 4 links per core switch (2 from each interconnect switch to each of the core switches). The maximum number of servers is determined by the following formula:

max-nodes = nodes-per-rack * ((core-switch-port-count - MLAG-port-utilization) / rack-to-rack-link-port-count)

176 = 16 * ((48 - 4) / 4)
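The formula above can be checked with a short sketch (function and parameter names are illustrative):

```python
def max_nodes(nodes_per_rack: int, core_switch_ports: int,
              mlag_ports: int, rack_link_ports_per_core: int) -> int:
    """Maximum cluster size given the core-switch port budget."""
    usable_ports = core_switch_ports - mlag_ports          # ports left after the core MLAG
    max_racks = usable_ports // rack_link_ports_per_core   # each rack uses this many ports per core switch
    return nodes_per_rack * max_racks

# Worked example from the text: 16 servers/rack, 48-port core switches,
# 4 ports for the core MLAG, 4 uplinks per rack per core switch.
print(max_nodes(16, 48, 4, 4))  # → 176
```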
Figure: Recommended Architecture Example 2 (Scalable Topology)
Pivotal Approved Recommended Architecture

Minimum Server Guidelines

Table 1 lists minimum requirements for a good cluster. Use gpcheckperf to generate these metrics. See Appendix C: Using gpcheckperf to Validate Disk and Network Performance for example gpcheckperf output.

Table 1. Baseline Numbers for a Pivotal Cluster
Master Nodes (mdw & smdw)
  Role: Users and applications connect to masters to submit queries and return results. Typically, monitoring and managing the cluster and the database is performed through the master nodes.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: >256GB
  Disk read: >600MB/s
  Disk write: >500MB/s
  Network: 2 x 10Gb NICs, multiple NICs
  Form factor: 1U

Segment Nodes (sdw)
  Role: Segment nodes store data and execute queries. They are generally not public facing. Multiple segment instances run on one segment node.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: >256GB
  Disk read: >2000MB/s
  Disk write: >2000MB/s
  Network: 2 x 10Gb NICs, multiple NICs
  Form factor: 2U

ETL/Backup Nodes (etl)
  Role: Generally identical to segment nodes. These are used as staging areas for loading data or as destinations for backup data.
  CPU: 8+ physical cores at greater than 2GHz clock speed
  RAM: 64GB or more
  Disk read: >2000MB/s
  Disk write: >2000MB/s
  Network: 2 x 10Gb NICs, multiple NICs
  Form factor: 2U
Network Guidelines

Table 2. Administration and Interconnect Switches

Administration Network
  Purpose: Administration networks are used to tie together lights-out management interfaces in the cluster and provide a management route into the server and OS.
  Ports/speed: 48 x 1Gb
  Switches: A layer-2/layer-3 managed switch per rack with no specific bandwidth or blocking requirements.

Interconnect Network
  Ports/speed: 48 x 10Gb
  Switches: Two layer-2/layer-3 managed switches per rack. All ports must have full bandwidth, be able to operate at line rate, and be non-blocking.
Table 3. Racking, Power, and Density

Racking
  Generally, a 40U or larger rack that is 1200mm deep is required. Built-in cable management is preferred. ESM protective doors are also preferred.

Power
  The typical input power for a Pivotal Greenplum rack is 4 x 208/220V, 30 amp, single-phase circuits in the US. Internationally, 4 x 230V, 32 amp, single-phase circuits are generally used. This affords a power budget of ~9600VA of fully redundant power. Other power configurations are absolutely fine so long as there is enough energy delivered to the rack to accommodate the contents of the rack in a fully redundant manner.
Node Guidelines

OS Levels

At a minimum, the following operating systems (OS) are supported:

- Red Hat/CentOS Linux 5*
- Red Hat/CentOS Linux 6
- Red Hat/CentOS Linux 7**
- SUSE Enterprise Linux 10.2 or 10.3
- SUSE Enterprise Linux 11

* RHEL/CentOS 5 will be unsupported in the next major release
** Support for RHEL/CentOS 7 is near completion, pending kernel bug fixes

For the latest information on supported OS versions, refer to the Greenplum Database Installation Guide.

Setting OS Parameters for Greenplum Database

Careful consideration must be given when setting OS parameters for Greenplum Database hosts. Refer to the latest version of the Greenplum Database Installation Guide for these settings.
Greenplum Database Server Guidelines

Greenplum Database integrates three kinds of servers: master servers, segment hosts, and ETL servers. Greenplum Database servers must meet the following criteria.

Master Servers

- 1U or 2U server. With less of a need for drives, rack space can be saved by going with a 1U form factor. However, a 2U form factor consistent with the segment hosts may increase supportability.
- Same processors, RAM, RAID card, and interconnect NICs as the segment hosts.
- Six to ten disks (eight is most common) organized into a single RAID 5 group with one hot spare configured.
- SAS 15k or SSD disks are preferred, with 10k disks a close second. SATA drives are acceptable in solutions oriented towards archival space over query performance. All disks must be the same size and type.
- Should be capable of read rates in gpcheckperf of 500MB/s or higher. (The faster the master scans, the faster it can generate query plans, which improves overall performance.)
- Should be capable of write rates in gpcheckperf of 500MB/s or higher.
- Should have sufficient additional network interfaces to connect to the customer network directly in the manner desired by the customer.
Segment Hosts

- Typically a 2U server.
- The fastest available processors.
- 256GB RAM or more.
- One or two RAID cards with maximum cache and cache protection (flash or capacitors preferred over battery). RAID cards should be able to support the full read/write capacity of the drives.
- 2 x 10Gb NICs.
- 12 to 24 disks organized into two or four RAID 5 groups. Hot spares should be configured unless there are disks on hand for quick replacement.
- SAS 15k disks are preferred, with 10k disks a close second. SATA disks are preferred over nearline SAS if SAS 15k or SAS 10k cannot be used. All disks must be the same size and type.
- A minimum read rate in gpcheckperf of 300MB/s per segment or higher (2000MB/s per server is typical).
- A minimum write rate in gpcheckperf of 300MB/s or higher (2000MB/s per server is typical).

Additional Tips for Segment Host Configuration

The number of segment instances run per segment host is configurable, and each segment instance is itself a database running on the server. A baseline recommendation on current hardware, such as the hardware described in Appendix A, is 8 primary segment instances per physical server.

A set of memory parameters that depend on the amount of RAM allotted to each segment instance will be determined when installing the database software. While these are not platform parameters, it is the platform that determines how much memory is available and how the memory parameters should be set in the software. Refer to the online calculator (http://greenplum.org/calc/) to determine these settings.

Refer to Appendix D for further reading on segment instance configuration.
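The calculator's arithmetic is roughly of the following shape. The constants (7.5GB reserve, 5% of RAM, 1.7 divisor) are taken from Greenplum memory-sizing documentation and should be treated as assumptions for illustration; use the online calculator's output for a real install.

```python
def gp_vmem_protect_limit_mb(ram_gb: float, swap_gb: float,
                             acting_primary_segments: int) -> int:
    """Rough per-segment memory limit (MB) for gp_vmem_protect_limit.

    Mirrors the published sizing formula:
        gp_vmem = ((SWAP + RAM) - (7.5 + 0.05 * RAM)) / 1.7   # GB usable by Greenplum
    then divides the result across the primaries that may be acting on a host.
    """
    gp_vmem_gb = ((swap_gb + ram_gb) - (7.5 + 0.05 * ram_gb)) / 1.7
    return int(gp_vmem_gb / acting_primary_segments * 1024)

# 256GB RAM, 64GB swap, 8 primaries per host (the baseline from the text):
print(gp_vmem_protect_limit_mb(256, 64, 8))
```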
ETL Servers

- Typically a 2U server.
- The same processors, RAM, and interconnect NICs as the segment servers.
- One or two RAID cards with maximum cache and cache protection (flash or capacitors preferred over battery).
- 12 to 24 disks organized into RAID 5 groups of six to eight disks, with no hot spares configured (unless there are available disks after the RAID groups are constructed).
- SATA disks are a good choice for ETL, as performance is typically less of a concern than storage for these systems.
- Should be capable of read rates in gpcheckperf of 100MB/s or higher. (The faster the ETL servers scan, the faster query data can be loaded.)
- Should be capable of write rates in gpcheckperf of 500MB/s or higher. (The faster ETL servers write, the faster data can be staged for loading.)

Additional Tips for Selecting ETL Servers

ETL nodes can be any server that offers enough storage and performance to accomplish the tasks required. Typically, between 4 and 8 ETL servers are required per cluster. The maximum number depends on the desired load performance and the size of the Greenplum Database cluster.

For example, the larger the Greenplum Database cluster, the faster the loads can be. The more ETL servers, the faster data can be served. Having more ETL bandwidth than the cluster can receive is pointless. Having much less ETL bandwidth than the cluster can receive makes for slower loading than the maximum possible.
Hard Disk Configuration Guidelines

A generic server with 24 hot-swappable disks can have several potential disk configurations. Testing by Pivotal Platform and Systems Engineering shows that the best-performing storage for Pivotal software is:

- four RAID 5 groups of six disks each (used as four file systems), or
- combined into one or two file systems using a logical volume manager.

The following instructions describe how to build the recommended RAID groups and virtual disks for both master and segment nodes. How these ultimately translate into file systems is covered in the relevant operating system’s installation guide.

LUN Configuration

The RAID controller settings and disk configuration are based on synthetic load testing performed on several RAID configurations. Unfortunately, the settings that resulted in the best read rates did not have the highest write rates, and the settings with the best write rates did not have the highest read rates.

The prescribed settings offer a compromise: they result in write rates lower than the best measured write rate but higher than the write rates associated with the settings for the highest read rate, and the same is true for read rates. This is intended to ensure that both input and output are the best they can be while affecting the other the least amount possible.

LUNs for the system should be partitioned and mounted as /data1 for the first LUN, and additional LUNs should follow the same naming convention while incrementing the number (/data1, /data2, /data3 … /dataN). All file systems should be formatted as xfs and follow the recommendations set forth in the Pivotal Greenplum Database Installation Guide.
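As an illustration of the /dataN convention, formatting and mounting might be sketched as follows. The device names and the xfs mount-option list are assumptions for illustration; take the authoritative option list from the Installation Guide for your release.

```
# Format each data LUN as xfs and mount it as /dataN (sketch for two LUNs)
mkfs.xfs /dev/sdb1
mkfs.xfs /dev/sdc1
mkdir -p /data1 /data2

# /etc/fstab entries -- option list is illustrative only
/dev/sdb1  /data1  xfs  rw,nodev,noatime,inode64  0 0
/dev/sdc1  /data2  xfs  rw,nodev,noatime,inode64  0 0
```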
Master Server

Master servers (primary and secondary) have eight hot-swappable disks. Configure all eight disks into a single RAID 5 stripe set. Each of the virtual disks that are carved from this disk group should have the following properties:

- 256k stripe width
- No read-ahead
- Disk cache disabled
- Direct I/O

Virtual disks are configured in the RAID card’s option ROM. Each virtual disk defined in the RAID card will appear as a disk in the operating system with a /dev/sd? device file name.
Segment and ETL Servers

Segment servers have 24 hot-swappable disks. These can be configured in a number of ways, but Pivotal recommends four RAID 5 groups of six disks each (RAID 5, 5+1). Each of the virtual disks that will be carved from these disk groups should have the following properties:

- 256k stripe width
- No read-ahead
- Disk cache disabled
- Direct I/O

Virtual disks are configured in the RAID card’s option ROM. Each virtual disk defined in the RAID card will appear as a disk in the operating system with a /dev/sd? device file name.
SSD Storage

Flash storage has been gaining in popularity. Pivotal has not had the opportunity to do enough testing with SSD drives to make a recommendation. When considering SSD drives, it is important to validate the sustained sequential read and write rates for the drive. Many drives have impressive burst rates but are unable to sustain those rates for long periods of time. Additionally, the choice of RAID card needs to be evaluated to ensure it can handle the bandwidth of the SSD drives.
SAN/JBOD Storage

In some configurations it may be a requirement to use an external storage array due to the database size or the server type being used by the customer. With this in mind, it is important to understand that, based on testing by Pivotal Platform and Systems Engineering, SAN and JBOD storage will not perform as well as local, internal server storage.

Some considerations to take into account when installing or sizing such a configuration are the following (independent of the vendor of choice):

- Know the database size and the estimated growth over time
- Know the customer’s read/write ratio
- Large-block I/O is the predominant workload (512KB)
- Disk type and preferred RAID type based on the vendor of choice
- Expected disk throughput based on read and write
- Response time of the disks/JBOD controller
- The preferred option is to have BBU capability on either the RAID card or controller
- Redundancy in switch zoning, preferably with a fan in:out of 2:1
- At least 8Gb Fibre Channel (FC) connectivity
- Ensure that the server supports the use of FC, FCoE, or external RAID cards

In all instances where an external storage source is being utilized, the vendor of the disk array/JBOD should be consulted to obtain specific recommendations based on a sequential workload. This may also require the customer to obtain additional licenses from the pertinent vendors.
Network Layout Guidelines

All the systems in the Greenplum cluster need to be tied together in some form of dedicated, high-speed data interconnect. This network is used for loading data and for passing data between systems during query processing. It should be as high-speed and low-latency as possible, and it should not be used for any other purpose (i.e., it should not be part of the general LAN).

A rule of thumb for network utilization in a Greenplum cluster is to plan for up to twenty percent of each server’s maximum I/O read bandwidth as network traffic. This means a server with a 2000MB/s read bandwidth (as measured by gpcheckperf) might be expected to transmit 400MB/s. Greenplum also compresses some data on disk but uncompresses it before transmitting to other systems in the cluster, so a 2000MB/s read rate with a 4x compression ratio results in an 8000MB/s effective read rate. Twenty percent of 8000MB/s is 1600MB/s, which is more than a single 10Gb interface’s bandwidth (~1250MB/s).

To accommodate this traffic, 10Gb networking is recommended for the interconnect. Current best practice suggests two 10Gb interfaces for the cluster interconnect. This ensures that there is bandwidth to grow into and reduces cabling in the racks. It is recommended to configure the two 10Gb interfaces with NIC bonding to create a load-balanced, fault-tolerant interconnect.
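The rule-of-thumb arithmetic above can be sketched as follows (function and parameter names are illustrative):

```python
def expected_interconnect_mb_s(read_mb_s: float, compression_ratio: float,
                               traffic_fraction: float = 0.20) -> float:
    """Rule-of-thumb interconnect traffic per server.

    Data is stored compressed but transmitted uncompressed, so the effective
    read rate is the measured rate times the compression ratio; plan for
    up to twenty percent of that as network traffic.
    """
    effective_read = read_mb_s * compression_ratio
    return effective_read * traffic_fraction

# Worked example from the text: 2000MB/s measured reads, 4x compression.
print(expected_interconnect_mb_s(2000, 4))  # → 1600.0 MB/s, beyond one 10Gb link (~1250MB/s)
```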
Cisco, Brocade, and Arista switches are good choices, as these brands include the ability to tie switches together in fabrics. Together with NIC bonding on the servers, this approach eliminates single points of failure in the interconnect network. Intel, QLogic, or Emulex network interfaces tend to work best. Layer-3 capability is recommended since it integrates many features that are useful in a Greenplum Database environment.

Note: The vendor hardware referenced above is strictly mentioned as an example. Pivotal Platform and Systems Engineering does not specify which products to use in the network.

FCoE switch support is also required if SAN storage is used, as well as support for Fibre snooping (FIPS).
A Greenplum Database cluster uses three kinds of network connections:

- Admin networks
- Interconnect networks
- External networks
Admin Networks

An Admin network ties together all the management interfaces for the devices in a configuration. It is generally used to provide monitoring and out-of-band console access for each connected device. The Admin network is typically a 1Gb network physically and logically distinct from the other networks in the cluster.

Servers are typically configured such that the out-of-band or lights-out management interfaces share the first network interface on each server. In this way, the same physical network provides access to lights-out management and an operating-system-level connection useful for network OS installation, patch distribution, monitoring, and emergency access.
Switch Types

Typically one 24- or 48-port, 1Gb switch per rack and one additional 48-port switch per cluster as a core. Any 1Gb switch can be used for the Admin network. Careful planning is required to ensure that the network topology provides enough connections and the features desired by the site to provide the kinds of access required.
Cables

Use either cat5e or cat6 cabling for the Admin network. Cable the lights-out or management interface from each cluster device to the Admin network. Place an Admin switch in each rack and cross-connect the switches rather than attempting to run cables from a central switch to all racks.

Note: Pivotal recommends using a different color cable for the Admin network.
Interconnect Networks

The interconnect network ties the servers in the cluster together and forms a high-speed, low-contention data connection between the servers. This should not be implemented on the general data center network, as Greenplum Database interconnect traffic tends to overwhelm networks from time to time. Low latency is needed to ensure proper functioning of the Greenplum Database cluster. Sharing the interconnect with a general network tends to introduce instability into the cluster.

Typically two switches are required per rack, plus two more to act as a core. Use two 10Gb cables per server and eight per rack to connect the rack to the core.

Interconnect networks are often connected to general networks in limited ways to facilitate data loading. In these cases, it is important to shield both the interconnect network and the general network from the Greenplum Database traffic and vice versa. Use a router or an appropriate VLAN configuration to accomplish this.
External Network Connections

The master nodes are connected to the general customer network to allow users and applications to submit queries. Typically, this is done with a small number of 1Gb connections attached to the master nodes. Any method that affords network connectivity from the users and applications needing access to the master nodes is acceptable.
Installation Guidelines

Each configuration requires a specific rack plan. There are single- and multi-rack configurations, determined by the number of servers present in the configuration. A single-rack configuration is one where all the planned equipment fits into one rack. Multi-rack configurations require two or more racks to accommodate all the planned equipment.
Racking Guidelines for a 42U Rack

Consider the following when installing the cluster in a 42U rack:

- Prior to racking any hardware, perform a site survey to determine what power option is desired, whether power cables will enter at the top or bottom of the rack, and whether network switches and patch panels will be at the top or bottom of the rack.
- Install the KMM tray into rack unit 19.
- Install the interconnect switches into rack units 21 and 22, leaving a one-unit gap above the KMM tray.
- Rack segment nodes up from the first available rack unit at the bottom of the rack (see the multi-rack rules for variations using low rack units).
- Install no more than sixteen 2U servers (excludes master, but includes segment and ETL nodes).
- Install the master node into rack unit 17. Install the standby master into rack unit 18.
- Admin switches can be racked anywhere in the rack, though the top is typically the best and simplest location.
- All computers, switches, arrays, and racks should be labeled on both the front and back, as described in the section on labels later in this document.
- All installed devices should be connected to two or more power distribution units (PDUs) in the rack where the device is installed.

When installing a multi-rack cluster:

- Install the interconnect core switches in the top two rack units if the cables come in from the top, or in the bottom two rack units if the cables come in from the bottom.
- Do not install core switches in the master rack.
Cabling

The number of cables required varies according to the options selected. In general, each server and switch installed will use one cable for the Admin network. Run cables according to established cabling standards. Eliminate tight bends or crimps. Clearly label all cables at each end. The label on each end of the cable must trace the path the cable follows between server and switch. This includes:

- Switch name and port
- Patch panel name and port, if applicable
- Server name and port
Switch Configuration Guidelines

Typically, the factory default configuration is sufficient.
IP Addressing Guidelines

IP Addressing Scheme for the Admin Network

An Admin network should be created so that system maintenance and access work can be done on a network that is separate from the cluster traffic between the nodes.

Note: Pivotal’s recommended IP addressing for servers on the Admin network uses a standard internal address space and is extensible to include over 1,000 nodes.

All Admin network switches present should be cross-connected, and all NICs attached to these switches participate in the 172.254.0.0/16 network.
Table 4. IP Addresses for Servers and CIMC

Host Type | Network Interface | IP Address
Primary Master Node | CIMC | 172.254.1.252/16
Primary Master Node | Eth0 | 172.254.1.250/16
Secondary Master Node | CIMC | 172.254.1.253/16
Secondary Master Node | Eth0 | 172.254.1.251/16
Non-master Segment Nodes in rack 1 (master rack) | CIMC | 172.254.1.101/16 through 172.254.1.116/16
Non-master Segment Nodes in rack 1 (master rack) | Eth0 | 172.254.1.1/16 through 172.254.1.16/16
Non-master Segment Nodes in rack 2 | CIMC | 172.254.2.101/16 through 172.254.2.116/16
Non-master Segment Nodes in rack 2 | Eth0 | 172.254.2.1/16 through 172.254.2.16/16
Non-master Segment Nodes in rack # | CIMC | 172.254.#.101/16 through 172.254.#.116/16
Non-master Segment Nodes in rack # | Eth0 | 172.254.#.1/16 through 172.254.#.16/16

Note: Where # is the rack number.

The fourth octet is counted from the bottom up. For example, the bottom server in the first rack is 172.254.1.1 and the top, excluding masters, is 172.254.1.16.
The bottom server in the second rack is 172.254.2.1 and the top 172.254.2.16. This continues for each rack in the cluster regardless of individual server purpose.
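The Admin-network scheme above can be expressed as a small helper (function and parameter names are illustrative):

```python
def admin_ips(rack: int, position: int) -> tuple:
    """Admin-network (172.254.0.0/16) addresses for a segment node.

    rack: rack number (1-based); position: server position counted from
    the bottom of the rack (1-16, excluding masters).
    """
    eth0 = f"172.254.{rack}.{position}"        # OS-level interface
    cimc = f"172.254.{rack}.{100 + position}"  # lights-out management (CIMC)
    return eth0, cimc

# Bottom server in rack 2:
print(admin_ips(2, 1))  # → ('172.254.2.1', '172.254.2.101')
```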
IP Addressing for Non-server Devices

The following table lists the correct IP addressing for each non-server device.

Table 5. Non-server IP Addresses

Device | IP Address
First Interconnect Switch in Rack* | 172.254.#.201/16
Second Interconnect Switch in Rack* | 172.254.#.202/16

* Where # is the rack number
IP Addressing for Interconnects Using 10Gb NICs

The Interconnect is where data is routed at high speed between the nodes.

Table 6. Interconnect IP Addressing for 10Gb NICs

Host Type | Physical RJ-45 Port | IP Address
Primary Master | 1st port on PCIe card | 172.1.1.250/16
Primary Master | 2nd port on PCIe card | 172.2.1.250/16
Secondary Master | 1st port on PCIe card | 172.1.1.251/16
Secondary Master | 2nd port on PCIe card | 172.2.1.251/16
Non-Master Nodes | 1st port on PCIe card | 172.1.#.1/16 through 172.1.#.16/16
Non-Master Nodes | 2nd port on PCIe card | 172.2.#.1/16 through 172.2.#.16/16

Note: Where # is the rack number:

- The fourth octet is counted from the bottom up. For example, the bottom server in the first rack uses 172.1.1.1 and 172.2.1.1.
- The top server in the first rack, excluding masters, uses 172.1.1.16 and 172.2.1.16.
- Each NIC on the interconnect uses a different subnet, and each server has a NIC on each subnet.
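The one-subnet-per-NIC pattern for non-master nodes can be sketched as (names are illustrative):

```python
def interconnect_ips(rack: int, position: int, nics: int = 2) -> list:
    """Interconnect addresses for a non-master node: one /16 subnet per NIC.

    The second octet selects the NIC's subnet (172.1.x.x, 172.2.x.x, ...);
    rack and bottom-up position fill the third and fourth octets.
    """
    return [f"172.{nic}.{rack}.{position}/16" for nic in range(1, nics + 1)]

# Bottom server in the first rack, two 10Gb NICs:
print(interconnect_ips(1, 1))  # → ['172.1.1.1/16', '172.2.1.1/16']
```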
IP Addressing for Fault-Tolerant Interconnects

The following table lists correct IP addresses for fault-tolerant interconnects regardless of bandwidth.

Table 7. Fault-Tolerant (Bonded) Interconnects

Host Type | IP Address
Primary Master | 172.1.1.250/16
Secondary Master | 172.1.1.251/16
Non-Master Nodes | 172.1.#.1/16 through 172.1.#.16/16

Note: Where # is the rack number:

- The fourth octet is counted from the bottom up. For example, the bottom server in the first rack uses 172.1.1.1.
- The top server in the first rack, excluding masters, uses 172.1.1.16.
Data Loading Connectivity Guidelines

High-speed data loading requires direct access to the segment nodes, bypassing the masters. There are three ways to connect a Pivotal cluster to external data sources or backup targets:

- VLAN Overlay – The first and recommended best practice is to use virtual LANs (VLANs) to open up specific hosts in the customer network and the Greenplum Database cluster to each other.
- Direct Connect to Customer Network – Only use if there is a specific customer requirement.
- Routing – Only use if there is a specific customer requirement.
VLAN Overlay

VLAN overlay is the most commonly used method to provide access to external data without introducing network problems. The VLAN overlay imposes an additional VLAN on the connections of a subset of the cluster servers.

How the VLAN Overlay Method Works

Using the VLAN overlay method, traffic passes between the cluster servers on the internal VLAN but cannot pass out of the internal switch fabric, because the external-facing ports are assigned only to the overlay VLAN. Traffic on the overlay VLAN (traffic to or from IP addresses assigned to the relevant servers’ virtual network interfaces) can pass in and out of the cluster.

This VLAN configuration allows multiple clusters to co-exist without requiring any change to their internal IP addresses. This gives customers more control over what elements of the clusters are exposed to the general customer network. The overlay VLAN can be a dedicated VLAN that includes only those servers that need to talk to each other, or the overlay VLAN can be the customer’s full network.
Figure: Basic VLAN Overlay Example

This figure shows a cluster with 3 segment hosts, a master, a standby master, and an ETL host. In this case, only the ETL host is part of the overlay. It is not a requirement to have the ETL node use the overlay, though this is common in many configurations to allow data to be staged within a cluster. Any of the servers in this rack, or any rack of any other configuration, may participate in the overlay if desired. The type of configuration will depend upon security requirements and whether functions within the cluster need to reach any outside data sources.
Configuring the Overlay VLAN – An Overview

Configuring the VLAN involves three steps:

1. Create virtual interfaces that tag packets with the overlay VLAN
2. Configure the switch in the cluster with the overlay VLAN
3. Configure the ports on the switch connecting to the customer network
Step 1 – Virtual interface tags packets with the overlay VLAN

Each server that is in both the base VLAN and the overlay VLAN has a virtual interface created that tags packets sent from the interface with the overlay VLAN. For example, suppose eth2 is the physical interface on an ETL server that is connected to the first interconnect network. To include this server in an overlay VLAN, the interface eth2.1000 is created using the same physical port but defining a second interface for the port. The physical port does not tag its packets, but any packet sent using the virtual port is tagged with a VLAN.
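On a Linux host, creating such a tagged virtual interface can be sketched with iproute2. The interface name eth2 and VLAN ID 1000 come from the example above; the address assigned to the overlay interface is an illustrative assumption.

```
# Create eth2.1000 on top of physical port eth2, tagging packets with VLAN 1000
ip link add link eth2 name eth2.1000 type vlan id 1000
# Assign an overlay-VLAN address to the virtual interface (address is illustrative)
ip addr add 192.168.100.21/24 dev eth2.1000
ip link set dev eth2.1000 up
```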
Step 2 – Configure the switch in the cluster with the overlay VLAN

The switch in the cluster that connects to the servers and the customer network is configured with the overlay VLAN. All of the ports connected to servers that will participate in the overlay are changed to switchport mode converged and added to both the internal VLAN (199) and the overlay VLAN (1000).

Step 3 – Configure the switch ports connected to the customer network

The ports on the switch connecting to the customer network are configured as either access- or trunk-mode switch ports (depending on customer preference) and added only to the overlay VLAN.
Direct Connect to the Customer’s Network

Each node in the Greenplum Database cluster can simply be cabled directly to the network where the data sources exist, or to a network that can communicate with the source network. This is a brute-force approach that works very well. Depending on what network features are desired (redundancy, high bandwidth, etc.), this method can be very expensive in terms of cabling and switch gear, as well as space for running large numbers of cables.

Figure: Data Loading — Direct Connect to Customer Network
Routing

One way is to use any of the standard networking methods used to link two different networks together. These can be deployed to tie the interconnect network(s) to the data source network(s). Which of these methods is used will depend on the circumstances and the goals for the connection.

A router is installed that advertises the external networks to the servers in the Greenplum cluster. This method could potentially have performance and configuration implications for the customer’s network.
Validation Guidelines

Most of the validation effort is performed after the OS is installed, when a variety of OS-level tools are available. A checklist that should be separately printed and signed for delivery is included in the relevant OS installation guide; it includes the issues raised in this section.

Examine and verify the following items:

- All cables labeled according to the standards in this document
- All racks labeled according to the standards in this document
- All devices power on
- All hot-swappable devices are properly seated
- No devices show any warning or fault lights
- All network management ports are accessible via the administration LAN
- All cables are neatly dressed into the racks and have no sharp bends or crimps
- All rack doors and covers are installed and close properly
- All servers extend and retract without pinching or stretching cables
Labels

Racks

Each rack in a Recommended Architecture is labeled at the top of the rack on both the front and back. Racks are named Master Rack or Segment Rack #, where # is a sequential number starting at 1.

Servers

Each server is labeled on both the front and back of the server. The label should be the hostname of the server. In other words, if a segment node is known as sdw15, the label on that server would be sdw15.

Switches

Switches are labeled according to their purpose. Interconnect switches are i-sw, administration switches are a-sw, and ETL switches are e-sw. Each switch is assigned a number starting at 1. Switches are labeled on the front of the switch only, since the back is generally not visible when racked.
Certification Guidelines

Network Performance Test

gpcheckperf verifies the line rate on both 10Gb NICs. Run gpcheckperf on the disks and network connections within the cluster. As each certification will vary due to the number of disks, nodes, and network bandwidth available, the commands to run the tests will differ.

See Using gpcheckperf to Validate Disk and Network Performance for more information on the gpcheckperf command.
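As an illustration, disk/stream and network tests might be invoked along these lines. The host file names and data directories are assumptions; consult the gpcheckperf reference for your release for the exact options.

```
# Disk I/O and memory-stream test across the hosts listed in the file
gpcheckperf -f hostfile_segments -r ds -D -d /data1 -d /data2

# Full-matrix network test on the interconnect
gpcheckperf -f hostfile_interconnect -r N -d /tmp
```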
Hardware Monitoring and Failure Analysis Guidelines

To support monitoring of a running cluster, the following items should be in place and capable of being monitored, with the gathered information available via interfaces such as SNMP or IPMI.
Fans/Temp
Fan status/presence
Fan speed
Chassis temp
CPU temp
IOH temp

Memory
DIMM temp
DIMM status (populated, online)
DIMM single-bit errors
DIMM double-bit errors
ECC warnings (corrections exceeding threshold)
ECC correctable errors
ECC uncorrectable errors
Memory CRC errors

System Errors
POST errors
PCIe fatal errors
PCIe non-fatal errors
CPU machine check exceptions
Intrusion detection
Chipset errors

Power
Power supply presence
Power supply failures
Power supply input voltage
Power supply amperage
Motherboard voltage sensors
System power consumption
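As a sketch, many of these sensors can be read remotely over IPMI. The wrapper below assumes ipmitool is installed; the BMC hostnames, user, and password file path are placeholders to replace with site-specific values.

```shell
# Hypothetical helper around ipmitool; -I lanplus queries a remote BMC
# over the administration LAN. Host, user, and password file are placeholders.
bmc_query() {
    host=$1
    shift
    ipmitool -I lanplus -H "$host" -U admin -f /etc/bmc-password "$@"
}

# Example queries (would be run against a real BMC):
# bmc_query sdw1-mgmt sdr type Fan           # fan status and speed
# bmc_query sdw1-mgmt sdr type Temperature   # chassis/CPU/DIMM temperatures
# bmc_query sdw1-mgmt sel list               # event log (ECC, PSU, POST errors)
echo "bmc_query helper defined"
```

The same sensors are generally also exposed via SNMP through the vendor's management agent; the IPMI path is shown here because it works without any OS-level agent installed.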
Pivotal Cluster Examples

The following table lists good choices for cluster hardware based on Intel Xeon E5 processor-based servers with Cisco and Arista switches.

Table 1. Hardware Components
Master Node

Two of these nodes per cluster

1U server (similar to the Dell R630):

2 x E5-2680 v3 processors (2.5GHz, 12 cores, 120W)
256GB RAM (8 x 16GB)
1 x RAID card with 1GB protected cache
8 x SAS, 10k, 6Gb disks (typically 8 x 600GB, 2.5"), organized into a single RAID5 disk group with a hot spare. Logical devices are defined per the OS needs (boot, root, swap, etc.), with the remainder in a single large filesystem for data.
2 x 10Gb Intel, QLogic, or Emulex based NICs
Lights-out management (IPMI-based BMC)
2 x 650W or higher, high-efficiency power supplies
Segment Node & ETL Node

Up to 16 per rack. No maximum total count.

2U server (similar to the Dell R730xd):

2 x E5-2680 v3 processors (2.5GHz, 12 cores, 120W)
256GB RAM (8 x 16GB)
1 x RAID card with 1GB protected cache
12 to 24 x SAS, 10k, 6Gb disks (typically 12 x 600GB, 3.5" or 24 x 1.8TB, 2.5"), organized into two to four RAID5 groups. Used either as two to four data filesystems (with logical devices skimmed off for boot, root, swap, etc.) or as one large device bound with Logical Volume Manager.
2 x 10Gb Intel, QLogic, or Emulex based NICs
Lights-out management (IPMI-based BMC)
2 x 650W or higher, high-efficiency power supplies
Admin Switch

Cisco Catalyst 2960 Series

A simple, 48-port, 1Gb switch with features that allow it to be easily combined with other switches to expand the network. The least expensive managed switch with good reliability is appropriate for this role. There will be at least one per rack.

Interconnect

Arista 7050-52

The Arista switch line allows for multi-switch link aggregation groups (called MLAG), easy expansion, and a reliable body of hardware and operating system.
Example Rack Layout

The following figure is an example rack layout with proper switch and server placement.

Figure: 42U Rack Diagram
Using gpcheckperf to Validate Disk and Network Performance

The following examples illustrate how gpcheckperf is used to validate disk and network performance in a cluster.

Checking Disk Performance — gpcheckperf Output
[gpadmin@mdw ~]$ gpcheckperf -f hosts -r d -D -d /data1/primary -d /data2/primary -S 80G
/usr/local/greenplum-db/./bin/gpcheckperf -f hosts -r d -D -d /data1/primary -d /data2/primary -S 80G
--------------------
 DISK WRITE TEST
--------------------
--------------------
 DISK READ TEST
--------------------
====================
==  RESULT
====================

 disk write avg time (sec): 71.33
 disk write tot bytes: 343597383680
 disk write tot bandwidth (MB/s): 4608.23
 disk write min bandwidth (MB/s): 1047.17 [sdw2]
 disk write max bandwidth (MB/s): 1201.70 [sdw1]
 -- per host bandwidth --
    disk write bandwidth (MB/s): 1200.82 [sdw4]
    disk write bandwidth (MB/s): 1201.70 [sdw1]
    disk write bandwidth (MB/s): 1047.17 [sdw2]
    disk write bandwidth (MB/s): 1158.53 [sdw3]

 disk read avg time (sec): 103.17
 disk read tot bytes: 343597383680
 disk read tot bandwidth (MB/s): 5053.03
 disk read min bandwidth (MB/s): 318.88 [sdw2]
 disk read max bandwidth (MB/s): 1611.01 [sdw1]
 -- per host bandwidth --
    disk read bandwidth (MB/s): 1611.01 [sdw1]
    disk read bandwidth (MB/s): 318.88 [sdw2]
    disk read bandwidth (MB/s): 1560.38 [sdw3]
Checking Network Performance — gpcheckperf Output
[gpadmin@mdw ~]$ gpcheckperf -f network1 -r N -d /tmp
/usr/local/greenplum-db/./bin/gpcheckperf -f network1 -r N -d /tmp
-------------------
--  NETPERF TEST
-------------------
====================
==  RESULT
====================
Netperf bisection bandwidth test
sdw1 -> sdw2 = 1074.010000
sdw3 -> sdw4 = 1076.250000
sdw2 -> sdw1 = 1094.880000
sdw4 -> sdw3 = 1104.080000

Summary:
sum = 4349.22 MB/sec
min = 1074.01 MB/sec
max = 1104.08 MB/sec
avg = 1087.31 MB/sec
median = 1094.88 MB/sec
Pivotal Greenplum Segment Instances per Server

Understanding Greenplum Segments

Greenplum segment instances are essentially individual databases. In a Greenplum cluster there is a Greenplum master server, which dispatches work to multiple segment instances. Each of these instances resides on a segment host. Data for a table is distributed across all of the segment instances, and when a query that requests data is executed, it is dispatched to all of them to execute in parallel. The instances that actively process the query are referred to as the primary instances. In addition, a Greenplum cluster runs mirror instances, one paired to each primary. The mirrors do not participate in answering queries; they only perform data replication, so that if a primary fails its mirror can take over processing in its place.

When planning a cluster, it is important to understand that all of these instances will accept a query in parallel and act upon it. There must therefore be enough resources on a server for all of these processes to run and communicate with each other at once.
Segment Resources Rule of Thumb

A general rule of thumb is that for every segment instance (primary or mirror) you will want to provide at least:

1 core
200MB/s IO read
200MB/s IO write
8GB RAM
1GB network throughput
A segment host with 8 primary and 8 mirror instances would have:

16 cores
3200MB/s IO read
3200MB/s IO write
128GB RAM
20GB network throughput
These numbers have proven to provide a reliable platform for a variety of use cases and give a good baseline for the number of instances to run on a single server. Pivotal recommends a maximum of 8 primary and 8 mirror instances on a server, even if the resources provided are sufficient for more.

Pivotal has found that allocating a ratio of 1 to 2 physical CPUs per primary segment works well for most use cases; it is not recommended to drop below 1 CPU per primary segment. Ideal architectures will additionally align the NUMA architecture with the number of segments.
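The rule of thumb above reduces to simple arithmetic. As a sketch, the segment counts below are the 8-primary/8-mirror example; substitute the planned layout for a given host:

```shell
# Illustrative sizing arithmetic for the per-segment rule of thumb.
# Segment counts are examples, not recommendations.
primaries=8
mirrors=8
segments=$((primaries + mirrors))

echo "cores needed:     $((segments * 1))"      # 1 core per instance
echo "IO read (MB/s):   $((segments * 200))"    # 200 MB/s per instance
echo "IO write (MB/s):  $((segments * 200))"    # 200 MB/s per instance
echo "RAM (GB):         $((segments * 8))"      # 8GB per instance
```

For 16 instances this yields 16 cores, 3200MB/s of read and write bandwidth, and 128GB of RAM, matching the example host above.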
Reasons to reduce the number of segment instances per server

A database schema that uses partitioned columnar tables has the potential to generate a large number of files. For example, a table that is partitioned daily for a year will have over 300 files, one for each day. If that table additionally has columnar orientation with 300 columns, it will have well over 90,000 files representing the data in that table on one segment instance. A server that is running 8 primary instances with this table would have to open 720,000 files if a full table scan query were issued against that table. Systems that make use of partitioned columnar tables may benefit from fewer segment instances per server if data is being used in a way that requires many open files.
Systems that span large numbers of nodes create more work for the master to plan queries and coordinate all of the segments. In systems spanning two or more racks, consider reducing the number of segment instances per server.

When queries require large amounts of memory, reducing the number of segments per server increases the amount of memory available to any one segment.

If the amount of concurrent query processing causes resources to run low on the system, reducing the amount of parallelism on the platform itself will allow for more parallelism in query execution.
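The file-count arithmetic in the first reason above can be checked directly (a sketch; exact counts vary with partition granularity and table options):

```shell
# Files per segment instance for a daily-partitioned, 300-column,
# column-oriented table: roughly one file per column per partition.
partitions=365
columns=300
files_per_instance=$((partitions * columns))
echo "files per instance: $files_per_instance"   # well over 90,000

# A full table scan on a host running 8 primary instances
# (the text's 720,000 figure uses the rounded 90,000 estimate):
primaries=8
echo "files opened per host: $((files_per_instance * primaries))"
```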
Reasons to increase the number of segment instances per server
In low-concurrency systems where overall utilization is low, increasing the segment instance count will allow each query to utilize more resources in parallel.

Systems with large amounts of free RAM that can be used by the OS for file buffers may benefit from increasing the number of segment instances per server.
Pivotal Greenplum on Virtualized Systems

General understanding of Pivotal Greenplum and virtualization

Greenplum Database is parallel processing software, which means that Pivotal Greenplum often runs the same process at the same time across a cluster of nodes. Virtualization is frequently used to consolidate systems so that they can share resources, taking advantage of the fact that software often utilizes resources sporadically and allowing those resources to be over-subscribed. Greenplum Database will not function well in an oversubscribed environment because all segments become active at once during query processing. In that type of environment, the system is prone to bottlenecks and unpredictable behavior resulting from being unable to access resources the system believes it has been allocated.

With this in mind, as long as the system meets the requirements set forth in the installation guide, Greenplum is supported on virtual infrastructure.
Choosing the number of segment instances to run per VM

The recommended hardware specifications are quite large and may be hard to achieve in a virtual environment. In these cases, each VM should have no more than 1 primary and 1 mirror segment for every 2 CPUs, 32GB of RAM, and 300MB/s of sequential read bandwidth and write bandwidth. Thus a VM with 4 CPUs, 64GB RAM, and 1GB/s sequential read and write would be able to host 2 primary segment instances and 2 mirror segment instances.
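A minimal sketch of that per-VM sizing rule; the VM figures below are the example from the text, and the pair count is limited by the scarcest resource:

```shell
# Per primary/mirror pair: 2 CPUs, 32GB RAM, 300 MB/s sequential read and write.
vm_cpus=4
vm_ram_gb=64
vm_seq_mbps=1000    # ~1 GB/s sequential read and write

by_cpu=$((vm_cpus / 2))
by_ram=$((vm_ram_gb / 32))
by_io=$((vm_seq_mbps / 300))

# Take the minimum of the three limits.
pairs=$by_cpu
[ "$by_ram" -lt "$pairs" ] && pairs=$by_ram
[ "$by_io" -lt "$pairs" ] && pairs=$by_io
echo "primary/mirror pairs this VM can host: $pairs"
```

Here CPU and RAM each allow 2 pairs while IO would allow 3, so the VM hosts 2 primary and 2 mirror instances, as in the example above.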
While it is possible to create segment host VMs that host only a single primary segment instance, it is preferable to have at least two primary segment instances per VM. Certain queries that perform tasks such as looking for uniqueness can cause some segment instances to perform more work, and require more resources, than other instances. Grouping multiple segment instances together on one server can mitigate some of these increased resource needs by allowing a segment instance to utilize the resources allocated to the other segment instances.
VM Environment Settings

VMs hosting Greenplum Database should not have any auto-migration features turned on. The segment instances are expected to run in parallel, and if one of them is paused to coalesce memory or state for migration, the system can see it as a failure or outage. It is better to take the system down, remove it from the active cluster, and then introduce it back into the cluster once it has been moved.

Special care should be given to understanding the topology of primary and mirror segment instances. No set of VMs that contains a primary and its mirror should run on the same host system. If a host containing both the primary and mirror for a segment fails, the Greenplum cluster will be offline until at least one of them is restored, because the database content is no longer complete.
Additional Helpful Tools

Yum Repository

Configuring a YUM repository on the master servers can make management of the software across the cluster more efficient, particularly in cases where the segment nodes do not have external internet access. More than one repository can make management easier, for example one repository for OS files and another for all other packages. Configure the repositories on both the master and standby master servers.
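A minimal sketch of a local repository definition; the repo id, URL, and paths are assumptions to adjust for the actual web server running on the master:

```shell
# Hypothetical .repo file pointing segment nodes at a repository served
# from the master. Written to /tmp here for illustration; it would
# normally be placed in /etc/yum.repos.d/ on each node.
cat > /tmp/greenplum-local.repo <<'EOF'
[local-os]
name=Local OS packages served from the master
baseurl=http://mdw/repos/os
enabled=1
gpgcheck=0
EOF
cat /tmp/greenplum-local.repo
```

A second stanza with its own baseurl would serve the non-OS packages, matching the two-repository split suggested above.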
Kickstart Images

Kickstart images for the master servers and segment hosts can speed up implementation of new servers and recovery of failed nodes. In most cases where there is a node failure but the disks are good, reimaging is not necessary because the disks in the failed server can be transferred to the replacement node.
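A skeletal kickstart file might look like the fragment below; every value here is a placeholder to adapt (and the install source could be the local repository described above), not a recommendation:

```
# Minimal illustrative kickstart fragment (all values are placeholders)
install
url --url=http://mdw/repos/os
lang en_US.UTF-8
keyboard us
timezone America/New_York
rootpw --iscrypted PLACEHOLDER_HASH
clearpart --all --initlabel
autopart
reboot

%packages
@core
%end
```

In practice the partitioning section would be replaced with explicit part/volgroup directives matching the RAID and filesystem layout described earlier in this guide.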