How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
Antonios Giannopoulos, Database Administrator – ObjectRocket
Grant Killian, Sitecore Architect - Rackspace
Percona Live 2017
Agenda
We are going to discuss:
Key terms
- Introduction to Sitecore
- Introduction to MongoDB
Best Practices for MongoDB with Sitecore
Scaling Sitecore
Benchmarks
Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket
Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP
Sitecore ♥ MongoDB because . . .
● Unstructured document model is a better fit for Sitecore analytics vs traditional database rows
● ∞ scalability
● Introduces key flexibility to the system
  ○ HTTP Session state
  ○ Optional repository for other Sitecore modules
  ○ 100% replacement for SQL Server (experimental)
    ■ $$$
MongoDB replica set
A group of mongod processes that maintain the same data set
Replica sets provide:
- Redundancy
- High availability
- Scaling
MongoDB replica set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 in previous versions
A replica-set node may be one of:
- Primary
- Secondary
- Arbiter
MongoDB replica set
Asynchronous replication
- Delay between PRI and SECs
- SECs pull and apply operations
Automatic failover
- If a PRI fails, a SEC takes its place
MongoDB replica set
Best Practices
- Odd number of members
- Use the same server specs
- Reliable network connections
- Adjust the oplog size accordingly
MongoDB Sharded Clusters
Consist of:
Mongos
- It's a statement (query) router
- Connection interface for the driver
- Makes sharding transparent
Config Servers: hold cluster metadata
- Location of the data
Shards: each contains a subset of the sharded data
MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica sets
- Reliable network connections
- But most important… pick a shard key
Undoing a shard key choice might require downtime
MongoDB Sharded Clusters
What makes a good shard key:
- High cardinality
- Not null values
- Immutable field(s)
- Not monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality
Most important: choose a shard key according to your application requirements
MongoDB Storage Engines
MongoDB version 3.0 and higher support:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- InMemory (Percona Server)
- FractalTree (Percona Server)
Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - "classic" HTTP session state
6. Shared_session - meta HTTP session state for contacts (engagement state for the lifetime of interactions…)
Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance
Sitecore uses a different connection string per database:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />
Instances can be optimized according to their workload
Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard
The Percona In-Memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent
Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution
How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet the sharding constraints
Scaling Sitecore - Sharding
From the Sitecore documentation: "Sitecore calculates disk space sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the disk space"
Shard the interaction and contact collections for capacity.
Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40
Updates are using the _id
Queries are using:
- "_id, ContactId": 80%
- "ContactId, _t": 5%
- "ContactId, ContactVisitIndex": 15%
Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scales the vast majority of statements
- But… a few scatter-gather queries remain (around 20%)
{ContactId:1} is also decent, but:
- Updates on sharded collections MUST use the shard key (or {multi:true})
- _id is an exception to that rule
- _id is generated by the application, not the driver
- Potential for jumbo chunks
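The "around 20%" scatter-gather figure can be reproduced from the query mix on the previous slide. The following is a minimal Node.js sketch, not part of the original talk. It assumes a simplified routing rule (a query is targeted only when it filters on every field of the shard key), and the names `queryPatterns`, `isTargeted` and `scatterGatherShare` are hypothetical.

```javascript
// Query patterns on the Interaction collection, as reported in the talk.
const queryPatterns = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// Simplifying assumption: a query can be routed to a single shard only if
// it filters on every field of the shard key.
function isTargeted(queryFields, shardKeyFields) {
  return shardKeyFields.every((field) => queryFields.includes(field));
}

// Percentage of queries that would be broadcast to all shards.
function scatterGatherShare(patterns, shardKeyFields) {
  return patterns
    .filter((p) => !isTargeted(p.fields, shardKeyFields))
    .reduce((sum, p) => sum + p.share, 0);
}

console.log(scatterGatherShare(queryPatterns, ["_id"]));       // 20
console.log(scatterGatherShare(queryPatterns, ["ContactId"])); // 0
```

This covers queries only. Updates use _id alone, which is one reason the slides still favor _id over ContactId despite the scatter-gather share.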
Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP: _id:1 or _id:hashed
- WiredTiger: _id:1 or _id:hashed or ContactId:1
Sitecore may optimize sharding by including ContactId on the updates
Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20
Updates are using the _id
Queries are using the _id (with additional fields)
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed
Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed
Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed
Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed
Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
Client-generated _id values are monotonically increasing, thus "hashed" is added for randomness
The Sitecore _id is a .NET UUID (Universally Unique Identifier) stored as the BinData data type
Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
> doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
> doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
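To see what the conversion does, here is a minimal Node.js sketch of the byte reordering applied to a BinData subtype 3 value written by the .NET driver. It is an illustration only, not the code from uuidhelpers.js, and the function name `legacyCSharpBinDataToUuid` is hypothetical. It takes the base64 payload as a plain string and runs outside the mongo shell.

```javascript
// Convert the base64 payload of a BinData subtype 3 value written by the
// .NET driver into the canonical UUID string.
function legacyCSharpBinDataToUuid(base64) {
  const bytes = Buffer.from(base64, "base64");
  if (bytes.length !== 16) {
    throw new Error("expected 16 bytes, got " + bytes.length);
  }
  // .NET stores the first three UUID groups little-endian, so bytes 0-3,
  // 4-5 and 6-7 are reversed; the last 8 bytes stay in order.
  // Buffer.from(buffer) copies, so the input bytes are not modified.
  const reordered = Buffer.concat([
    Buffer.from(bytes.subarray(0, 4)).reverse(),
    Buffer.from(bytes.subarray(4, 6)).reverse(),
    Buffer.from(bytes.subarray(6, 8)).reverse(),
    bytes.subarray(8, 16),
  ]);
  const hex = reordered.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

console.log(legacyCSharpBinDataToUuid("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The output matches the CSUUID value shown in the shell session above.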
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Using numInitialChunks allows you to pre-split and distribute empty chunks.
- Avoid chunk splits
- Avoid chunk moves
db.adminCommand({shardCollection: "<database>.<collection>", key: {_id: "hashed"}, numInitialChunks: <number>}), number < 8192 per shard.
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Define numInitialChunks
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)
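The sizing rule above can be sketched as a small Node.js function. This is an illustration under stated assumptions: the slide does not say how to round, so the sketch rounds up, and the function name and sample figures (64,000 MB, 500 million documents, 3 shards) are hypothetical.

```javascript
// numInitialChunks = Min(Max(Size, Count), Limit), following the slide.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  // Assumption: round up so a partial chunk still counts as one chunk.
  const size = Math.ceil(collectionSizeMB / 32);
  const count = Math.ceil(documentCount / 125000);
  const limit = shardCount * 8192;
  return Math.min(Math.max(size, count), limit);
}

// Hypothetical projection: 64,000 MB and 500 million documents on 3 shards.
// Size = 2000, Count = 4000, Limit = 24576.
console.log(numInitialChunks(64000, 500000000, 3)); // 4000

// A very large projection is capped by the per-shard limit (2 * 8192).
console.log(numInitialChunks(10000000, 1000000, 2)); // 16384
```

The result is the value to pass as numInitialChunks when sharding the empty collection on {_id:"hashed"}.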
Scaling Sitecore - Sharding
MovePrimary
Move each Sitecore database to a different shard: (analytics, tracking_live…)
db.runCommand({movePrimary: <databaseName>, to: <newPrimaryShard>})
Requires downtime for live databases
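As a planning aid, the movePrimary command documents can be generated ahead of a maintenance window. The sketch below is a minimal Node.js illustration that only builds the documents and does not connect to a cluster. The function name `planMovePrimary` and the shard names are hypothetical, and the round-robin assignment is an assumption, not a recommendation from the talk.

```javascript
// Build one movePrimary command document per database, assigning
// databases to shards round-robin.
function planMovePrimary(databases, shards) {
  if (shards.length === 0) {
    throw new Error("at least one shard is required");
  }
  return databases.map((name, i) => ({
    movePrimary: name,
    to: shards[i % shards.length],
  }));
}

// Hypothetical shard names; database names follow the slide.
const plan = planMovePrimary(
  ["analytics", "tracking_live", "tracking_history"],
  ["shard01", "shard02"]
);
console.log(JSON.stringify(plan));
```

Each generated document has the same shape as the command shown on the slide.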
Scaling Sitecore – Secondary Reads
You can configure secondary reads from the driver (secondary or secondaryPreferred)
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />
In 3.4, maxStalenessSeconds was introduced to control stale reads. It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
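A minimal Node.js sketch of composing such a connection URI with both options follows. The function name `secondaryReadUri`, the host name and the 120-second value are hypothetical. Whether a given Sitecore release ships a driver that honors maxStalenessSeconds is not covered by the talk and should be checked.

```javascript
// Compose a MongoDB URI that reads from secondaries and bounds staleness.
function secondaryReadUri(host, port, database, maxStalenessSeconds) {
  // MongoDB requires maxStalenessSeconds to be at least 90.
  if (maxStalenessSeconds < 90) {
    throw new RangeError("maxStalenessSeconds must be 90 or greater");
  }
  const params = new URLSearchParams({
    readPreference: "secondaryPreferred",
    maxStalenessSeconds: String(maxStalenessSeconds),
  });
  return `mongodb://${host}:${port}/${database}?${params.toString()}`;
}

console.log(secondaryReadUri("mongo01.example.net", 27017, "analytics", 120));
// mongodb://mongo01.example.net:27017/analytics?readPreference=secondaryPreferred&maxStalenessSeconds=120
```

When the URI is placed in an XML connectionString attribute, the `&` separator has to be written as `&amp;`.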
Scaling Sitecore – Secondary Reads
Use replica set tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability
conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)
Set readPreferenceTags on the connection string, as a name:value pair together with a non-primary read preference:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics" />
Order matters when setting multiple tags
Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity
Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region-based tags
- Writes must go to the Primary
- Requires at least one secondary per region
Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region-based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add location to the shard key as a prefix, which means changing the source code
Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from a MongoDB cluster to another MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
Scaling Sitecore – Connector
[Diagram labels: oplog, oplog]
db.foo.insert({a:1})
db.foo.insert({_id:1, a:1})
{"ts": Timestamp(), "h": NumberLong(), "v": 2, "op": "i", "ns": "foo.foo", "o": {"_id": 1, a: 1}}
Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)
Results: WiredTiger is 9.5% faster
Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})
Results: WiredTiger is 9.4% faster
So what?
- Evaluate your MongoDB architecture to determine if it would benefit from scaling
- If scaling is in order, consider this talk as a reference
- Recognize how MongoDB's versatility makes it relevant to a wide variety of applications
What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with MongoDB
- Re-invent our benchmarks