How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
Antonios Giannopoulos, Database Administrator – ObjectRocket
Grant Killian, Sitecore Architect - Rackspace
Percona Live 2017

Agenda
We are going to discuss:
- Key terms
  - Introduction to Sitecore
  - Introduction to MongoDB
- Best practices for MongoDB with Sitecore
- Scaling Sitecore
- Benchmarks

Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket

Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP

Sitecore Architecture
Minimum necessary to understand this talk

Gartner Magic Quadrant for WCM (Web Content Management), Sept 2016

Sitecore is a framework for building websites...

Sitecore ♥ MongoDB because . . .

● Unstructured document model is a better fit for Sitecore analytics vs traditional database rows

● ∞ scalability

● Introduces key flexibility to the system
  ○ HTTP session state
  ○ Optional repository for other Sitecore modules
  ○ 100% replacement for SQL Server (experimental)
    ■ $$$

MongoDB replica-set
A group of mongod processes that maintain the same data set

Replica sets provide:
- Redundancy
- High availability
- Scaling

MongoDB replica-set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 on previous versions

A replica-set node may be either:
- Primary
- Secondary
- Arbiter

MongoDB replica-set
Asynchronous replication
- Delay between PRI and SECs
- SECs pull and apply operations
Automatic failover
- If a PRI fails a SEC takes its place

MongoDB replica-set
Best Practices
- Odd number of members
- Use same server specs
- Reliable network connections
- Adjust the oplog accordingly
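The "odd number of members" advice follows from majority-based elections: a primary needs votes from more than half of the voting members. The sketch below is only an illustration of that arithmetic; the function names are made up for this example and are not part of MongoDB.

```javascript
// Votes required to elect a primary in a set with this many voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Going from 3 to 4 members raises the majority without raising the number of failures tolerated, which is why an even member count adds cost but no extra resilience.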

MongoDB Sharded Clusters
Consists of:
Mongos
- It’s a statement (query) router
- Connection interface for the driver
- Makes sharding transparent

Config Servers: Hold cluster metadata
- Location of the data
Shards: Contain a subset of the sharded data

MongoDB Sharded Clusters

MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica-sets
- Reliable network connections
- But most important… pick a shard key

Undoing a shard key choice might require downtime

MongoDB Sharded Clusters
What makes a good shard key:
- High cardinality
- Not null values
- Immutable field(s)
- Not monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality

Most important: choose a shard key according to your application requirements
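To see why monotonically increasing fields make poor ranged shard keys, the sketch below places 1000 consecutive keys into four chunks, first by key range and then by a hash of the key. It is a toy model: the four-chunk layout, the key values, and `toyHash` (a simple multiplicative hash, not MongoDB's hash function) are all assumptions made for illustration.

```javascript
// Four chunks that split a 32-bit key space at equal boundaries.
const CHUNKS = 4;
const KEY_SPACE = 2 ** 32;

// Ranged sharding: the chunk is chosen by the key's position in the key space.
function rangedChunk(key) {
  return Math.floor(key / (KEY_SPACE / CHUNKS));
}

// Toy multiplicative hash (NOT MongoDB's hash), used only to scatter keys.
function toyHash(key) {
  return Math.imul(key, 2654435761) >>> 0;
}

// Hashed sharding: the chunk is chosen by the position of the key's hash.
function hashedChunk(key) {
  return rangedChunk(toyHash(key));
}

// Count how many keys land in each chunk.
function distribution(keys, chunkOf) {
  const counts = new Array(CHUNKS).fill(0);
  for (const k of keys) counts[chunkOf(k)] += 1;
  return counts;
}

// 1000 monotonically increasing keys, e.g. a counter-like value.
const keys = Array.from({ length: 1000 }, (_, i) => 4000000000 + i);

console.log("ranged:", distribution(keys, rangedChunk));
console.log("hashed:", distribution(keys, hashedChunk));
```

With the ranged layout every one of these keys falls into the last chunk, so a single shard takes all the inserts; hashing the same keys spreads them across the chunks.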

MongoDB Storage Engines
MongoDB version 3.0 and higher supports:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- InMemory (Percona Server)
- FractalTree (Percona Server)

Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - “classic” http session state
6. Shared_session - meta http session state for contacts (engagement state for lifetime of interactions…)

For example...

Graphic courtesy of http://www.techphoria414.com

Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance

Sitecore uses a different connection string per database
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_"/>
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_"/>

Instances can be optimized according to their workload

Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard

Percona In-memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent

Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution

How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet sharding constraints

Scaling Sitecore - Sharding
From Sitecore documentation:
“Sitecore calculates disk space sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the disk space”

Sharding interaction and contact for capacity.
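The sizing rule quoted above can be turned into a quick estimate. The sketch below is a rough calculator that assumes the quoted figures apply as-is; the function name and the workload numbers are hypothetical.

```javascript
// Figures quoted from the Sitecore documentation.
const KB_PER_INTERACTION = 5;
const KB_PER_CONTACT = 2.5;
const SHARE_PERCENT = 80; // interactions + contacts are 80% of disk space

function estimateDiskKB(interactions, identifiedContacts) {
  const coreKB = interactions * KB_PER_INTERACTION + identifiedContacts * KB_PER_CONTACT;
  // Scale up so that coreKB represents 80% of the total.
  const totalKB = (coreKB * 100) / SHARE_PERCENT;
  return { coreKB, totalKB };
}

// Hypothetical workload: 10 million interactions, 2 million identified contacts.
const estimate = estimateDiskKB(10e6, 2e6);
console.log(`interactions + contacts: ${(estimate.coreKB / 1024 / 1024).toFixed(1)} GB`);
console.log(`projected total: ${(estimate.totalKB / 1024 / 1024).toFixed(1)} GB`);
```

Because these two collections dominate the footprint, they are the natural candidates to shard when capacity is the concern.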

Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40

Updates are using the _id

Queries are using:
- "_id, ContactId": 80%
- "ContactId, _t": 5%
- "ContactId, ContactVisitIndex": 15%

Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scales the vast majority of statements
- But… a few scatter-gather queries (around 20%)

{ContactId:1} is also decent, but:
- Updates on sharded collections MUST use the shard key (or {multi:true})
- _id is an exception to that rule
- _id is generated by the application, not the driver
- Potential for jumbo chunks
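The "around 20%" scatter-gather figure can be reproduced from the query patterns reported for the Interaction collection. The sketch below assumes each pattern filters by equality on the listed fields, so a query is routed to a single shard only when the shard key field is among them; `targetedShare` is an illustrative helper, not a MongoDB API.

```javascript
// Query patterns and their share of queries, as reported for Interaction.
const queryPatterns = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// Percentage of queries that include the shard key and can be targeted.
function targetedShare(shardKeyField, patterns) {
  return patterns
    .filter((p) => p.fields.includes(shardKeyField))
    .reduce((sum, p) => sum + p.share, 0);
}

for (const key of ["_id", "ContactId"]) {
  const targeted = targetedShare(key, queryPatterns);
  console.log(`${key}: ${targeted}% targeted, ${100 - targeted}% scatter-gather`);
}
```

Under these assumptions ContactId targets every query pattern, but the update and jumbo-chunk caveats listed above still apply, which is why _id remains the recommended key.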

Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP: _id:1 or _id:hashed
- WiredTiger: _id:1 or _id:hashed or ContactId:1

Sitecore may optimize sharding by including ContactId on the updates

Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20

Updates are using the _id

Queries are using the _id (with additional fields)

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed

Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed

Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed

Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed

Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}

Client generated _id values are monotonically increasing, thus “hashed” is added for randomness

Sitecore _id is a .NET UUID (Universally Unique Identifier) bundled in a BinData data type
Example: "_id": BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
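To read such an _id outside the mongo shell, the base64 payload can be decoded by hand. The sketch below assumes the legacy C# layout for BinData subtype 3, in which the first three groups of the UUID are stored little-endian; `base64ToCSharpUuid` is a name invented for this example and runs under Node.js.

```javascript
// Convert the base64 payload of a BinData(3, ...) value written by the
// C# driver into the familiar UUID string form.
function base64ToCSharpUuid(b64) {
  const hex = Buffer.from(b64, "base64").toString("hex");
  if (hex.length !== 32) throw new Error("expected a 16-byte value");
  // Reverse the byte order of a hex string (two hex digits per byte).
  const swap = (s) => s.match(/../g).reverse().join("");
  return [
    swap(hex.slice(0, 8)),   // 4 bytes, little-endian
    swap(hex.slice(8, 12)),  // 2 bytes, little-endian
    swap(hex.slice(12, 16)), // 2 bytes, little-endian
    hex.slice(16, 20),       // remaining bytes are stored in order
    hex.slice(20),
  ].join("-");
}

console.log(base64ToCSharpUuid("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

Values written by other drivers may use a different byte order, so this conversion should only be applied to data known to come from the C# driver.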

Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}

You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js

> doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
> doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")

Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection

Using numInitialChunks allows you to pre-split and distribute empty chunks.
- Avoid chunk splits
- Avoid chunk moves

db.adminCommand({shardCollection: <collection>, key: {_id: "hashed"}, numInitialChunks: <number>})
<number> must be less than 8192 per shard.

Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection

Define numInitialChunks
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192

numInitialChunks = Min(Max(Size, Count), Limit)
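The formula can be wrapped in a small helper that also builds the command document. This is a sketch under stated assumptions: the inputs are projected sizes for a collection that is still empty, rounding up is a choice made here rather than something the slide specifies, and the namespace `analytics.Interactions` is a placeholder.

```javascript
// numInitialChunks = Min(Max(Size, Count), Limit)
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const bySize = Math.ceil(collectionSizeMB / 32);      // Size
  const byCount = Math.ceil(documentCount / 125000);    // Count
  const limit = shardCount * 8192;                      // Limit
  return Math.min(Math.max(bySize, byCount), limit);
}

// Hypothetical projection: 100 GB and 50 million documents on 3 shards.
const chunks = numInitialChunks(100 * 1024, 50e6, 3);

// Command document that could be passed to db.adminCommand() in the shell.
const shardCommand = {
  shardCollection: "analytics.Interactions", // placeholder namespace
  key: { _id: "hashed" },
  numInitialChunks: chunks,
};

console.log(JSON.stringify(shardCommand));
```

For this projection the size term dominates (3200 chunks versus 400 from the document count), and the limit only comes into play for very large collections on few shards.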

Scaling Sitecore - Sharding
MovePrimary
Move each Sitecore database to a different shard: (analytics, tracking_live…)

db.runCommand({movePrimary: <databaseName>, to: <newPrimaryShard>})

Requires downtime for live databases

Scaling Sitecore – Secondary Reads
You can configure Secondary Reads from the driver (secondary or secondaryPreferred)

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary"/>

In 3.4 maxStalenessSeconds was introduced to control stale reads
Specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
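A connection string that combines both options can be assembled as shown below. This is a minimal sketch: `buildMongoUri` and the host, port, and database values are placeholders, and the check reflects MongoDB's documented minimum of 90 seconds for maxStalenessSeconds.

```javascript
// Build a mongodb:// URI with optional read preference settings.
function buildMongoUri({ host, port, database, readPreference, maxStalenessSeconds }) {
  const params = [];
  if (readPreference) params.push(`readPreference=${readPreference}`);
  if (maxStalenessSeconds !== undefined) {
    if (maxStalenessSeconds < 90) {
      throw new RangeError("maxStalenessSeconds must be at least 90");
    }
    params.push(`maxStalenessSeconds=${maxStalenessSeconds}`);
  }
  const query = params.length ? `?${params.join("&")}` : "";
  return `mongodb://${host}:${port}/${database}${query}`;
}

// Placeholder values for illustration only.
console.log(buildMongoUri({
  host: "mongo01",
  port: 27017,
  database: "analytics",
  readPreference: "secondaryPreferred",
  maxStalenessSeconds: 120,
}));
```

When the string is placed in an XML configuration file, the `&` between parameters has to be written as `&amp;`.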

Scaling Sitecore – Secondary Reads
Use Replica Set Tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability

conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)

Set readPreferenceTags on the connection string
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary&amp;readPreferenceTags=db:analytics"/>
Order matters when setting multiple tags

Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity

Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region based tags
- Writes must go to the Primary
- Requires at least one secondary per region

Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add location to the shard key as a prefix (requires changing the source code)

Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from a MongoDB cluster to another MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
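The idea of reading oplog operations and replaying them on another cluster can be modelled in a few lines. The sketch below is a conceptual illustration only, not how mongo-connector is implemented: the target is an in-memory Map standing in for a second cluster, only insert entries are handled, and the entries mimic the shape of an oplog insert (`op: "i"`, namespace `ns`, document `o`).

```javascript
// Source-side entries shaped like oplog inserts.
const sourceOplog = [
  { v: 2, op: "i", ns: "foo.foo", o: { _id: 1, a: 1 } },
  { v: 2, op: "i", ns: "foo.foo", o: { _id: 2, a: 5 } },
];

// Target "cluster": namespace -> (_id -> document).
function replay(entries, target = new Map()) {
  for (const entry of entries) {
    if (entry.op !== "i") continue; // this sketch only handles inserts
    if (!target.has(entry.ns)) target.set(entry.ns, new Map());
    // Keyed by _id, so replaying the same entry twice leaves one copy.
    target.get(entry.ns).set(entry.o._id, { ...entry.o });
  }
  return target;
}

const targetStore = replay(sourceOplog);
replay(sourceOplog, targetStore); // replaying again does not duplicate documents
console.log(targetStore.get("foo.foo").size);
```

Keying the replay on the _id recorded in the oplog entry is what lets the same operation be applied more than once without creating duplicates.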

Scaling Sitecore – Connector

(Diagram: oplog to oplog)

db.foo.insert({a: 1})

db.foo.insert({_id: 1, a: 1})

{"ts": Timestamp(), "h": NumberLong(), "v": 2, "op": "i", "ns": "foo.foo", "o": {"_id": 1, "a": 1}}

Scaling Sitecore – Multi Region

Mongo to Mongo Connector

Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)

Results: WiredTiger is 9.5% faster

Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})

Results: WiredTiger is 9.4% faster

So what?
- Evaluate your MongoDB architecture to determine if it would benefit from scaling
- If scaling is in order, consider this talk as a reference
- Recognize how MongoDB’s versatility makes it relevant to a wide variety of applications

What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with MongoDB
- Re-invent our benchmarks

Questions?
Thank you!!!

@iamantonios

@sitecoreagent