Dynamics CRM high volume systems - lessons from the field

High Volume Systems: Lessons from the Field

Stéphane Dorrekens, RealDolmen

Philip Verlinden, RealDolmen

Session Objectives And Takeaways

• Session Objectives:
  • Demystify High Volumes
  • Stories from the Field!
• Key Takeaways
  • Key attention points for high volumes
  • Draft design considerations

Field Stories

• Financial Institution (8,000 users, 2 implementations, 350 GB)
• Financial Institution (2,000 users, 2 implementations, 2,500 GB)
• Financial Institution (1,000 users, 6 implementations, 450 GB + 50 GB/month for the biggest implementation)
• Public Transport (500 users, 1 implementation)
• Pharmaceutical (400 users, 2 implementations, 300 GB)

What is High Volume?

• Typical inflexion points (where system behavior is impacted)
• High Volume Data
  • >1 million for base record tables (e.g. Contact, Account, ...)
  • >5,000 activities/day or >1 million/year
• High Volume Transactions
• High Volume Users/Security
  • >300 concurrent users
  • >1,000 teams, BU
• High Volume vs Enterprise Scale
  • Small/Medium companies can have huge amounts of data (e.g. associations/volunteers)
  • Small/Medium companies can have huge amounts of transactions/sec (e.g. web interactions)
  • Only Enterprise has a huge number of users and complex security
• CRM Online vs On Premise
  • Data Cost & Max GB (<1 TB)

Impact of High Volume

• The Bad News

Where's the high-volume switch?

High-Volume setting (on/off):

• The Good News

It does work… with some effort.

Impact of High Volume

• Less room for "improvisation" or "lack of design"
• Every point not taken care of or thought through will become an issue
• Some out-of-the-box features are not designed for high volume
  • Charts are limited to 50,000 records
  • Team membership > 1,000 teams
  • No out-of-the-box archiving
  • Reporting and transactions share the same database
  • Queries are locking by default
  • …
• Don't load millions of Contacts into the system and expect it to just work.

Trending

• Omni-channel
  + More Social
  + More Customer Interactions
• Customer Centric
• Customer expectations
• CRM & CEM

Challenge the customer on possible:
- Data Security Walls <> Performance

Infrastructure Design

Infrastructure Design: High Volume of Data

• SQL Server is the key point
  • Get the numbers! (size, growth, distribution of data between tables/entities)
  • Data layer IOPS check (especially SAN); disk is more important than RAM at that level, as the RAM/Disk ratio is lower than usual
  • RAM (absolute minimum 10%; recommended minimum 30%)
  • SQL 2014 Std maximum is 128 GB (64 GB for 2012), SQL Enterprise maximum is 4 TB
  • Index, but not over the top
  • Disk configuration
  • Tempdb usage/location (consider SSD)
  • Consider using a separate reporting database
  • Set the max server memory setting in SQL Server to reserve 4 to 6 GB of system memory for the OS and other applications
• Not to do
  • Use non-dedicated LUNs
  • Use an old SAN infrastructure (IOPS check)
  • Forget to reserve failover host resources
  • Forget testing maintenance (see last section)

[Diagram: SQL Server 2014 (1x4 cores, 250 GB RAM) with MDF and LDF on separate LUNs (LUN A, LUN B; 1024 GB and 50 GB)]
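The RAM rule of thumb and the max server memory reservation can be turned into a quick sizing check. The sketch below is only an illustration: it assumes the 10% / 30% figures are relative to the database size (the slide does not say so explicitly), and the function name `sql_memory_plan` and the 6 GB default reservation are hypothetical choices.

```python
import math


def sql_memory_plan(db_size_gb, server_ram_gb, os_reserve_gb=6):
    """Rough memory plan for a SQL Server host; all inputs are in GB.

    Assumption: the 10% / 30% rules of thumb apply to the database size.
    """
    return {
        # Absolute minimum RAM: 10% of the database size, rounded up.
        "ram_absolute_min_gb": math.ceil(db_size_gb * 10 / 100),
        # Recommended minimum RAM: 30% of the database size, rounded up.
        "ram_recommended_min_gb": math.ceil(db_size_gb * 30 / 100),
        # Value for 'max server memory' (expressed in MB), leaving
        # os_reserve_gb to the OS and other applications.
        "max_server_memory_mb": (server_ram_gb - os_reserve_gb) * 1024,
    }


# Example with the figures from the diagram: 1024 GB data file, 250 GB RAM.
plan = sql_memory_plan(db_size_gb=1024, server_ram_gb=250)
print(plan)
```

With these inputs the 250 GB host sits between the absolute minimum and the recommended minimum, which is the kind of gap worth discussing before go-live.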

Infrastructure Design: High Volume of Data

• … but SQL is not the only point impacted
  • Data upload: front-end server sizing (even temporary)
  • Bandwidth: a shared 1 Gb/s link can be overloaded
  • Deployment: backup/restore to another environment can take literally days
  • Virtualization: specific issues such as host reservation (RAM), dedicated LUNs, ...
• Not to do
  • Upload 50 million records on a massive web server farm without warning the network team
  • Hint: maybe they have automatic Denial of Service protection?

Infrastructure Design: High Volume of Users

• Performance vs Stability
  • Minimum of a pair of front-ends with LB
  • LB is not enough; rapid failover is needed
• App pools
  • Limits
  • Recycle
• Separate roles
  • Separate back-end servers
  • Separate endpoints
• To do
  • Test failover by hanging an app pool; don't expect LB to work OOTB
  • Reboot the front-end daily

[Diagram: two Front-End Servers and two Organization Web Service Role servers (each 1x2 cores, 8 GB RAM), endpoints CRM.Contoso.local and CRMServices.Contoso.local, and one Back End Role server (1x4 cores, 8 GB RAM)]

Functional Design

• 80/20 rule

• OOTB versus Code

• Where to invest and where not ?

• All the recommendations which follow are interlinked. Example:
  • Deleting workflow history is good, but it is faster if no POA record exists

Normalise but…

• Normalise your data model
• Map it on the CRM data model as much as possible
• Get and map the numbers
• And reflect…
  • Security Design (CRM)
  • Volumes: B2C in Germany (+80 million) or Belgium (+11 million)
  • Functional (MUI)
  • Online-proof: FetchXML <> FilteredViews

Security, from performant to less performant… from standard to exception

• Business Unit / Roles
• Access Teams
• Hierarchical Security
• Team Ownership
• Sharing (User/Team)

Field story on Teams numbers
• Non-scalable: strictly limit the number of Owner teams a user is a member of
• Scalable: Access Teams, but think numbers and maintenance

Field story on Sharing: keep the POA under control
• Disable "Share record with the previous owner"
• Change Parental relationships to Referential, or remove the 'Reparent' cascade
• Workflow ownership: Async Service user
• Keep monitoring the POA size

Views 1/2

• Only show the minimum number of records needed for the feature
• Use preemptive filtering instead of after-the-fact (user-based) filtering
  • The rationale is that the Top X records are selected and ordered AFTER the security model is applied. This combination prevents the use of a non-clustered index, and we cannot change the clustered index (sequential GUIDs)
  • Thus the fewer records you have to sort, the faster the view
• Specify a period if possible (e.g. Last Year, Last 6 Months)
  • This avoids degradation over time and limits the number of records to sort
• Remove the sort order if not needed (via the SDK, update the FetchXML)
  • The database order (sequential GUID) is used when no sort is needed
  • The rationale is to remove the ORDER BY impact
• Don't use the 'ALL' views as default
  • Make the default the most restrictive view needed (e.g. 'My' or 'My in that period')
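The view guidelines above (filter on the owner, restrict to a period, drop the sort order) can be illustrated by generating the FetchXML of such a view. This is a minimal sketch, not the authors' implementation: the helper name `build_restrictive_view` and the choice of `createdon` as the period column are assumptions, while `eq-userid` and `last-x-months` are standard FetchXML condition operators.

```python
import xml.etree.ElementTree as ET


def build_restrictive_view(entity, columns, months=6):
    """Build FetchXML for a 'My records in the last N months' view.

    No <order> element is emitted on purpose, so no ORDER BY is added.
    """
    fetch = ET.Element("fetch", {"version": "1.0", "mapping": "logical"})
    ent = ET.SubElement(fetch, "entity", {"name": entity})
    for col in columns:
        ET.SubElement(ent, "attribute", {"name": col})
    flt = ET.SubElement(ent, "filter", {"type": "and"})
    # Preemptive filter 1: only records owned by the current user.
    ET.SubElement(flt, "condition",
                  {"attribute": "ownerid", "operator": "eq-userid"})
    # Preemptive filter 2: only records created in the last N months.
    ET.SubElement(flt, "condition",
                  {"attribute": "createdon", "operator": "last-x-months",
                   "value": str(months)})
    return ET.tostring(fetch, encoding="unicode")


view_xml = build_restrictive_view("contact", ["fullname", "emailaddress1"])
print(view_xml)
```

The resulting string would still have to be written to the view definition through the SDK, as the slide suggests.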

Views 2/2

• Don't use a specific view for search purposes; search is ALWAYS global.
• If only search is needed, use a view returning no records
  • The optimum is a view that searches for a non-existing record: a non-existing value on an indexed field
• If you want to combine search and a default list, use the most restrictive list needed.
• Explain the concept of global search to the users
• Find Fields
  • Use the most restrictive list of 'find fields', as ALL those fields will be searched
• These principles are not only valid for 'standard views'; they should also be applied to:
  • Dashboard views
  • Lookup views
  • Search Result View

QuickFind Filtering

• How to search in a view:
  • Simple: OOTB dashboard list
  • Harder?

Field story on QuickFind with 10 million contacts?
• Filter with
  • Data model
  • User preference (custom entity: POS or global)
  • PreRetrieveMultiple: add a "where" clause
  • Button to change scope
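The "add a where clause in PreRetrieveMultiple" idea boils down to appending a scope filter to the incoming query before the platform executes it. In a real deployment this would live in a plug-in; the sketch below only shows the query rewriting on a FetchXML string. The function name `add_scope_filter` and the attribute `new_posid` are hypothetical, and the sketch assumes the scope value comes from the user preference mentioned above.

```python
import xml.etree.ElementTree as ET


def add_scope_filter(fetch_xml, attribute, value):
    """Return the FetchXML with an extra filter restricting the scope.

    The new <filter> is added next to the existing ones under <entity>,
    so it narrows whatever the Quick Find query already selects.
    """
    root = ET.fromstring(fetch_xml)
    entity = root.find("entity")
    if entity is None:
        raise ValueError("FetchXML has no <entity> element")
    scope = ET.SubElement(entity, "filter", {"type": "and"})
    ET.SubElement(scope, "condition",
                  {"attribute": attribute, "operator": "eq", "value": value})
    return ET.tostring(root, encoding="unicode")


# Illustrative Quick Find style query on contacts.
quick_find = (
    '<fetch><entity name="contact">'
    '<attribute name="fullname" />'
    '<filter type="or">'
    '<condition attribute="lastname" operator="like" value="Smi%" />'
    '</filter>'
    '</entity></fetch>'
)
scoped = add_scope_filter(quick_find, "new_posid", "POS-042")
print(scoped)
```

The "button to change scope" would then simply switch the stored preference that feeds the `value` argument.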

Plug-ins

• Limit the plug-ins registered per event
  Only one plug-in Execute should fire for a given event; thus the plug-in class naming convention should be xx.Plugins.<Entity>.<Message>.<Type>, e.g. xx.Plugins.Account.Create.Sync
  Only one plug-in DLL should exist for a given entity: xx.Plugins.Account
  This also allows sharing of information between the different steps of the same pipeline (e.g. SharedVariables)
• Make sure to define the filtering attributes for plug-ins.
  Because CRM now auto-saves every 30 seconds, plug-ins will be triggered frequently.
• Implement tracing
  Some issues only occur under heavy stress (e.g. bad instantiations); good tracing will help you tremendously
• Use FetchXML aggregation when possible (especially count)
• Do not use any kind of static instantiation; it will fail under heavy load

The sandbox tax

1st test, with Registration: Sandbox

[2015-03-30 20:14:34.858] Process: w3wp |Organization:e1f759d5-8cd6-e311-93f9-00155d1a7308 |Thread: 8> OrganizationSdkService starts processing request for user:0d57f89f-bf5e-4e4c-b4ab-8ad6d9b2e039Request Xml:Update<UpdateRequest

[2015-03-30 20:14:34.987] Process: w3wp |Organization:e1f759d5-8cd6-e311-93f9-00155d1a7308 |Thread: 8 |Category: Sandbox |User: 0d57f89f-bf5e-4e4c-b4ab-8ad6d9b2e039 |Level: Info |ReqId: 631e39a9-8527-4397-8fe1-275f1f9e1102 | SandboxCodeUnit.Execute ilOffset = 0x394> SandboxPlugin.Execute: exit

[2015-03-30 20:14:34.987] Process: w3wp |Organization:e1f759d5-8cd6-e311-93f9-00155d1a7308 |Thread: 8 |Category: Platform.Sdk |User: 0d57f89f-bf5e-4e4c-b4ab-8ad6d9b2e039 |Level: Verbose |ReqId: 631e39a9-8527-4397-> MessageProcessor start executing step 58e202dd-4b49-417a-b732-f95957ef99b0 of type 'Microsoft.Crm.ObjectModel.MultiCurrencyPlugin' synchronously for message 'Update' for entity 'contact'

129 msec to complete the call

2nd test, with Registration: None

[2015-03-30 20:20:44.900] Process: w3wp |Organization:e1f759d5-8cd6-e311-93f9-00155d1a7308 |Thread: 33 > OrganizationSdkService starts processing request for user:0d57f89f-bf5e-4e4c-b4ab-8ad6d9b2e039As user:5c0e5ad5-8cd6-e311-93f9-00155d1a7308Request Xml:Update<UpdateRequest

[2015-03-30 20:20:44.963] Process: w3wp |Organization:e1f759d5-8cd6-e311-93f9-00155d1a7308 |Thread: 33 |Category: Platform.Sdk |User: 0d57f89f-bf5e-> MessageProcessor start executing step 58e202dd-4b49-417a-b732-f95957ef99b0 of type 'Microsoft.Crm.ObjectModel.MultiCurrencyPlugin' synchronously for message 'Update' for entity 'contact'

63 msec to complete the call
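The two durations can be re-derived from the first and last timestamps of each trace excerpt, which also gives the cost of the sandbox for this single call. A small sketch, using only the timestamps shown in the traces above (the helper name `elapsed_ms` is illustrative):

```python
from datetime import datetime

TRACE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_ms(start, end):
    """Milliseconds between two trace timestamps."""
    delta = datetime.strptime(end, TRACE_FORMAT) - datetime.strptime(start, TRACE_FORMAT)
    return round(delta.total_seconds() * 1000)


# 1st test: plug-in registered in the Sandbox.
sandbox_ms = elapsed_ms("2015-03-30 20:14:34.858", "2015-03-30 20:14:34.987")
# 2nd test: plug-in registered with isolation mode None.
none_ms = elapsed_ms("2015-03-30 20:20:44.900", "2015-03-30 20:20:44.963")
# The difference is the "sandbox tax" for this one Update call.
sandbox_tax_ms = sandbox_ms - none_ms
print(sandbox_ms, none_ms, sandbox_tax_ms)
```

This is a single measurement per mode, so the difference is indicative rather than a benchmark.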

The sandbox issue

[Diagram: Update and Retrieve event pipelines (Pre Stage, Platform, Post Stage) with PreEvent and PostEvent plug-ins running in the Sandbox; four sandbox calls, each labelled 30 msec]

Async Workflows

• Use with care
• Strictly limit the number of workflows registered per event, to avoid race conditions and system overload.
• Purge completed workflows as much as possible, but beware of the "delete tax"

Audit

• Restrict Scope
  • Do not audit when doing the initial import
  • Do not audit read-only data (especially if it is updated daily or more often)
  • Do not audit unneeded fields
• Manage Growth
  • Audit information can only be deleted by date, not by type
  • Choose the SQL Server edition wisely
    • With SQL Enterprise, audit uses quarterly partitions; with SQL Standard it doesn't
    • Partitions allow splitting the audit logs into different filegroups for better I/O
    • Upgrading from Standard to Enterprise will not create partitions!
• Audit is always async, so it triggers high usage of the back-end

Locks & Deadlocks

• The SQL Server default is locking*
• By default, define the queries with the no-lock hint
  • no-lock='true' in FetchXML
  • QueryExpression.NoLock
  • WITH (NOLOCK) in T-SQL
• That applies everywhere (e.g. Views, Charts, Reports, Plugins, JS, etc.)
• Minimize lock contention when doing updates (e.g. cascade restrictions)
• Export to Excel is (currently) locking, so grant that right with care.

A good ref.: Microsoft Dynamics CRM 2015 Performance and Scalability

* You can change the SQL Server default (for example to Snapshot isolation), but it's not the recommended best practice and it triggers its own issues to manage (e.g. Tempdb usage in Snapshot isolation)
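As an illustration of the no-lock hint combined with the FetchXML aggregation recommended for plug-ins, the sketch below generates a counting query. The `no-lock`, `aggregate` and `alias` attributes are standard FetchXML; the helper name `build_count_query` and the alias `record_count` are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET


def build_count_query(entity, id_attribute):
    """Build a FetchXML aggregate query that counts records without locking."""
    fetch = ET.Element("fetch", {"aggregate": "true", "no-lock": "true"})
    ent = ET.SubElement(fetch, "entity", {"name": entity})
    # Count on the primary key so the server returns a single row.
    ET.SubElement(ent, "attribute",
                  {"name": id_attribute, "alias": "record_count",
                   "aggregate": "count"})
    return ET.tostring(fetch, encoding="unicode")


count_xml = build_count_query("contact", "contactid")
print(count_xml)
```

Counting on the server this way avoids retrieving the records themselves just to count them in code.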

Integration & GUIDs

• Don't let external systems generate CRM GUIDs, as they won't be sequential, even for read-only, back-end originated data (cf. http://blogs.msdn.com/b/crminthefield/archive/2015/01/19/the-dangers-of-guid-newguid.aspx)

• Using a sequential .NET-generated GUID (cf. http://blogs.msdn.com/b/dbrowne/archive/2012/07/03/how-to-generate-sequential-guids-for-sql-server-in-net.aspx) is dangerous, as you need to be sure only one system/integration generates those GUIDs; this is implicit when they are generated by the database server.

Integration & Performance

• Web Services are not the bottleneck if handled properly

• Data injection needs to be massively multi-threaded (the Bulk API can be helpful as well) and scaled out

• Front-end processing needs to be massively scaled out, the bottleneck being the IIS CPU usage (+/- 50 ops/sec/core)

• Disable everything unnecessary (not only for the initial load): Audit, WF, Plugins, etc.

• The true bottleneck is network bandwidth and IOPS

Tip: when you use the Update message, do not set the OwnerId attribute on a record unless the owner has actually changed
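The +/- 50 ops/sec/core figure gives a quick way to estimate how long an initial load takes for a given front-end farm. The sketch below is back-of-the-envelope arithmetic only: it assumes one operation per record and perfect scaling across cores, and the function name `load_duration_hours` is illustrative.

```python
def load_duration_hours(records, cores, ops_per_sec_per_core=50):
    """Estimated load time in hours, assuming one operation per record
    and linear scaling over the front-end cores."""
    throughput = cores * ops_per_sec_per_core  # operations per second
    return records / throughput / 3600


# 50 million records on 8 front-end cores versus 32 front-end cores.
small_farm_hours = load_duration_hours(50_000_000, cores=8)
large_farm_hours = load_duration_hours(50_000_000, cores=32)
print(f"{small_farm_hours:.1f} h on 8 cores, {large_farm_hours:.1f} h on 32 cores")
```

Real loads add plug-in, workflow and audit overhead on top, which is why disabling everything unnecessary matters so much.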

Performance Test

• Why?
  • Sizing is always based on assumptions which may not be valid (e.g. bandwidth)
  • The system behaves differently under heavy stress
  • You will find issues in configuration, code, etc. that are unseen in normal tests
• How?
  • End-user stress
    • Choose the 5 most used scenarios, script them (see tools) and randomize the script
    • Execute the scenarios with accelerated users, after the initial load
  • Back-end stress
    • Simulate a peak load from all real-time integration points
  • Even if the test is a success, ramp up to failure to identify your weak points
• With what?
  • Visual Studio Web Performance Project (2013 Ultimate)
  • StresStimulus (Fiddler extension)
  • HP LoadRunner
  • CRM Performance Toolkit (CRM 2011)

After Go Live

Maintenance

• Maintenance Service
  • Use the Maintenance Job Editor to reschedule the jobs
  • Check for failures and timeouts; move to a SQL maintenance job if needed (usually Reindex <> Shrink)
• Backups
  • Avoid hourly transaction log backups if not needed
• Don't change the connection pooling (100 max)
  • If you need more, you have an issue with connections not being closed (reentrant code)
• Regularly check for
  • Long-running queries (sys.dm_exec_query_stats, sys.dm_exec_query_plan, sys.dm_exec_sql_text)
  • Missing indexes (min. 4 weeks) (sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_groups, sys.dm_db_missing_index_details, sys.dm_db_index_usage_stats)
  • Wait statistics
  • Deadlocks (sys.dm_tran_locks)

Application Performance Management

Manage Growth

• Move BLOBs out of the CRM database
  • SharePoint integration
  • Other binary repositories (e.g. Documentum, PDFs on file servers, etc.)
• Database purge
  • Define and automate data cleanup
  • Bulk Delete is fine, but define indexes and split the work into small chunks, as Bulk Delete easily times out
  • Truncate table is unsupported but can be used if you need a fast cleanup (e.g. audits, jobs, ...). The DB constraints limit the risks
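One way to split a purge into small chunks is to cut the period to delete into short date windows and submit one Bulk Delete job per window. The sketch below only computes the windows; how each window is turned into a Bulk Delete job is left out. The function name `delete_windows` and the 7-day default are assumptions for the example.

```python
from datetime import date, timedelta


def delete_windows(start, end, days_per_chunk=7):
    """Yield (from_date, to_date) pairs covering [start, end) in small chunks.

    Each pair is meant to become the date filter of one Bulk Delete job.
    """
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=days_per_chunk), end)
        yield cursor, upper
        cursor = upper


windows = list(delete_windows(date(2015, 1, 1), date(2015, 1, 20)))
for lower, upper in windows:
    print(f"delete records with createdon >= {lower} and < {upper}")
```

Half-open windows ensure that no record falls into two jobs, and an index on the date column used in the filter keeps each job short.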

Archiving, Delete rights and Restore

• No OOTB archiving, but keeping it all is not an option in high-growth systems.
• Even if storage cost decreases fast, the overall sizing cost and impact do not (think about RAM, backups, etc.)
• Usually a lot of data is already "archived" as part of BAU; think about DWH, BI, legal archives, disaster recovery, ...
• Archiving is not the expensive part; accessing/restoring the archived data is. Also prevent accidental deletes!
• From easy to complex, implement a restore:
  • Archive CRM Organizations (Live)
  • Archive CRM Instances (Offline or Live); Sandbox on CRM Online
  • Backup CRM Database (Offline)
  • Report snapshot history (Live/Offline, but degraded access to information)
  • …

Email Router / Server-Side Sync

• Email Router
  • Adjust the batch size; the default is 5, so 200 calls per 1,000 users
• Server-Side Sync
  • Settings in Deployment Properties for throttling (Mailbox, EC items)
  • Think of dedicating one server to the Server-Side Sync "Email Integration Service" role
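The "200 calls per 1,000 users" figure follows directly from the batch size. A tiny sketch of that arithmetic, assuming one call per batch of mailboxes per polling cycle (the function name `router_calls` is illustrative):

```python
import math


def router_calls(mailboxes, batch_size=5):
    """Number of calls needed to process all mailboxes once."""
    return math.ceil(mailboxes / batch_size)


default_calls = router_calls(1000)               # default batch size of 5
tuned_calls = router_calls(1000, batch_size=20)  # a larger, hypothetical batch size
print(default_calls, tuned_calls)
```

The batch size of 20 is only there to show the effect of the setting; the right value depends on the mailbox load and should be tested.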

Takeaways

Session Objectives And Takeaways

• Session Objective(s):

• Demystify High Volumes

• Stories from the Field !

• Key Takeaways

• Key attention points for high volumes

• Draft design considerations

QUESTIONS


Other sources

• http://blogs.msdn.com/b/emeadcrmsupport/archive/2014/07/16/dynamics-crm-quick-find-performance-part-ii.aspx

• https://code.msdn.microsoft.com/windowsdesktop/Multi-Language-Lookups-in-18fadc80

• https://www.microsoft.com/en-us/download/details.aspx?id=45905

• http://blogs.msdn.com/b/crminthefield/archive/2014/09/11/codeplex-dynamics-crm-quickfind-on-selected-view.aspx

• http://dynamics4.com/changing-search-behavior-in-microsoft-crm-2011-part-1/

http://crm.realdolmen.com

We're looking forward to meeting you!

The team at eXtremeCRM