How One Billion Salesforce Records Can Be Replicated with Minimal API Usage
Baruch Oxman
R&D Manager, Implisit
@implisithq, @baruchoxman
In this session…
• Implisit - Intro & Motivation
• Salesforce APIs Usage & Limits - Overview
• Efficient use of Salesforce APIs
• Scale and limitations
• Other pitfalls and tips
Implisit – The End of CRM Data Entry
• Implisit uses data mining and machine learning to keep Salesforce updated:
– Logging emails and calendar events to Salesforce automatically
– Creating and updating Accounts, Opportunities, Contacts, Leads
– Keeping team informed on all client communications
• Using text analysis:
– Creating meaningful business insights
– Improving forecasting and sales pipeline management
• Requires Salesforce data replication for offline processing
Data Replication Goals
• Minimize your API usage
– Avoid reaching the API limit
– API limits are shared by all API-connected apps; exhausting them can block other apps
• Minimize sync cycle time
– Don’t make our customers wait too long
Salesforce API Limits
• Daily API limits for Salesforce Editions:
– Unlimited/Performance: # of users x 5,000, up to 1,000,000
– Enterprise/Professional: # of users x 1,000
– Developer: 15,000
– Sandbox: 5,000,000
• Concurrent API calls limit: 25 (production), 5 (Developer Edition)
Source & more info: https://help.salesforce.com/HTViewHelpDoc?id=integrate_api_rate_limiting.htm
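To gauge how much of the daily budget a sync cycle consumes, the edition limits above can be encoded in a small helper. This is a sketch that uses only the figures on this slide, which reflect the limits at the time of the talk and may have changed since; `daily_api_limit` and `budget_share` are hypothetical names, not Salesforce APIs.

```python
# Hypothetical helpers: daily API call limits as listed on the slide above.
# The figures reflect the talk's era; check current Salesforce docs before relying on them.
def daily_api_limit(edition: str, users: int) -> int:
    edition = edition.strip().lower()
    if edition in ("unlimited", "performance"):
        return min(users * 5_000, 1_000_000)  # per-user allowance, capped
    if edition in ("enterprise", "professional"):
        return users * 1_000
    if edition == "developer":
        return 15_000
    if edition == "sandbox":
        return 5_000_000
    raise ValueError(f"unknown edition: {edition!r}")


def budget_share(calls: int, edition: str, users: int) -> float:
    """Fraction of the daily limit consumed by `calls` API calls."""
    return calls / daily_api_limit(edition, users)


if __name__ == "__main__":
    # A 1,000-call initial sync in a 50-user Enterprise org: 1,000 / 50,000.
    print(f"{budget_share(1_000, 'Enterprise', 50):.0%}")
```

Since the limit is shared with every other connected app, even a few percent per cycle adds up when syncs run every few hours.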
Performance Stats
• Keeping over one billion Salesforce records replicated and in sync
– 27 Salesforce object types are replicated (e.g. Accounts, Contacts)
• Initial sync
– 600-1000 API calls in total
• Updates sync
– 200-400 API calls in total
– Performed every few hours
Salesforce API Types
• REST API
– Fast, synchronous queries
– Up to 2,000 records per request
– Each request is a single API call
– Simple usage
• Bulk (Async) API
– Large numbers of records in a single request (fewer API calls)
– Slow; requires polling for results
– Implements internal retries
– Does not support some objects (e.g. OpportunityHistory)
https://developer.salesforce.com/blogs/tech-pubs/2011/10/salesforce-apis-what-they-are-when-to-use-them.html
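The REST figures above give a quick lower bound on cost: reading N records through one REST query takes at least ceil(N / 2,000) calls, because the first call runs the query and each later call follows the response's `nextRecordsUrl` to the next batch. A minimal sketch; the function name is illustrative, and batches can come back smaller than 2,000 records, so the result is a minimum rather than an exact count.

```python
import math

REST_BATCH_SIZE = 2_000  # max records per REST query response, per the slide


def min_rest_calls(n_records: int) -> int:
    """Lower bound on REST API calls needed to read n_records via one query.

    Even an empty result costs one call (the query itself).
    """
    if n_records < 0:
        raise ValueError("n_records must be non-negative")
    return max(1, math.ceil(n_records / REST_BATCH_SIZE))


if __name__ == "__main__":
    print(min_rest_calls(1_000_000))
```

At one million records that is already 500 calls for a single object, which is why the initial sync leans on the Bulk API, where one request covers far more records.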
Are you ready to replicate?
Replication method
• Initial fetching
• Changes fetching
Replication method – Initial fetching
• Using Bulk API as much as possible
• Fetch all records for each relevant object type
– Lots of data
– Only non-deleted records
• Paginate by CreatedDate
• Example:
– 1st query: “…ORDER BY CreatedDate LIMIT 100000”
– Subsequent: “…WHERE CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
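The CreatedDate pagination above can be sketched in Python. One caution: with a strict `CreatedDate >` filter alone, records that share the last page's CreatedDate can be skipped when a page boundary falls inside such a group, so this sketch adds Id as a tiebreaker. That tiebreaker is our addition, assuming the org accepts `ORDER BY CreatedDate, Id` and range comparisons on Id. `fetch_page` is a hypothetical callable that submits the query (e.g. as a Bulk API job) and returns the rows; it is responsible for turning the cursor's CreatedDate into a valid SOQL datetime literal such as 2014-08-31T02:29:29Z.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Record = Dict[str, str]
Cursor = Tuple[str, str]  # (CreatedDate literal, Id) of the last record read

PAGE_SIZE = 100_000  # matches the LIMIT in the example queries


def build_page_query(sobject: str, fields: List[str],
                     cursor: Optional[Cursor] = None,
                     limit: int = PAGE_SIZE) -> str:
    """Build one page of a CreatedDate-ordered SOQL query."""
    # Id and CreatedDate are always selected because they drive pagination.
    columns = ", ".join(dict.fromkeys(["Id", "CreatedDate", *fields]))
    where = ""
    if cursor is not None:
        created, last_id = cursor
        # Id breaks ties, so records sharing the boundary CreatedDate are not skipped.
        where = (f" WHERE (CreatedDate > {created} OR "
                 f"(CreatedDate = {created} AND Id > '{last_id}'))")
    return (f"SELECT {columns} FROM {sobject}{where} "
            f"ORDER BY CreatedDate, Id LIMIT {limit}")


def fetch_all(fetch_page: Callable[[Optional[Cursor], int], List[Record]],
              limit: int = PAGE_SIZE) -> Iterator[Record]:
    """Yield every record, page by page, until a short page signals the end."""
    cursor: Optional[Cursor] = None
    while True:
        page = fetch_page(cursor, limit)  # hypothetical: runs build_page_query(...)
        yield from page
        if len(page) < limit:
            return
        last = page[-1]
        cursor = (last["CreatedDate"], last["Id"])
```

The first call passes no cursor, matching the "1st query" above; every later call continues from the last (CreatedDate, Id) pair seen.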
Replication method – Changes fetching
• Fetch only records that changed since the previous fetch time
– Less data – only changes
– Take care of updates and deletions
• Using SystemModstamp as the indicator that a record has changed
• Same pagination logic as in initial fetching
• Example:
– 1st query: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
– Subsequent: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z AND CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
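• Bulk changes fetching vs. getUpdated() (SOAP API): getUpdated() returns only the IDs of changed records, so the records themselves must still be fetched separately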
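A sketch of how the changes query can be composed: the SystemModstamp condition and the pagination condition are joined with AND, and ORDER BY follows the WHERE clause directly. Assumptions: `since` is the start time of the previous sync cycle as an aware datetime, the pagination cursor reuses an Id tiebreaker (our addition, not on the slide), and all names here are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

PAGE_SIZE = 100_000


def soql_datetime(dt: datetime) -> str:
    """Format an aware datetime as a SOQL UTC literal, e.g. 2014-07-31T02:29:29Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_changes_query(sobject: str, fields: List[str], since: datetime,
                        cursor: Optional[Tuple[str, str]] = None,
                        limit: int = PAGE_SIZE) -> str:
    """One page of records whose SystemModstamp is later than `since`."""
    columns = ", ".join(dict.fromkeys(["Id", "CreatedDate", "SystemModstamp", *fields]))
    conditions = [f"SystemModstamp > {soql_datetime(since)}"]
    if cursor is not None:
        created, last_id = cursor
        # Same CreatedDate pagination as the initial fetch, with Id as tiebreaker.
        conditions.append(f"(CreatedDate > {created} OR "
                          f"(CreatedDate = {created} AND Id > '{last_id}'))")
    # Conditions are joined with AND; ORDER BY comes straight after WHERE.
    return (f"SELECT {columns} FROM {sobject} WHERE {' AND '.join(conditions)} "
            f"ORDER BY CreatedDate, Id LIMIT {limit}")


if __name__ == "__main__":
    previous_start = datetime(2014, 7, 31, 2, 29, 29, tzinfo=timezone.utc)
    # Record this cycle's start before fetching; it becomes the next `since`,
    # so changes made while this cycle runs are picked up (possibly twice) next time.
    this_start = datetime.now(timezone.utc)
    print(build_changes_query("Account", ["Name"], previous_start))
```

Taking the watermark before the fetch rather than after trades occasional duplicate rows, which an upsert absorbs, for never missing a change made mid-cycle.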
Deleted items
• Motivation:
– Required to maintain consistent sync
• Two implementation options
– Use getDeleted() call in SOAP API (our choice)
– Use the queryAll resource in the REST API, filtering on IsDeleted = true
• Potentially more API calls
• Some records can be “undeleted” (restored from the Recycle Bin)!
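Merging a cycle's changes and deletions into a local replica has to cope with undeleted records and with deletions that happen while the cycle is running. One way to do it, sketched below, is to compare each record's SystemModstamp with its deletion time. Assumptions: `deleted` maps Id to deletion time, built from the id and deletedDate of each getDeleted() result; all timestamps are mutually comparable datetimes (e.g. all UTC); and restoring a record updates its SystemModstamp. That last point should be verified against your org. `apply_sync_cycle` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def apply_sync_cycle(replica: Dict[str, Record],
                     changed: Iterable[Record],
                     deleted: Dict[str, datetime]) -> None:
    """Merge one cycle's changes and deletions into a replica keyed by Id.

    Each record in `changed` carries a datetime in "SystemModstamp".
    """
    # Upsert first: a normal query returns only records that existed at query time.
    for record in changed:
        replica[record["Id"]] = record
    # Then apply deletions, unless the local copy is newer than the deletion.
    # ASSUMPTION: restoring a record updates its SystemModstamp, so an
    # undeleted record carries a timestamp later than its deletion time.
    for record_id, deleted_at in deleted.items():
        local = replica.get(record_id)
        if local is not None and local["SystemModstamp"] <= deleted_at:
            del replica[record_id]
```

The timestamp check handles both orders of events: a record fetched and then deleted mid-cycle is removed, while a record deleted and then restored survives.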
Getting all fields
• No “SELECT *” support
• Get all field names for each object using a “describe” call
– Optionally, filter the fields (skip custom fields, etc.)
– Beware of non-visible fields (due to security restrictions)
• Use the field names in the query
• Limitation: query length cannot exceed 20,000 characters*
* http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_select.htm
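Building the column list from describe output, and keeping each query under the 20,000-character limit, can be sketched as follows. Assumptions: `describe_fields` is the `fields` array of a describe result, where each entry has `name` and `custom`; splitting one wide query into several narrower ones that all select Id, then joining the results on Id, is one possible workaround and not necessarily the approach used in the talk. Function names are hypothetical.

```python
from typing import Any, Dict, List

MAX_SOQL_LENGTH = 20_000  # limit quoted on the slide; check current docs


def select_fields(describe_fields: List[Dict[str, Any]],
                  include_custom: bool = True) -> List[str]:
    """Field names from describe output, optionally dropping custom fields."""
    return [f["name"] for f in describe_fields
            if include_custom or not f.get("custom", False)]


def build_select_queries(sobject: str, fields: List[str], where: str = "",
                         max_len: int = MAX_SOQL_LENGTH) -> List[str]:
    """Split one wide SELECT into queries of at most max_len characters.

    Every query selects Id so the partial results can be joined back together.
    `where` is appended verbatim and must start with a space if non-empty.
    """
    def render(columns: List[str]) -> str:
        return f"SELECT {', '.join(columns)} FROM {sobject}{where}"

    queries: List[str] = []
    current = ["Id"]
    for name in fields:
        if name == "Id":
            continue
        if len(render(current + [name])) > max_len and len(current) > 1:
            queries.append(render(current))  # flush the full chunk
            current = ["Id"]
        if len(render(current + [name])) > max_len:
            raise ValueError(f"field {name!r} does not fit in a single query")
        current.append(name)
    queries.append(render(current))
    return queries
```

For most objects this returns a single query; only very wide objects (many custom fields) get split.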
User Access Restrictions
• Full access rights are strongly encouraged
– Full view of all objects
– Limited access rights → slower queries
• Reference Fields – special case
– Tasks / Events - WhoId, WhatId
– Attachment - ParentId
– Reference fields make access checks in Salesforce even slower
– Limited to 100,000 different values per query
– Solution: query in smaller chunks
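One possible reading of "query in smaller chunks" is to split a large object such as Task into CreatedDate windows, so each query touches fewer distinct WhoId / WhatId values. The sketch below assumes that interpretation; the window size would need tuning per org, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

SOQL_DATETIME = "%Y-%m-%dT%H:%M:%SZ"


def time_windows(start: datetime, end: datetime,
                 step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive half-open windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def chunked_task_queries(start: datetime, end: datetime, step: timedelta,
                         fields: str = "Id, WhoId, WhatId") -> Iterator[str]:
    """One SOQL query per window; >= / < bounds leave no gaps and no overlaps."""
    for lo, hi in time_windows(start, end, step):
        lo_s = lo.astimezone(timezone.utc).strftime(SOQL_DATETIME)
        hi_s = hi.astimezone(timezone.utc).strftime(SOQL_DATETIME)
        yield (f"SELECT {fields} FROM Task "
               f"WHERE CreatedDate >= {lo_s} AND CreatedDate < {hi_s}")
```

If a window still hits the limit, it can be re-run with a smaller step without affecting the other windows.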
Error handling
• Nothing is fail-safe
• Different APIs produce different errors
• Examples:
– Query too long (too many fields)
– Scale limitations
– Communication errors
– Salesforce maintenance windows
• Add support for anything you encounter
– “Rare” becomes “frequent” once you scale
• ABR (Always Be Retrying)
• Remember to clean up upon errors
– Close open bulk jobs
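"Always be retrying" plus "clean up upon errors" can be sketched as a retry helper with exponential backoff and jitter, wrapped in try/finally so a bulk job is closed whatever happens. Assumptions: which exceptions count as transient (here `ConnectionError` and `TimeoutError`) depends on your client library, and `open_job`, `work` and `close_job` are hypothetical callables standing in for real Bulk API calls.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
J = TypeVar("J")

RETRYABLE: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


def with_retries(fn: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = RETRYABLE,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


def run_with_cleanup(open_job: Callable[[], J], work: Callable[[J], T],
                     close_job: Callable[[J], None]) -> T:
    """Open a job, run work on it (with retries), and always close the job."""
    job = open_job()
    try:
        return with_retries(lambda: work(job))
    finally:
        close_job(job)  # e.g. close an open bulk job even when work fails
```

Errors outside `retry_on` (such as a query that is too long) fail immediately, since retrying them cannot help.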
Unavailable Salesforce objects
• Some orgs make certain objects unavailable
– Through security restrictions
– For example, Lead or Opportunity
• Check each object using describeSObjects() before fetching
• Safely skip unavailable objects
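The check-then-skip step can be sketched as a filter over the object list. Assumptions: `describe` is a hypothetical callable wrapping describeSObjects() that returns the object's describe result (including its `queryable` flag) and raises `LookupError` when the object is not available; real client libraries raise their own error types, so that mapping is up to the caller.

```python
from typing import Any, Callable, Dict, List, Tuple

Describe = Callable[[str], Dict[str, Any]]


def filter_available(wanted: List[str],
                     describe: Describe) -> Tuple[List[str], List[str]]:
    """Split `wanted` object names into (available, skipped)."""
    available: List[str] = []
    skipped: List[str] = []
    for name in wanted:
        try:
            info = describe(name)
        except LookupError:
            # Object hidden or missing in this org: skip it instead of failing the sync.
            skipped.append(name)
            continue
        (available if info.get("queryable", False) else skipped).append(name)
    return available, skipped
```

Logging the skipped list keeps it visible which objects a given org is not replicating.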
Summary
• Implisit - Intro & Motivation
• Salesforce APIs Overview
• Efficient use of API
• Scale and limitations
• Other pitfalls and tips
Additional Resources:
• API Call Basics
• Salesforce App Limits Cheat Sheet
• Understanding Execution Governors and Limits
• Query & Search Optimization Cheat Sheet
• Bulk Query Details