How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Baruch Oxman, R&D Manager, Implisit (@implisithq, @baruchoxman)

My presentation from Dreamforce '14

Page 1: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Baruch Oxman

R&D Manager, Implisit

@implisithq, @baruchoxman

Page 2: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Baruch Oxman, R&D Manager

Page 3: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

In this session…

• Implisit - Intro & Motivation

• Salesforce APIs Usage & Limits - Overview

• Efficient use of Salesforce APIs

• Scale and limitations

• Other pitfalls and tips

Page 4: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Implisit – The End of CRM Data Entry

• Implisit uses Data-Mining and Machine Learning to keep Salesforce updated:

– Updating emails and calendar events to Salesforce automatically

– Creating and updating Accounts, Opportunities, Contacts, Leads

– Keeping team informed on all client communications

• Using text analysis:

– Creating meaningful business insights

– Improving forecasting and sales pipeline management

• Requires Salesforce data replication for offline processing

Page 5: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Data Replication Goals

• Minimize your API usage

– Avoid reaching the API limit

– API limits are shared between all API-connected apps – other apps can be blocked

• Minimize sync cycle time

– Don’t make our customers wait for too long

Page 6: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Salesforce API Limits

• Daily API limits for Salesforce Editions:

– Unlimited/Performance: # of users x 5,000, up to 1,000,000

– Enterprise/Professional: # of users x 1,000

– Developer: 15,000

– Sandbox: 5,000,000

• In-parallel API calls limit (25 – production, 5 – dev)

Source & more info: https://help.salesforce.com/HTViewHelpDoc?id=integrate_api_rate_limiting.htm
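
The per-edition limits above amount to simple arithmetic. Below is a minimal Python sketch of that calculation, assuming the figures quoted on this slide (they may have changed since). The function name `daily_api_limit` and the edition keys are illustrative, not part of any Salesforce API.

```python
# Daily API call limits as quoted on the slide (2014 figures; assumption:
# they are used here only for illustration and may be out of date).
PER_USER_LIMITS = {
    "unlimited": 5_000,
    "performance": 5_000,
    "enterprise": 1_000,
    "professional": 1_000,
}
FIXED_LIMITS = {"developer": 15_000, "sandbox": 5_000_000}
PER_USER_CAP = 1_000_000  # the slide gives a cap for Unlimited/Performance only


def daily_api_limit(edition: str, users: int = 0) -> int:
    """Return the daily API call limit for an edition (hypothetical helper)."""
    edition = edition.lower()
    if edition in FIXED_LIMITS:
        return FIXED_LIMITS[edition]
    limit = users * PER_USER_LIMITS[edition]
    if edition in ("unlimited", "performance"):
        limit = min(limit, PER_USER_CAP)
    return limit


if __name__ == "__main__":
    print(daily_api_limit("Enterprise", users=50))
    print(daily_api_limit("Unlimited", users=500))
```

Because the limit is shared by all API-connected apps in the org, a replication job should plan to use only a small fraction of this number.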

Page 7: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Performance Stats

• Keeping over one billion Salesforce records replicated in-sync

– 27 Salesforce object types are replicated (e.g. Accounts, Contacts)

• Initial sync

– 600-1000 API calls in total

• Updates sync

– 200-400 API calls in total

– Performed every few hours

Page 9: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Salesforce API Types

• REST API

– Fast, synchronous queries

– Up to 2,000 records per request

– Each request – single API call

– Simple usage

• Bulk (Async) API

– Large amounts of records in a single request (fewer API calls)

– Slow, requires polling for results

– Implements internal retries

– Does not support some objects (e.g. OpportunityHistory)

https://developer.salesforce.com/blogs/tech-pubs/2011/10/salesforce-apis-what-they-are-when-to-use-them.html
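
To see why the REST API becomes expensive at scale, the following sketch estimates how many REST calls a full fetch would need, assuming the 2,000-records-per-request figure from this slide and one API call per request. The helper name `rest_calls_needed` is hypothetical. Bulk API accounting is not modelled, since the slide gives no numbers for it.

```python
import math

REST_MAX_RECORDS_PER_REQUEST = 2_000  # figure quoted on the slide


def rest_calls_needed(record_count: int) -> int:
    """Estimate REST API calls needed to fetch record_count records,
    assuming every request returns a full page and costs one API call."""
    return math.ceil(record_count / REST_MAX_RECORDS_PER_REQUEST)


if __name__ == "__main__":
    for n in (2_001, 1_000_000):
        print(n, "records ->", rest_calls_needed(n), "REST calls")
```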

Page 10: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Are you ready to replicate?

Page 11: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Replication method

Initial Fetching → Changes Fetching

Page 12: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Replication method – Initial fetching

• Using Bulk API as much as possible

• Fetch all records for each relevant object type

– Lots of data

– Only non-deleted records

• Paginate by CreatedDate

• Example:

– 1st query: “…ORDER BY CreatedDate LIMIT 100000”

– Subsequent: “…WHERE CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
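
The pagination scheme above can be sketched in Python as follows. This is an illustration under assumptions, not the presenter's implementation: `run_query` is a hypothetical callable that submits a SOQL string (for example through a Bulk API job) and returns a list of record dicts, and the selected fields must include `CreatedDate`.

```python
from typing import Callable, Iterator, List, Optional

PAGE_SIZE = 100_000  # the LIMIT used in the slide's example


def build_initial_query(fields: List[str], sobject: str,
                        after: Optional[str] = None,
                        page_size: int = PAGE_SIZE) -> str:
    """Build one page of the initial fetch, ordered by CreatedDate."""
    where = f" WHERE CreatedDate > {after}" if after else ""
    return (f"SELECT {', '.join(fields)} FROM {sobject}{where} "
            f"ORDER BY CreatedDate LIMIT {page_size}")


def fetch_all(run_query: Callable[[str], List[dict]], fields: List[str],
              sobject: str, page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every record, one page at a time.

    run_query is a hypothetical callable supplied by the caller.
    """
    after = None
    while True:
        page = run_query(build_initial_query(fields, sobject, after, page_size))
        yield from page
        if len(page) < page_size:  # a short page means nothing is left
            return
        # Continue from the CreatedDate of the last record received.
        after = page[-1]["CreatedDate"]
```

One caveat of the strict `>` comparison: records that share the exact CreatedDate of the last record on a page could be skipped, so a real implementation may need extra handling for ties at page boundaries.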

Page 13: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Replication method – Changes fetching

• Fetch only records that changed since the previous fetch time

– Less data – only changes

– Take care of updates and deletions

• Using SystemModstamp as indicator for changes in record

• Same pagination logic as in initial fetching

• Example:

– 1st query: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”

– Subsequent: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z AND CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”

• Bulk changes fetching VS getUpdated()
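
A minimal sketch of how the changes query could be assembled, following the example queries on this slide. The function name `build_changes_query` and its parameters are illustrative assumptions. `since` is the previous fetch time and `after` is the CreatedDate of the last record on the previous page.

```python
from typing import List, Optional


def build_changes_query(fields: List[str], sobject: str, since: str,
                        after: Optional[str] = None,
                        page_size: int = 100_000) -> str:
    """Build one page of a changes fetch (hypothetical helper).

    Filters on SystemModstamp to pick up changed records and paginates
    by CreatedDate, as in the initial fetch.
    """
    conditions = [f"SystemModstamp > {since}"]
    if after:
        conditions.append(f"CreatedDate > {after}")
    return (f"SELECT {', '.join(fields)} FROM {sobject} "
            f"WHERE {' AND '.join(conditions)} "
            f"ORDER BY CreatedDate LIMIT {page_size}")


if __name__ == "__main__":
    print(build_changes_query(["Id", "Name"], "Account", "2014-07-31T02:29:29Z"))
    print(build_changes_query(["Id", "Name"], "Account", "2014-07-31T02:29:29Z",
                              after="2014-08-31T02:29:29Z"))
```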

Page 14: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Deleted items

• Motivation:

– Required to maintain consistent sync

• Two implementation options

– Use getDeleted() call in SOAP API (our choice)

– Use a queryAll call filtered on IsDeleted = true in REST API

• Potentially more API calls

• Some objects can become “undeleted”!
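
On the replica side, deletions and undeletions have to be applied in a consistent order. The sketch below assumes the replica is a plain dict keyed by record Id, that `deleted_ids` comes from whichever deletion lookup is used, and that undeleted records show up again in the regular changes fetch. All names are hypothetical.

```python
from typing import Dict, Iterable


def apply_sync(replica: Dict[str, dict],
               deleted_ids: Iterable[str],
               changed_records: Iterable[dict]) -> Dict[str, dict]:
    """Apply one sync cycle to a local replica (illustrative only).

    Deletions are applied first, then upserts, so a record that was
    deleted and later undeleted reappears if it is among the changes.
    """
    for record_id in deleted_ids:
        replica.pop(record_id, None)  # ignore Ids we never replicated
    for record in changed_records:
        replica[record["Id"]] = record
    return replica
```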

Page 15: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Getting all fields

• No “SELECT *” support

• Get all fields for table using “describe”

– Optionally, filter the fields (skip custom fields, etc…)

– Non-visible fields (due to security restrictions)

• Use the field names in the query

• Limitation: query length cannot exceed 20,000 characters*

* http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_select.htm
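
The steps above can be sketched as follows, assuming a describe result shaped like the JSON returned by the REST sObject describe call (a `fields` list whose entries carry `name` and `custom`). The helper names and the sample data are illustrative. The 20,000-character figure is the one quoted on this slide.

```python
from typing import List

MAX_QUERY_LENGTH = 20_000  # limit quoted on the slide


def select_fields(describe_result: dict, include_custom: bool = True) -> List[str]:
    """Pick field names from a describe result, optionally skipping custom fields."""
    return [f["name"] for f in describe_result["fields"]
            if include_custom or not f.get("custom", False)]


def build_select(describe_result: dict, sobject: str,
                 include_custom: bool = True) -> str:
    """Build a SELECT listing the fields explicitly (no SELECT * in SOQL)."""
    fields = select_fields(describe_result, include_custom)
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters, over the limit")
    return query


if __name__ == "__main__":
    sample = {"fields": [{"name": "Id", "custom": False},
                         {"name": "Name", "custom": False},
                         {"name": "Region__c", "custom": True}]}
    print(build_select(sample, "Account"))
    print(build_select(sample, "Account", include_custom=False))
```

Any WHERE, ORDER BY and LIMIT clauses added later also count toward the query length, so a real check would be made on the final query string.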

Page 16: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

User Access Restrictions

• Full access rights are strongly encouraged

– Full view of all objects

– Limited access rights → slower queries

• Reference Fields – special case

– Tasks / Events - WhoId, WhatId

– Attachment - ParentId

– Reference fields make access checks in Salesforce even slower

– Limited to 100,000 different values per query

– Solution: query in smaller chunks
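
The slide does not say how the smaller chunks are formed. One possible approach, shown here purely as an assumption, is to split the time range into fixed windows and run one query per window. Timestamps are treated as UTC, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

SOQL_DATETIME = "%Y-%m-%dT%H:%M:%SZ"


def time_windows(start: datetime, end: datetime,
                 step: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of at most `step`."""
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper


def window_filter(lower: datetime, upper: datetime) -> str:
    """Render a WHERE fragment restricting CreatedDate to one window."""
    return (f"CreatedDate >= {lower.strftime(SOQL_DATETIME)} "
            f"AND CreatedDate < {upper.strftime(SOQL_DATETIME)}")


if __name__ == "__main__":
    for lo, hi in time_windows(datetime(2014, 8, 1), datetime(2014, 8, 3, 12),
                               timedelta(days=1)):
        print(window_filter(lo, hi))
```

If a window still fails, the step can be reduced and the window retried.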

Page 17: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Error handling

• Nothing is fail-safe

• Different APIs produce different errors

• Examples:

– Query too long (too many fields)

– Scale limitations

– Communication errors

– Salesforce maintenance windows

• Add support for anything you encounter

– “Rare” becomes “frequent” once you scale

• ABR (Always Be Retrying)

• Remember to clean up upon errors

– Close open bulk jobs
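
A minimal sketch of "Always Be Retrying" combined with cleanup, assuming exponential backoff (the slide does not specify a retry policy). `operation` and `cleanup` are hypothetical callables supplied by the caller, for example one that runs a bulk query and one that closes the open bulk job.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T],
                 cleanup: Optional[Callable[[], None]] = None,
                 attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on any exception with exponential backoff.

    cleanup (if given) runs after every failed attempt, e.g. to close an
    open bulk job. The last failure is re-raised once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if cleanup is not None:
                cleanup()
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

In practice the `except` clause would be narrowed to the error types that are actually worth retrying, since different APIs produce different errors.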

Page 18: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Unavailable Salesforce objects

• Some orgs make some of the objects unavailable

– Using security restriction

– For example, Lead or Opportunity

• Check using describeSObjects for each object, before fetching

• Safely skip when not supported
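
The check-then-skip step might look like the sketch below. It assumes a hypothetical `describe` callable that returns a dict with a `queryable` flag for an available object and raises the illustrative `ObjectUnavailable` exception otherwise. How an unavailable object is actually reported depends on the API and client library in use.

```python
from typing import Callable, Iterable, List, Tuple


class ObjectUnavailable(Exception):
    """Illustrative exception: the org does not expose this object to us."""


def replicable_objects(wanted: Iterable[str],
                       describe: Callable[[str], dict]
                       ) -> Tuple[List[str], List[str]]:
    """Split wanted object names into (usable, skipped) before fetching."""
    usable, skipped = [], []
    for name in wanted:
        try:
            info = describe(name)
        except ObjectUnavailable:
            skipped.append(name)
            continue
        if info.get("queryable", False):
            usable.append(name)
        else:
            skipped.append(name)
    return usable, skipped
```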

Page 19: How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Summary

• Implisit - Intro & Motivation

• Salesforce APIs Overview

• Efficient use of API

• Scale and limitations

• Other pitfalls and tips
