How One Billion Salesforce records Can Be Replicated with Minimal API Usage

Posted on 14-Jun-2015

My presentation from Dreamforce '14

Transcript of How One Billion Salesforce records Can Be Replicated with Minimal API Usage

How One Billion Salesforce Records Can Be Replicated with Minimal API Usage

Baruch Oxman

R&D Manager, Implisit

@implisithq, @baruchoxman

In this session…

• Implisit - Intro & Motivation

• Salesforce APIs Usage & Limits - Overview

• Efficient use of Salesforce APIs

• Scale and limitations

• Other pitfalls and tips

Implisit – The End of CRM Data Entry

• Implisit uses data mining and machine learning to keep Salesforce updated:

– Automatically logging emails and calendar events to Salesforce

– Creating and updating Accounts, Opportunities, Contacts, Leads

– Keeping the team informed of all client communications

• Using text analysis:

– Creating meaningful business insights

– Improving forecasting and sales pipeline management

• Requires Salesforce data replication for offline processing

Data Replication Goals

• Minimize your API usage

– Avoid reaching the API limit

– API limits are shared by all API-connected apps, so exhausting them can block other apps

• Minimize sync cycle time

– Don’t make our customers wait too long

Salesforce API Limits

• Daily API limits for Salesforce Editions:

– Unlimited/Performance: # of users x 5,000, up to 1,000,000

– Enterprise/Professional: # of users x 1,000

– Developer: 15,000

– Sandbox: 5,000,000

• Concurrent (in-parallel) API calls limit: 25 in production, 5 in Developer Edition

Source & more info: https://help.salesforce.com/HTViewHelpDoc?id=integrate_api_rate_limiting.htm
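The limits above are simple arithmetic on edition and user count. The sketch below encodes the slide's 2014 figures as a lookup; the function and table names are hypothetical, and Salesforce has revised these numbers since, so treat it as illustrative rather than current.

```python
# Daily API call limits per edition, as listed on the slide (Dreamforce '14).
# Salesforce revises these figures over time; treat them as illustrative only.
PER_USER_LIMITS = {
    "unlimited": 5_000,
    "performance": 5_000,
    "enterprise": 1_000,
    "professional": 1_000,
}
FIXED_LIMITS = {"developer": 15_000, "sandbox": 5_000_000}
PER_USER_CAP = {"unlimited": 1_000_000, "performance": 1_000_000}


def daily_api_limit(edition: str, users: int) -> int:
    """Return an org's daily API call limit according to the slide's table."""
    edition = edition.lower()
    if edition in FIXED_LIMITS:
        return FIXED_LIMITS[edition]
    if edition not in PER_USER_LIMITS:
        raise ValueError(f"unknown edition: {edition!r}")
    limit = users * PER_USER_LIMITS[edition]
    # Only Unlimited/Performance carry an explicit cap on the slide.
    return min(limit, PER_USER_CAP.get(edition, limit))


print(daily_api_limit("Enterprise", 50))  # 50000
print(daily_api_limit("Unlimited", 300))  # 1000000 (300 x 5,000 is capped)
```

Because every connected app draws from the same budget, a 50-user Enterprise org has 50,000 calls per day for everything; an update sync of 200-400 calls (the slide's figure) uses under 1% of that per cycle.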

Performance Stats

• Keeping over one billion Salesforce records replicated and in sync

– 27 Salesforce object types are replicated (e.g. Accounts, Contacts)

• Initial sync

– 600-1000 API calls in total

• Updates sync

– 200-400 API calls in total

– Performed every few hours

Salesforce API Types

• REST API

– Fast, synchronous queries

– Up to 2,000 records per request

– Each request counts as a single API call

– Simple usage

• Bulk (Async) API

– Large amounts of records in a single request (fewer API calls)

– Slow, requires polling for results

– Implements internal retries

– Does not support some objects (e.g. OpportunityHistory)

https://developer.salesforce.com/blogs/tech-pubs/2011/10/salesforce-apis-what-they-are-when-to-use-them.html
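A quick lower-bound estimate shows why the REST API alone does not fit a billion-record workload: at most 2,000 records come back per request, and each request is one API call. The helper name below is hypothetical; the 2,000-record figure comes from the slide.

```python
import math

REST_MAX_RECORDS_PER_CALL = 2_000  # per the slide: up to 2,000 records per request


def min_rest_calls(record_count: int) -> int:
    """Lower bound on REST API calls needed to read record_count records."""
    # Even an empty result set costs one call for the query itself.
    return max(1, math.ceil(record_count / REST_MAX_RECORDS_PER_CALL))


print(min_rest_calls(1_000_000_000))  # 500000
```

500,000 calls would be half of the largest daily limit listed on the slide (1,000,000), which is the motivation for pushing bulk reads through the Bulk API.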

Are you ready to replicate?

Replication method

• Initial fetching

• Changes fetching

Replication method – Initial fetching

• Use the Bulk API as much as possible

• Fetch all records for each relevant object type

– Lots of data

– Only non-deleted records

• Paginate by CreatedDate

• Example:

– 1st query: “…ORDER BY CreatedDate LIMIT 100000”

– Subsequent: “…WHERE CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”
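The CreatedDate pagination above can be sketched as a query builder plus a loop that feeds each page's last CreatedDate into the next query. This is a minimal sketch, not the author's code: `run_query` is a hypothetical callback standing in for whatever submits the SOQL (for example as a Bulk API job) and returns parsed rows, and the field list is assumed to include CreatedDate.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional

PAGE_SIZE = 100_000  # page size used in the slide's example queries


def soql_datetime(dt: datetime) -> str:
    """Format a datetime as a SOQL literal such as 2014-08-31T02:29:29Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def initial_page_query(sobject: str, fields: list[str],
                       after: Optional[datetime] = None,
                       limit: int = PAGE_SIZE) -> str:
    """Build one page of the initial-fetch query, paginated by CreatedDate."""
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if after is not None:
        soql += f" WHERE CreatedDate > {soql_datetime(after)}"
    return f"{soql} ORDER BY CreatedDate LIMIT {limit}"


def fetch_all(sobject: str, fields: list[str],
              run_query: Callable[[str], list[dict]],
              limit: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every record of sobject, one page at a time.

    run_query is a hypothetical callback that submits the SOQL, waits for the
    result and returns rows as dicts whose CreatedDate values are datetimes.
    fields must include CreatedDate so the next page can be requested.
    """
    after = None
    while True:
        rows = run_query(initial_page_query(sobject, fields, after, limit))
        yield from rows
        if len(rows) < limit:
            return  # a short page means there is nothing left to fetch
        after = rows[-1]["CreatedDate"]  # next page starts after this timestamp
```

One caveat worth checking against real data: with a strict `>`, records that share the last CreatedDate of a full page but did not fit in it would be skipped, and this sketch does not guard against that.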

Replication method – Changes fetching

• Fetch only records that changed since the previous fetch time

– Less data – only changes

– Take care of updates and deletions

• Use SystemModstamp as the indicator that a record has changed

• Same pagination logic as in initial fetching

• Example:

– 1st query: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”

– Subsequent: “…WHERE SystemModstamp > 2014-07-31T02:29:29Z AND CreatedDate > 2014-08-31T02:29:29Z ORDER BY CreatedDate LIMIT 100000”

• Bulk changes fetching vs. getUpdated() (SOAP API)
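The changes query is the initial-fetch query with an extra SystemModstamp condition; the `AND` only joins WHERE conditions and never precedes `ORDER BY`. The builder below is a hypothetical sketch of that shape, reusing the slide's example timestamps.

```python
from datetime import datetime, timezone
from typing import Optional


def soql_datetime(dt: datetime) -> str:
    """Format a datetime as a SOQL literal such as 2014-07-31T02:29:29Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def changes_page_query(sobject: str, fields: list[str], since: datetime,
                       after: Optional[datetime] = None,
                       limit: int = 100_000) -> str:
    """One page of the changes fetch: records whose SystemModstamp is newer
    than the previous sync, paginated by CreatedDate like the initial fetch."""
    conditions = [f"SystemModstamp > {soql_datetime(since)}"]
    if after is not None:
        conditions.append(f"CreatedDate > {soql_datetime(after)}")
    return (f"SELECT {', '.join(fields)} FROM {sobject} "
            f"WHERE {' AND '.join(conditions)} "
            f"ORDER BY CreatedDate LIMIT {limit}")
```

A design choice to consider: taking `since` from the time the previous sync *started*, rather than when it finished, means records modified while that sync was running are picked up again instead of being missed.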

Deleted items

• Motivation:

– Required to maintain consistent sync

• Two implementation options

– Use the getDeleted() call in the SOAP API (our choice)

– Use a queryAll call in the REST API with IsDeleted = TRUE

• Potentially more API calls

• Some records can become “undeleted”!
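On the replica side, deletions and changes have to be applied in a consistent order. The sketch below uses a hypothetical in-memory `LocalReplica` and applies deleted IDs first, then upserts changed records, so that a record deleted and later undeleted in the same window ends up present. That it reappears in the changes fetch after being undeleted is an assumption, not something the slide states.

```python
from typing import Iterable


class LocalReplica:
    """In-memory stand-in for the offline copy of one object type (hypothetical;
    a real replica would live in a database)."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}

    def apply_cycle(self, deleted_ids: Iterable[str],
                    changed: Iterable[dict]) -> None:
        # Deletions first: IDs as reported by getDeleted() (SOAP API) or by a
        # queryAll with IsDeleted = TRUE. Unknown IDs are ignored.
        for rec_id in deleted_ids:
            self.records.pop(rec_id, None)
        # Then upsert changed records by Id. Assumption: a record undeleted
        # after its deletion comes back in the changes fetch, so applying it
        # last restores it instead of leaving it deleted.
        for rec in changed:
            self.records[rec["Id"]] = rec
```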

Getting all fields

• No “SELECT *” support

• Get all field names for the object using “describe”

– Optionally, filter the fields (skip custom fields, etc…)

– Non-visible fields (due to security restrictions)

• Use the field names in the query

• Limitation: query length cannot exceed 20,000 characters*

* http://www.salesforce.com/us/developer/docs/soql_sosl/Content/sforce_api_calls_soql_select.htm
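When the describe result yields more fields than fit in one query, one possible workaround (not stated on the slide) is to split the field list across several queries that each include Id, and join the partial rows by Id afterwards. A hypothetical sketch, using the 20,000-character limit quoted above:

```python
MAX_SOQL_LENGTH = 20_000  # limit quoted on the slide; check current Salesforce docs


def split_select(sobject: str, fields: list[str], suffix: str = "",
                 max_len: int = MAX_SOQL_LENGTH) -> list[str]:
    """Split a wide SELECT into several queries that each fit within max_len.

    Every query includes Id so the partial rows can be joined back together.
    suffix is appended verbatim (e.g. a WHERE/ORDER BY clause).
    """
    def build(cols: list[str]) -> str:
        return f"SELECT {', '.join(cols)} FROM {sobject}{suffix}"

    queries: list[str] = []
    current = ["Id"]
    for field in (f for f in fields if f != "Id"):
        # Start a new query when adding this field would exceed the limit.
        if len(current) > 1 and len(build(current + [field])) > max_len:
            queries.append(build(current))
            current = ["Id"]
        current.append(field)
    queries.append(build(current))
    return queries
```

Each query should use the same WHERE and ORDER BY suffix so that the pages line up before the rows are merged by Id.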

User Access Restrictions

• Full access rights are strongly encouraged

– Full view of all objects

– Limited access rights → slower queries

• Reference Fields – special case

– Tasks / Events - WhoId, WhatId

– Attachment - ParentId

– Reference fields make access checks in Salesforce even slower

– Limited to 100,000 different values per query

– Solution: query in smaller chunks
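The slide does not say how the chunks are formed. One plausible approach, shown here purely as an assumption, is to split a large query into consecutive half-open CreatedDate windows so that each query touches fewer records, and therefore fewer distinct reference values. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def soql_datetime(dt: datetime) -> str:
    """Format a datetime as a SOQL literal such as 2014-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def windowed_queries(base_query: str, start: datetime, end: datetime,
                     window: timedelta) -> list[str]:
    """Split one query over [start, end) into consecutive CreatedDate windows.

    Simplification: base_query must not already have a WHERE clause.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    queries = []
    lo = start
    while lo < end:
        hi = min(lo + window, end)
        # Half-open windows (>= lo, < hi) neither overlap nor leave gaps.
        queries.append(f"{base_query} WHERE CreatedDate >= {soql_datetime(lo)} "
                       f"AND CreatedDate < {soql_datetime(hi)}")
        lo = hi
    return queries
```

If a window still hits the limit, it can be split again with a smaller `window`.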

Error handling

• Nothing is fail-safe

• Different APIs produce different errors

• Examples:

– Query too long (too many fields)

– Scale limitations

– Communication errors

– Salesforce maintenance windows

• Add support for anything you encounter

– “Rare” becomes “frequent” once you scale

• ABR (Always Be Retrying)

• Remember to clean up upon errors

– Close open bulk jobs
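“Always Be Retrying” and “close open bulk jobs” combine naturally into a retry wrapper plus a try/finally around each job. The sketch below uses exponential backoff with jitter, a common generic technique that the slide does not specifically name. `open_job`, `run` and `close_job` are hypothetical callbacks around whatever Bulk API client you use, and which exceptions count as retryable is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 5,
                 base_delay: float = 1.0,
                 retry_on: tuple = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on a retryable error wait with exponential backoff plus
    jitter and try again, re-raising after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # 1x, 2x, 4x ... base_delay, stretched by up to 2x random jitter
            sleep(base_delay * 2 ** attempt * (1 + random.random()))
    raise AssertionError("unreachable")


def run_bulk_job(open_job: Callable[[], object],
                 run: Callable[[object], T],
                 close_job: Callable[[object], None]) -> T:
    """Run work inside a bulk job; the job is closed even when run() fails."""
    job = open_job()
    try:
        return run(job)
    finally:
        close_job(job)
```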

Unavailable Salesforce objects

• Some orgs make certain objects unavailable

– Through security restrictions

– For example, Lead or Opportunity

• Check each object with describeSObjects() before fetching

• Safely skip when not supported
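The check-then-skip step amounts to filtering the list of object types before any queries are issued. In this sketch `describe` is a hypothetical mapping from object name to its parsed describe result, with the `queryable` flag that describe results carry. Objects missing from the mapping are treated as unavailable.

```python
def objects_to_fetch(wanted: list[str], describe: dict[str, dict]) -> list[str]:
    """Keep only object types that this org exposes as queryable.

    describe is a hypothetical mapping from object name to its describe
    result (e.g. parsed from describeSObjects); missing names are treated
    as unavailable to this user.
    """
    available = []
    for name in wanted:
        info = describe.get(name)
        if info is None or not info.get("queryable", False):
            continue  # safely skip: hidden or not queryable in this org
        available.append(name)
    return available
```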

Summary

• Implisit - Intro & Motivation

• Salesforce APIs Overview

• Efficient use of API

• Scale and limitations

• Other pitfalls and tips