Data Pipelines - Big Data Meets Salesforce

Data Pipeline: Big Data meets Salesforce
Carolina Ruiz Medina, Principal Developer on Product Innovation, @carolenlanube
Agustina García Peralta, Principal Developer on Platform Strategy, @agarciaodeian




Carolina Ruiz Medina
Principal Developer on Product Innovation
FinancialForce.com, MVP
@CarolEnLaNube @CodeCoffeeCloud

Agustina García Peralta
Principal Developer, Platform Strategy
FinancialForce.com
@agarciaodeian

About
GREAT ALONE. BETTER TOGETHER.
- Native to Salesforce1 Platform since 2009
- Investors include Salesforce Ventures
- 650+ employees, San Francisco based


Agenda
- Data Pipeline - Overview
- Pipeline Use Cases
- How Pipeline works
- Demos
- Big Data
- Take away
- Q&A

Common scenario: large amount of data
Asynchronous Apex
- @future
- Queueable
- Batch Apex
- Flex Queue (since Summer '15)

Any other option?
Data Pipeline: new feature to integrate Apache Pig into Salesforce



What does it do? Process massive amounts of data in parallel.
Key elements
- MapReduce: software for writing programs that process large amounts of data in parallel
- Hadoop cluster: a cluster for storing and analyzing large amounts of data
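As a rough illustration of the MapReduce idea named above, the sketch below runs the map, shuffle and reduce phases one after another in plain Python. It is not Hadoop code: the function names and the sample records are made up for this example, and on a real cluster the map and reduce phases would be spread over many machines.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for record_type, amount in records:
        yield record_type, amount


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def map_reduce(records):
    """Run the three phases sequentially (a cluster would run them in parallel)."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample records: (type, amount)
sample = [("LBX", 150), ("Other", 250), ("LBX", 150)]
print(map_reduce(sample))
```

Because each key is reduced independently of the others, the work can be split across workers, which is where the parallelism comes from.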

Apache Pig Background

Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL

How does it work?
- It uses Pig Latin, a data-flow language
- It sits between SQL and Java
- We can create our own UDFs (user-defined functions)


Why is it relevant?
A technology associated with Hadoop, but it can be used by other frameworks, such as Salesforce

Is there anything unique to Apache Pig running in Salesforce?
Running in a multitenant environment

Under pilot program; GA by Summer '16 (Safe Harbor)
How does Data Pipeline work?
Run Pig scripts written in the Pig Latin language

What is Data Pipeline?

Data Pipeline
Pig Script
Apex?

Execution feature
- Run asynchronously
- In parallel
From where?
- Developer Console
- During deploy
- Tooling API, 33.0 onwards


Anything else?
- It is an ETL (Extract, Transform, Load)
- Pig scripts can be included in a package


Data Pipeline: advantages vs other processes
1. Performance
2. Ability to execute scripts in parallel
3. No hitting governor limits
4. De-couple On-line Transaction Processing and On-line Analytical Processing
5. Allows you to think in terms of data flow

How can Pipeline help us?

Prepare data for Currency Revaluation (SObject to SObject)
We have a large volume of financial transactions... and we need to process them now!
For our users to be able to use them: report, print, or for another quick process to finish the revaluation.

A complex process to run at the end of the month that consumes lots of resources

In general terms, revaluation of a currency is a calculated adjustment to a country's official exchange rate relative to a chosen baseline. The baseline can be anything from wage rates to the price of gold to a foreign currency.

There are two situations in which you might want to perform a currency revaluation:
- At period end. You might want to revalue your income statement to eliminate the effect of exchange rate fluctuations.
- At year end. You might want to revalue the company's balance sheet so that it values the assets and liabilities of the company at the exchange rate applicable on the balance sheet date.

How can Pipeline help us?

Extracting information from a large amount of data (SObject to File)
We have a large volume of financial transactions... and we need to process them now!
For our manager to look at the progress, to export data quickly...

Get all the info from our weekly extract of large-volume transactions


To build the solution, let's look at Pig Script first.
What is Pig Script?

Operators
- JOIN
- GROUP
- DISTINCT
- ORDER

Pig is a high-level scripting language that is used with Apache Hadoop. Pig enables data workers to write complex data transformations without knowing Java. Pig's simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL.
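To give a feel for the data-flow style, here is a minimal sketch in plain Python (not Pig Latin syntax) of what DISTINCT, ORDER and GROUP do to a small relation. The rows, column positions and helper names are assumptions made up for this illustration; each step takes a relation and returns a new one, which is the way a Pig script chains its statements.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical relation: each row is (type, amount)
rows = [("LBX", 150), ("Other", 250), ("LBX", 150), ("LBX", 350)]


def distinct(relation):
    """DISTINCT: drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(relation))


def order(relation, column):
    """ORDER: sort rows by the column at the given position."""
    return sorted(relation, key=itemgetter(column))


def group(relation, column):
    """GROUP: bucket together the rows that share a value in one column."""
    ordered = order(relation, column)
    return {key: list(bucket)
            for key, bucket in groupby(ordered, key=itemgetter(column))}


# Each statement feeds the next one, as in a data-flow script.
unique_rows = distinct(rows)
by_amount = order(unique_rows, 1)
by_type = group(unique_rows, 0)
print(by_type)
```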


Solution: SObject to SObject

Solution: SObject to File

File created

File size

Demo

Use Case

LBX    7/7/2015   $150.00  I-00000
Other  7/7/2015   $250.00  I-00001
LBX    7/7/2015   $150.00  I-00002
LBX    12/7/2015  $350.00  I-00003
Other  15/7/2015  $550.00  I-00004


7/7/2015   LBX    $300.00
7/7/2015   Other  $250.00
12/7/2015  LBX    $350.00
15/7/2015  Other  $550.00
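The use case appears to group the invoices by date and type and sum their amounts (the $300.00 LBX total for 7/7/2015 is the sum of two $150.00 invoices). The sketch below reproduces that step in plain Python over the five sample rows from the slide. The column meanings (type, date, amount, invoice number) and the function name are assumptions, and this is not the Pig script used in the demo.

```python
from collections import defaultdict
from decimal import Decimal

# The five sample rows from the slide.
# Assumed column meaning: (type, date, amount, invoice number)
invoices = [
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00000"),
    ("Other", "7/7/2015", Decimal("250.00"), "I-00001"),
    ("LBX", "7/7/2015", Decimal("150.00"), "I-00002"),
    ("LBX", "12/7/2015", Decimal("350.00"), "I-00003"),
    ("Other", "15/7/2015", Decimal("550.00"), "I-00004"),
]


def total_by_date_and_type(rows):
    """Group rows by (date, type) and sum the amount of each group."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for invoice_type, date, amount, _number in rows:
        totals[(date, invoice_type)] += amount
    return dict(totals)


totals = total_by_date_and_type(invoices)
for (date, invoice_type), total in totals.items():
    print(date, invoice_type, f"${total}")
```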


SObject to File


Use Case: SObject to File
No header!!
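The "No header!!" note suggests that the generated file holds data rows only, so whoever reads it has to know the column order in advance. A minimal sketch of that situation with Python's csv module follows; the field names and the sample row are hypothetical, and the actual file layout produced by Data Pipeline is not shown in the slides.

```python
import csv
import io


def to_csv_without_header(rows):
    """Write data rows only; no header row is emitted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_headerless_csv(text, field_names):
    """Read a header-less file; the caller must supply the column names."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=field_names)
    return [dict(row) for row in reader]


# Hypothetical aggregated row: (date, type, total)
text = to_csv_without_header([("7/7/2015", "LBX", "300.00")])
records = read_headerless_csv(text, ["date", "type", "total"])
print(records)
```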

Demo


Data Pipeline: 2 more options

Join 2 objects
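Joining two objects means matching the rows of one against the rows of the other on a shared key. The sketch below shows an inner join in plain Python using a hash index; the two objects, their field names and the sample values are made up for illustration, and this is not how Pig or Salesforce implements the operator.

```python
def inner_join(left, right, left_key, right_key):
    """Inner join: index the right side by key, then probe it with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})  # merge the fields of both rows
    return joined


# Hypothetical objects
accounts = [
    {"account_id": "A1", "name": "Acme"},
    {"account_id": "A2", "name": "Globex"},
]
invoices = [
    {"invoice": "I-00000", "account": "A1"},
    {"invoice": "I-00001", "account": "A1"},
    {"invoice": "I-00002", "account": "A3"},  # no matching account: dropped
]

joined = inner_join(invoices, accounts, "account", "account_id")
print(joined)
```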

Read and process a JSON file
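Reading and processing a JSON file can be pictured as two steps: parse the text into records, then keep only the fields of interest. The sketch below does that in plain Python; the payload, the field names and the helper name are invented for the example and do not come from the demo.

```python
import json

# Made-up JSON payload standing in for the file contents
raw_json = """
[
  {"invoice": "I-00000", "type": "LBX", "amount": 150.0, "notes": "not needed"},
  {"invoice": "I-00001", "type": "Other", "amount": 250.0, "notes": "not needed"}
]
"""


def load_and_project(text, fields):
    """Parse a JSON array of objects and keep only the requested fields."""
    records = json.loads(text)
    return [{name: record.get(name) for name in fields} for record in records]


slim = load_and_project(raw_json, ["invoice", "amount"])
print(slim)
```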

- Thousands of invoices
- Keep them somewhere for audit processes
- No need for all the information, just some field values
But that is not all!!

Big Data

#Big Data #Big Objects

Big Data: Big Objects
Under pilot program; GA by Summer '16 (Safe Harbor)

 | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field types | All | Text; Date/Time; Lookup
Able to edit / delete fields? | Yes | No
Triggers, Field Sets, etc. | Options available | Options not available
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No. Only clone is available
Can I see data by creating a Tab? | Yes | No. Only via SOQL
For free? | Yes | No. Talk with Salesforce about it
Storage? | It counts against the storage limitation | It DOES NOT count against the storage limitation

Big Data Big Objects & Pipeline

Data Pipeline - Limitations
- Size complexity: 20 operators, 20 loads and 10 stores per script
- Run up to 30 scripts a day
- Bulk API: Store calls it, and its limits are in place
- Does not support some operators, like Count
- Can't break the rules of the Salesforce Platform: triggers, validations, required fields, etc.
- Once you run the process, there is no way back
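The numeric limits above can be kept in mind as a simple pre-flight check. The sketch below is a hypothetical helper, not a Salesforce API: it takes the counts of operators, loads and stores as plain numbers (it does not parse a Pig script) and compares them with the figures quoted on the slide, which may have changed since the pilot.

```python
# Limits as quoted on the slide (pilot-era values; assumed here, not queried).
MAX_OPERATORS = 20
MAX_LOADS = 20
MAX_STORES = 10
MAX_SCRIPTS_PER_DAY = 30


def limit_violations(operators, loads, stores, scripts_run_today=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if operators > MAX_OPERATORS:
        problems.append(f"too many operators: {operators} > {MAX_OPERATORS}")
    if loads > MAX_LOADS:
        problems.append(f"too many loads: {loads} > {MAX_LOADS}")
    if stores > MAX_STORES:
        problems.append(f"too many stores: {stores} > {MAX_STORES}")
    if scripts_run_today >= MAX_SCRIPTS_PER_DAY:
        problems.append("daily script limit already reached")
    return problems


print(limit_violations(operators=5, loads=2, stores=1))
print(limit_violations(operators=21, loads=20, stores=11, scripts_run_today=30))
```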

Data Pipeline: Take away
1. New feature is in pilot
2. Run scripts via: Developer Console, Deploy, Tooling API (since API 33.0)
3. Run scripts asynchronously and in parallel
4. Better performance
5. Easy to use!!

Q&A
ISV Scale: Big Data for ISVs
Session Date: 9/17/2015
Session Time: 4:00 p.m. - 4:40 p.m. PST
Location: Franciscan Ballroom, Park Central Hotel

https://pig.apache.org/
http://goo.gl/h5N7Sa
https://goo.gl/KXQSKC

Links and more
Carolina Ruiz
http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/

Agustina García
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/

Thank you
