Implementing data sync APIs for mobile apps @cloudconf

Posted on 16-Jul-2015


Implementing data synchronization APIs for mobile apps

Michele Orselli, CTO @ Ideato

micheleorselli / ideatosrl

_orso_

mo@ideato.it

Agenda

scenario
design choices

implementation
alternative approaches

Dealing with conflicts


Brownfield project

Several mobile apps for tracking user-generated data (calendar, notes, bio data)

iOS & Android

~10K users, steadily growing at 1.2K/month

Scenario

MongoDB

Legacy app based on CodeIgniter

Existing RPC-wannabe-REST API for data sync

Scenario

For every resource

get updates:

POST /m/:app/get/:user_id/:res/:updated_from

create/send updates:

POST /m/:app/update/:user_id/:res_id/:dev_id/:res

Scenario

~6 different resources, ~12 calls per sync

apps sync by polling every 30 seconds

each call syncs only a little data
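As a rough illustration of the cost of this chattiness, the per-device request rate can be worked out from the two figures above. This assumes the app keeps polling for a full hour, which the talk does not state; the second line assumes, hypothetically, that a poll against the new API is a single GET /changes call.

```latex
\begin{aligned}
\text{old API: } & 12\ \tfrac{\text{calls}}{\text{sync}} \times \tfrac{3600\ \text{s/h}}{30\ \text{s/sync}} = 12 \times 120 = 1440\ \text{calls/h per device} \\
\text{one call per poll: } & 1 \times 120 = 120\ \text{calls/h per device}
\end{aligned}
```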

Scenario

Rebuild the sync API for the old apps plus 2 upcoming ones

Enable image synchronization

More efficient than the previous API

Challenge

Existing Solutions

Algorithms: timestamps, vector clocks, CRDTs

Protocols/API: SyncML, Syncano

Platform: Azure Data Sync

Storage: CouchDB, Riak

Not Invented Here?

Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels

J. Atwood

2 different mobile platforms

Several teams with different skill levels

Changing storage wasn’t an option

Forcing a particular technology client side wasn’t an option

Architecture

Diagram: clients c1, c2, c3 talk to a central server; the server holds the sync logic and conflict resolution, the clients are thin.

In the sync domain all resources are managed in the same way

Implementation

For every app:

one endpoint for getting new data

one endpoint for pushing changes

one endpoint for uploading images

Implementation

GET /apps/:app/users/:user_id/changes[?from=:from]

POST /apps/:app/users/:user_id/merge

POST /upload/:res_id/images

The new APIs

Silex Implementation

Diagram: Sync Service; collections Col 1, Col 2, Col 3

Silex Implementation

use Silex\Application;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\JsonResponse;

// merge is a POST endpoint (POST /apps/:app/users/:user_id/merge)
$app->post('/apps/{mApp}/users/{userId}/merge',
    function ($mApp, $userId, Application $app, Request $request) {
        $lastSync = $request->get('from', null);
        $data = $request->get('data', false);

        // all sync logic and conflict resolution live in the service
        $syncService = $app['syncService'];
        $syncService->merge($data, $lastSync, $userId);

        return new JsonResponse($syncService->getResult());
    });

Silex Implementation

$app['mongodb'] = new MongoDb(…);

$app['changesRepo'] = new ChangesRepository($app['mongodb']);

$app['syncService'] = new SyncService($app['changesRepo']);

GET /apps/:app/users/:user_id/changes?from=:from

Get changes

timestamps?

timestamps are inaccurate

the server suggests the “from” parameter to be used in the next request


Server suggests the sync time

c1 → server: GET /changes

server → c1: { "next": 12345, "data": […] }

c1 → server: GET /changes?from=12345

server → c1: { "next": 45678, "data": […] }
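A minimal sketch of the "server suggests the sync time" idea, assuming (this is not from the talk) that the cursor is a server-side sequence number rather than a timestamp, so client clock skew cannot cause missed changes. The class `ChangeLog` and its methods are hypothetical; the response keeps the `next`/`data` shape shown above.

```php
<?php
// Hypothetical in-memory changes feed (PHP 7.4+). The cursor is a
// server-side sequence number, so client clocks are never consulted.
final class ChangeLog
{
    /** @var array<int, array> sequence number => changed record */
    private array $changes = [];
    private int $seq = 0;

    public function record(array $record): void
    {
        $this->changes[++$this->seq] = $record;
    }

    // Same shape as the slides: ['next' => cursor, 'data' => [...]]
    public function changesSince(?int $from): array
    {
        $data = [];
        foreach ($this->changes as $seq => $record) {
            if ($seq > ($from ?? 0)) {
                $data[] = $record;
            }
        }
        return ['next' => $this->seq, 'data' => $data];
    }
}

$log = new ChangeLog();
$log->record(['id' => '1', 'type' => 'note']);
$log->record(['id' => '2', 'type' => 'note']);

$first = $log->changesSince(null);             // first sync: no "from"
$log->record(['id' => '3', 'type' => 'measure']);
$second = $log->changesSince($first['next']);  // client echoes the suggested cursor
```

The client never computes the cursor itself; it only stores and echoes `next`. A real store would query an indexed field instead of scanning the whole log.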

data format

{ "id": "1", "type": "measure", "_deleted": true }
{ "id": "2", "type": "note" }
{ "id": "3", "type": "note" }

ps: soft delete all the things!

what to transfer
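Soft deletes matter because a hard delete leaves nothing to put in the changes feed, so other devices would never learn that the record is gone. A minimal sketch, assuming a per-store version counter (the class `SoftDeleteStore` and the `_version` field are hypothetical; `_deleted` follows the data format above):

```php
<?php
// Hypothetical store (PHP 7.4+) showing soft deletes: a delete only sets
// "_deleted" and bumps the record's version, so the tombstone travels
// through the changes feed like any other update.
final class SoftDeleteStore
{
    /** @var array<string, array> id => latest version of the record */
    private array $records = [];
    private int $version = 0;

    public function save(array $record): void
    {
        $record['_version'] = ++$this->version;
        $this->records[$record['id']] = $record;
    }

    public function delete(string $id): void
    {
        if (isset($this->records[$id])) {
            $record = $this->records[$id];
            $record['_deleted'] = true;
            $this->save($record);        // a delete is just another update
        }
    }

    /** Records changed after $from, tombstones included. */
    public function changesSince(int $from): array
    {
        return array_values(array_filter(
            $this->records,
            fn (array $r): bool => $r['_version'] > $from
        ));
    }
}

$store = new SoftDeleteStore();
$store->save(['id' => '1', 'type' => 'measure']);
$store->save(['id' => '2', 'type' => 'note']);
$store->delete('1');
$changes = $store->changesSince(2);   // only the tombstone for "1"
```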

How do we generate a unique id in a distributed system?

UUID (RFC 4122): several implementations in PHP (https://github.com/ramsey/uuid)

Local/Global Id: only the server generates GUIDs; clients use local ids to manage their records

unique identifiers
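For the UUID option, a random (version 4) UUID can be built from 16 random bytes by fixing the version and variant bits as RFC 4122 prescribes. This is a sketch for illustration (the function name `uuid4` is hypothetical); a maintained library such as ramsey/uuid is the safer choice in production.

```php
<?php
// Version 4 (random) UUID per RFC 4122, using random_bytes() (PHP 7+).
function uuid4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4 in the high nibble
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant: bits 10xx
    // 32 hex chars split into 8 groups of 4, formatted as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$id = uuid4();   // e.g. a string of the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
```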

c1 → server: POST /merge { "data": [ { "lid": "1", … }, { "lid": "2", … } ] }

server → c1: { "data": [ { "guid": "58f0bdd7-1400", "lid": "1", … }, { "guid": "6f9f3ec9-1400", "lid": "2", … } ] }

mobile-generated data are “temporary” until synced to the server
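The local/global id exchange above can be sketched as follows. The function `assignGuids` is hypothetical; the GUID generator is passed in so the example is deterministic (a real server might plug in a UUID generator). Records that already carry a GUID are skipped here because they go through conflict resolution instead.

```php
<?php
// Hypothetical POST /merge step for brand-new records (PHP 7.1+): each record
// that only has a local id ("lid") gets a server GUID, and the lid/guid
// pairs are returned so the client can replace its temporary ids.
function assignGuids(array $incoming, callable $newGuid): array
{
    $stored = [];
    $response = [];
    foreach ($incoming as $record) {
        if (isset($record['guid'])) {
            continue;   // already known to the server: handled by conflict resolution
        }
        $record['guid'] = $newGuid();
        $stored[] = $record;
        $response[] = ['guid' => $record['guid'], 'lid' => $record['lid']];
    }
    return [$stored, $response];
}

// The ids from the slide, handed out in order
$ids = ['58f0bdd7-1400', '6f9f3ec9-1400'];
[$stored, $response] = assignGuids(
    [['lid' => '1'], ['lid' => '2']],
    function () use (&$ids) { return array_shift($ids); }
);
```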

the server handles conflict resolution

conflict resolution algorithm (plain data)

conflict resolution:

domain independent: e.g. last write wins

domain dependent: use domain knowledge to resolve conflicts

conflict resolution algorithm (plain data)

function sync($data) {
    foreach ($data as $newRecord) {
        $s = findByGuid($newRecord->getGuid());
        if (!$s) {                                    // no conflict
            add($newRecord);
            send($newRecord);
            continue;
        }
        if ($newRecord->updated > $s->updated) {      // remote wins
            update($s, $newRecord);
            send($newRecord);
            continue;
        }
        updateRemote($newRecord, $s);                 // server wins
    }
}
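To check the plain-data algorithm end to end, here is a runnable sketch with in-memory storage. Assumptions not taken from the talk: records are arrays, `updated` is a number, ties go to the server (as in the pseudocode, where only a strictly newer client record wins), and the response uses the `ok`/`update` entries of the example slides. The class `LwwSync` is hypothetical; the usage reproduces the slide example.

```php
<?php
// Last-write-wins sync over an in-memory server store (PHP 7.4+).
final class LwwSync
{
    /** @var array<string, array> guid => record as stored on the server */
    public array $server = [];
    /** @var callable generates GUIDs for new records */
    private $newGuid;

    public function __construct(array $serverRecords, callable $newGuid)
    {
        foreach ($serverRecords as $record) {
            $this->server[$record['guid']] = $record;
        }
        $this->newGuid = $newGuid;
    }

    public function sync(array $data): array
    {
        $response = [];
        foreach ($data as $new) {
            $s = $this->server[$new['guid'] ?? ''] ?? null;
            if ($s === null) {                              // no conflict: new record
                $new['guid'] = $new['guid'] ?? ($this->newGuid)();
                $this->server[$new['guid']] = $this->withoutLid($new);
                $response[] = ['update' => ['lid' => $new['lid'] ?? null, 'guid' => $new['guid']]];
            } elseif ($new['updated'] > $s['updated']) {    // remote wins
                $this->server[$s['guid']] = $this->withoutLid($new);
                $response[] = ['ok' => ['guid' => $s['guid']]];
            } else {                                        // server wins, ties included
                $response[] = ['update' => $s + ['lid' => $new['lid'] ?? null]];
            }
        }
        return $response;
    }

    private function withoutLid(array $record): array
    {
        unset($record['lid']);   // local ids only mean something on the client
        return $record;
    }
}

$sync = new LwwSync([['guid' => 'af54d', 'data' => 'BBB', 'updated' => 20]], fn (): string => 'e324f');
$response = $sync->sync([
    ['lid' => '1', 'guid' => 'af54d', 'data' => 'AAA', 'updated' => 100],
    ['lid' => '2', 'data' => 'hello!', 'updated' => 15],
]);
```

Sending a stale copy of `af54d` afterwards takes the server-wins branch, and the client receives the server's version back in an `update` entry.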

conflict resolution algorithm (plain data)

c1:
{ "lid": "1", "guid": "af54d", "data": "AAA", "updated": 100 }
{ "lid": "2", "data": "hello!", "updated": 15 }

server:
{ "guid": "af54d", "data": "BBB", "updated": 20 }

c1 → server: POST /merge

lid 2 has no GUID, so it is added (no conflict); af54d is newer on the client (100 > 20), so the remote version wins.

server, after the merge:
{ "guid": "e324f", "data": "hello!", "updated": 15 }
{ "guid": "af54d", "data": "AAA", "updated": 100 }

server → c1:
{ "ok": { "guid": "af54d" } }
{ "update": { "lid": "2", "guid": "e324f" } }

conflict resolution algorithm (hierarchical data)

How to manage hierarchical data?
1) sync root record
2) update ids
3) sync child records

{ "lid": "123456", "type": "baby", … }

{ "lid": "123456", "type": "temperature", "baby_id": "123456" }

conflict resolution algorithm (hierarchical data)

// records are objects, so updateRecordIds() rewrites children in place;
// sortByHierarchy() takes $data by reference
function syncHierarchical($data) {
    sortByHierarchy($data);                       // parent records first
    foreach ($data as $record) {
        if (!$record->isRoot()) {
            sync([$record]);                      // children: plain-data algorithm
            continue;
        }
        $s = findByGuid($record->getGuid());
        if (!$s) {                                // no conflict
            add($record);
            updateRecordIds($record, $data);      // children now reference the new GUID
            send($record);
        } elseif ($record->updated > $s->updated) { // remote wins
            update($s, $record);
            updateRecordIds($record, $data);
            send($record);
        } else {                                  // server wins
            updateRecordIds($s, $data);
            updateRemote($record, $s);
        }
    }
}
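The two steps that make the hierarchical version work (parents first, then rewrite child references) can be tried in isolation. This sketch only handles new records in a two-level hierarchy (roots and their direct children; deeper trees would need a topological sort), and assumes a child points to its parent through a `parent` field holding the parent's lid. The function `syncHierarchicalNew` is hypothetical; the usage follows the example slides.

```php
<?php
// Parents first, then rewrite child "parent" references from lid to GUID (PHP 7.4+).
function syncHierarchicalNew(array $data, callable $newGuid): array
{
    // parent records first: records without "parent" sort before children
    usort($data, fn (array $a, array $b): int => isset($a['parent']) <=> isset($b['parent']));

    $lidToGuid = [];
    $stored = [];
    foreach ($data as $record) {
        if (isset($record['parent'], $lidToGuid[$record['parent']])) {
            $record['parent'] = $lidToGuid[$record['parent']];   // update ids
        }
        $record['guid'] = $newGuid();
        $lidToGuid[$record['lid']] = $record['guid'];
        $stored[] = $record;
    }
    return [$stored, $lidToGuid];   // lid => guid pairs go back to the client
}

$guids = ['32ead', 'e324f'];
[$stored, $map] = syncHierarchicalNew(
    [['lid' => '2', 'parent' => '1', 'data' => 'hello!'], ['lid' => '1', 'data' => 'AAA']],
    function () use (&$guids) { return array_shift($guids); }
);
```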

conflict resolution algorithm (hierarchical data)

c1 → server: POST /merge
{ "lid": "1", "data": "AAA", "updated": 100 }
{ "lid": "2", "parent": "1", "data": "hello!", "updated": 15 }

the server stores the root record first and assigns it a GUID:
{ "lid": "1", "guid": "32ead", "data": "AAA", "updated": 100 }

then rewrites the child's parent reference from the local id to that GUID:
{ "lid": "2", "parent": "32ead", "data": "hello!", "updated": 15 }

server → c1:
{ "update": { "lid": "1", "guid": "32ead" } }
{ "update": { "lid": "2", "guid": "e324f" } }

e.g. “only one temperature can be registered in a given day”

How do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints in the sync algorithm

enforcing domain constraints

from findByGuid to findSimilar

first look up by GUID, then by domain rules

“two measures are similar if they refer to the same date”
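A sketch of `findSimilar` for the rule above: exact GUID match first, then the domain rule. The field names `guid` and `when` follow the example slides; comparing `type` as well is an assumption, so that a temperature never matches, say, a note on the same day.

```php
<?php
// Hypothetical findSimilar (PHP 7.1+): GUID lookup first, then the domain
// rule "two measures are similar if they refer to the same date".
function findSimilar(array $incoming, array $serverRecords): ?array
{
    if (isset($incoming['guid'])) {
        foreach ($serverRecords as $record) {
            if (($record['guid'] ?? null) === $incoming['guid']) {
                return $record;                 // 1) same GUID
            }
        }
    }
    if (!isset($incoming['when'])) {
        return null;                            // the domain rule needs a date
    }
    foreach ($serverRecords as $record) {
        if (($record['type'] ?? null) === ($incoming['type'] ?? null)
            && ($record['when'] ?? null) === $incoming['when']) {
            return $record;                     // 2) same kind of measure, same day
        }
    }
    return null;                                // genuinely new record
}

$server = [['guid' => 'af54d', 'type' => 'temperature', 'when' => '20141005']];
$match = findSimilar(['lid' => '1', 'type' => 'temperature', 'when' => '20141005'], $server);
```

When a similar record is found, the sync algorithm can treat the incoming record as an update of it instead of inserting a second temperature for the same day.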

enforcing domain constraints

c1: { "lid": "1", "when": "20141005" }

server: { "guid": "af54d", "when": "20141005" }

c1 → server: POST /merge

server after the merge, still a single record for that day: { "guid": "af54d", "when": "20141005" }

Binary data is uploaded via a custom endpoint

Sync data remains small

Uploads can be resumed

dealing with binary data

Two steps*
1) data are synchronized
2) related images are uploaded

* this means a record can exist without its file for a while

dealing with binary data

c1 → server: POST /merge { "lid": 1, "type": "baby", "image": "myimage.jpg" }

server → c1: { "lid": 1, "guid": "ac435-f8345" }

c1 → server: POST /upload/ac435-f8345/image
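The talk does not say how resumable uploads were implemented. One common approach, sketched here under that caveat, is offset-based: the client sends each chunk with the byte offset it starts at, the server appends it only if the offset matches what it already holds, and it always replies with its current size, so an interrupted upload restarts from there. The function `receiveChunk` is hypothetical.

```php
<?php
// Hypothetical offset-based resumable upload handler (PHP 7+).
function receiveChunk(string $path, int $offset, string $chunk): int
{
    clearstatcache(true, $path);
    $have = file_exists($path) ? filesize($path) : 0;
    if ($offset === $have) {                       // only accept the next expected bytes
        file_put_contents($path, $chunk, FILE_APPEND | LOCK_EX);
        $have += strlen($chunk);
    }
    return $have;                                  // client resumes from this offset
}

$path = sys_get_temp_dir() . '/upload_' . uniqid() . '.jpg';
$afterFirst = receiveChunk($path, 0, 'abc');
$afterRetry = receiveChunk($path, 0, 'abc');   // duplicate chunk after a timeout: ignored
$afterSecond = receiveChunk($path, 3, 'def');
```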

Implementing this stuff is tricky

Explore existing solutions if you can

Understanding the domain is important

What we learned

vector clocks
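Unlike the timestamps used above, vector clocks can tell "this edit happened after that one" apart from "these two edits are concurrent", which is a genuine conflict. A minimal illustration of the concept (not something the talk implemented); the function names `tick` and `compareClocks` are hypothetical, and a clock maps node id to counter.

```php
<?php
// Minimal vector clock (PHP 7+).
function tick(array $clock, string $node): array
{
    $clock[$node] = ($clock[$node] ?? 0) + 1;   // a local event on $node
    return $clock;
}

// Returns 'before', 'after', 'equal' or 'concurrent' (a genuine conflict)
function compareClocks(array $a, array $b): string
{
    $aBehind = false;
    $bBehind = false;
    foreach (array_keys($a + $b) as $node) {
        $x = $a[$node] ?? 0;
        $y = $b[$node] ?? 0;
        if ($x < $y) {
            $aBehind = true;
        } elseif ($x > $y) {
            $bBehind = true;
        }
    }
    if ($aBehind && $bBehind) {
        return 'concurrent';
    }
    if ($aBehind) {
        return 'before';
    }
    return $bBehind ? 'after' : 'equal';
}

$base = tick([], 'server');
$fromC1 = tick($base, 'c1');   // c1 edits the record
$fromC2 = tick($base, 'c2');   // c2 edits it too, without having seen c1's edit
```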

Conflict-free Replicated Data Types (CRDTs)

Constraining the types of operations in order to:

- ensure convergence of changes to shared data by uncoordinated, concurrent actors

- eliminate network failure modes as a source of error
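The grow-only counter (G-Counter) is one of the simplest CRDTs and shows the "constrained operations" idea: each replica only increments its own slot, and merging takes the per-replica maximum. Because that merge is commutative, associative and idempotent, replicas converge regardless of the order or number of state exchanges. The class below is an illustrative sketch, not part of the talk's implementation.

```php
<?php
// Grow-only counter CRDT (PHP 7.4+).
final class GCounter
{
    private string $replica;
    /** @var array<string, int> replica id => count */
    private array $counts = [];

    public function __construct(string $replica)
    {
        $this->replica = $replica;
    }

    public function increment(int $by = 1): void
    {
        if ($by < 0) {
            throw new InvalidArgumentException('a G-Counter can only grow');
        }
        $this->counts[$this->replica] = ($this->counts[$this->replica] ?? 0) + $by;
    }

    public function merge(GCounter $other): void
    {
        foreach ($other->counts as $replica => $count) {
            $this->counts[$replica] = max($this->counts[$replica] ?? 0, $count);
        }
    }

    public function value(): int
    {
        return array_sum($this->counts);
    }
}

$c1 = new GCounter('c1');
$c2 = new GCounter('c2');
$c1->increment();
$c1->increment();
$c2->increment(3);
$c1->merge($c2);
$c2->merge($c1);   // both replicas now agree on the total
```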

CRDT

Sync Gateway handles sync

Data flows through channels

- partition the data set

- authorization

- limit the data

Uses revision trees

Couchbase Mobile

Distributed DB; eventual/strong consistency

Data Types

Configurable conflict resolution
- db level for built-in data types
- application level for custom data

Riak

See you in Verona!

jsDay 13th-14th of May

http://2015.jsday.it/

phpDay 15th-16th of May

http://2015.phpday.it/

Questions?

http://www.objc.io/issue-10/sync-case-study.html
http://www.objc.io/issue-10/data-synchronization.html

https://dev.evernote.com/media/pdf/edam-sync.pdf
http://blog.helftone.com/clear-in-the-icloud/

http://strongloop.com/strongblog/node-js-replication-mobile-offline-sync-loopback/
http://blog.denivip.ru/index.php/2014/04/data-syncing-in-core-data-based-ios-apps/?lang=en

http://inessential.com/2014/02/15/vesper_sync_diary_8_the_problem_of_un
http://culturedcode.com/things/blog/2010/12/state-of-sync-part-1.html
http://programmers.stackexchange.com/questions/206310/data-synchronization-in-mobile-apps-multiple-devices-multiple-users

http://bricklin.com/offline.htm
http://blog.couchbase.com/why-mobile-sync

Links

Vector Clocks
http://basho.com/why-vector-clocks-are-easy/
http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
http://basho.com/why-vector-clocks-are-hard/
http://blog.8thlight.com/rylan-dirksen/2013/10/04/synchronization-in-a-distributed-system.html

CRDTs
http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
http://www.infoq.com/presentations/problems-distributed-systems
https://www.youtube.com/watch?v=qyVNG7fnubQ

Riak
http://docs.basho.com/riak/latest/dev/using/conflict-resolution/

Couchbase Sync Gateway
http://docs.couchbase.com/sync-gateway/
http://www.infoq.com/presentations/sync-mobile-data

API
http://developers.amiando.com/index.php/REST_API_DataSync
https://login.syncano.com/docs/rest/index.html

Links

phones: https://www.flickr.com/photos/15216811@N06/14504964841
wat: http://uturncrossfit.com/wp-content/uploads/2014/04/wait-what.jpg
darth: http://www.listal.com/viewimage/3825918h
blueprint: http://upload.wikimedia.org/wikipedia/commons/5/5e/Joy_Oil_gas_station_blueprints.jpg
building: http://s0.geograph.org.uk/geophotos/02/42/74/2427436_96c4cd84.jpg
brownfield: http://s0.geograph.org.uk/geophotos/02/04/54/2045448_03a2fb36.jpg
no connection: https://www.flickr.com/photos/77018488@N03/9004800239
no internet con: https://www.flickr.com/photos/roland/9681237793
vector clocks: http://en.wikipedia.org/wiki/Vector_clock
crdts: http://www.infoq.com/presentations/problems-distributed-systems

Credits