Implementing Server Side Data Synchronization for Mobile Apps
Michele Orselli, CTO@Ideato
_orso_, micheleorselli
Agenda
- scenario
- design choices
- implementation
- alternative approaches
Sync scenario
[Diagram: records A, B and C start on separate devices; after sync each device holds A B C]
Dealing with conflicts
[Diagram: two versions of the same record, A1 and A2: which one wins?]
Scenario
- Brownfield project
- several mobile apps for tracking user-generated data (calendar, notes, bio data)
- iOS & Android
- ~10K users, steadily growing at 1.2K/month
- MongoDB
- Legacy app based on CodeIgniter
- Existing RPC-wannabe-REST API for data sync

Existing API
- get updates: POST /m/<app>/get/<user_id>/<res>/<updated_from>
- send updates: POST /m/<app>/update/<user_id>/<res_id>/<dev_id>/<res>
- 6 different resources, 12 calls per sync
- apps sync by polling every 30 sec
- every call syncs little data
Challenge
- rebuild the sync API for the old apps + 2 incoming ones
- allow image synchronization
- be more efficient than the previous API
Existing Solutions
- Algorithms: timestamps, vector clocks, CRDTs
- Protocols/API: SyncML, Syncano
- Platform: Azure Data Sync
- Storage: CouchDB, Riak
Not Invented Here?
"Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels" (J. Atwood)
- 2 different mobile platforms
- several teams with different skill levels
- changing storage wasn't an option
- forcing a particular technology client side wasn't an option
Architecture
[Diagram: clients c1, c2 and c3 connected to a central server]
- server: sync logic, conflict resolution
- clients: thin clients
- In the sync domain all resources are the same
- For every app:
  - one endpoint for getting new data
  - one endpoint for pushing changes
  - one endpoint for uploading images
Implementation
Get changes
- Get all changes (1st sync): GET /apps/{app}/users/{user_id}/changes
- Get latest changes: GET /apps/{app}/users/{user_id}/changes?from={from}

from = timestamp?
- Timestamps are inaccurate (clock skew and developer errors)
- The server suggests the "from" parameter to be used in the next request

Server suggests the sync time
c1 -> server: GET /changes
server -> c1: { "next": 12345, "data": […] }
c1 -> server: GET /changes?from=12345
server -> c1: { "next": 45678, "data": […] }
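The "server suggests the sync time" exchange can be sketched as follows. This is a minimal in-memory illustration, not the talk's implementation: `FakeServer`, `Client`, and the use of a server-side counter as the `next` value are all assumptions made for the example. The point it shows is that the client never reads its own clock; it only echoes back the last `next` it received.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for the GET /changes endpoint."""

    def __init__(self):
        self.clock = 0       # server-side logical time, never taken from clients
        self.records = []    # list of (stored_at, record) pairs

    def store(self, record):
        self.clock += 1
        self.records.append((self.clock, record))

    def get_changes(self, from_=0):
        # Everything stored after `from_`, plus the value the client
        # should send as `from` on its next request.
        data = [record for (stored_at, record) in self.records if stored_at > from_]
        return {"next": self.clock, "data": data}


class Client:
    """Hypothetical thin client: it keeps an opaque cursor, not a timestamp."""

    def __init__(self, server):
        self.server = server
        self.cursor = 0      # only ever copied from the server's "next"
        self.local = []

    def pull(self):
        response = self.server.get_changes(self.cursor)
        self.local.extend(response["data"])
        self.cursor = response["next"]
        return response["data"]
```

A first `pull()` with cursor 0 plays the role of the 1st sync; later calls only return what was stored since the previous response.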
what to transfer
operations:
    {"op": "add", "id": "1", "data": […]}
    {"op": "update", "id": "1", "data": […]}
    {"op": "delete", "id": "1"}
    {"op": "add", "id": "2", "data": […]}
states:
    {"id": "1", "data": […]}
    {"id": "2", "data": […]}
    {"id": "3", "data": […]}

We chose to transfer states:
    {"id": "1", "type": "measure", "_deleted": true}
    {"id": "2", "type": "note"}
    {"id": "3", "type": "note"}
PS: soft delete all the things!
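To make the operations/states distinction concrete, here is a small sketch that folds an operation log into per-record states, keeping a soft-delete tombstone. The function name `apply_operations` and the exact shape of the tombstone are assumptions for illustration; the slides only show the two payload styles.

```python
def apply_operations(operations):
    """Fold an operation log into a dict of states keyed by record id."""
    states = {}
    for op in operations:
        record_id = op["id"]
        if op["op"] == "add":
            states[record_id] = {"id": record_id, "data": op["data"]}
        elif op["op"] == "update":
            states[record_id]["data"] = op["data"]
        elif op["op"] == "delete":
            # Soft delete: keep a tombstone so other devices learn about it.
            states[record_id] = {"id": record_id, "_deleted": True}
    return states
```

Transferring states means a client only needs the final value of each record, at the cost of losing the history of how it got there.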
unique identifiers
How do we generate a unique id in a distributed system?
- UUID: several implementations (RFC 4122)
- Local ids / global id: the server generates GUIDs, clients use local ids to manage their records

c1 -> server: GET /changes
server -> c1: {"data": {"guid": "58f0bdd7-1481"}}

c1 -> server: POST /merge
    {"data": [{"lid": "1", …}, {"lid": "2", …}]}
server -> c1:
    {"data": [{"guid": "58f0bdd7-1400", "lid": "1", …}, {"guid": "6f9f3ec9-1400", "lid": "2", …}]}
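A rough sketch of the local id / global id handshake, assuming the server simply mints a UUID for every incoming record that lacks a `guid`. The helper names `assign_guids` and `apply_guid_mapping` are hypothetical, and the client store is modelled as a plain dict keyed by `lid`.

```python
import uuid


def assign_guids(records, new_guid=lambda: str(uuid.uuid4())):
    """Server side: return copies of `records`, adding a GUID where missing."""
    result = []
    for record in records:
        merged = dict(record)
        if "guid" not in merged:
            merged["guid"] = new_guid()
        result.append(merged)
    return result


def apply_guid_mapping(local_store, response):
    """Client side: attach the returned GUIDs to local rows, matched by lid."""
    for item in response:
        local_store[item["lid"]]["guid"] = item["guid"]
```

Echoing the `lid` back in the response is what lets the client match each new GUID to its own row.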
- The server handles conflict resolution
- Mobile-generated data are "temporary" until synced to the server
- Conflict resolution:
  - domain independent: last write wins
  - domain dependent: use domain knowledge to resolve

conflict resolution algorithm (plain data)

    function sync($data) {
        foreach ($data as $newRecord) {
            $s = findByGuid($newRecord->getGuid());

            // no conflict: the server has never seen this record
            if (!$s) {
                add($newRecord);
                send($newRecord);
                continue;
            }

            // remote wins: the client's copy is newer than the server's
            if ($newRecord->updated > $s->updated) {
                update($s, $newRecord);
                send($newRecord);
                continue;
            }

            // server wins: the client must take the server's copy
            updateRemote($newRecord, $s);
        }
    }
conflict resolution algorithm (plain data): example
c1 holds:
    {"lid": "1", "guid": "af54d", "data": "AAA", "updated": "100"}
    {"lid": "2", "data": "hello!", "updated": "15"}
server holds:
    {"guid": "af54d", "data": "BBB", "updated": "20"}

c1 -> server: POST /merge (both records)

On the server:
- lid 2 has no guid yet, so it is added as {"guid": "e324f", "data": "hello!", "updated": "15"}
- guid af54d: the client copy (updated 100) is newer than the server copy (updated 20), so the server copy becomes {"guid": "af54d", "data": "AAA", "updated": "100"}

server -> c1:
    {"ok": {"guid": "af54d"}}
    {"update": {"lid": "2", "guid": "e324f"}}
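The plain-data exchange above can be reproduced with a short runnable sketch. It assumes the server store is a dict keyed by GUID and that `updated` is compared as a number (compared as strings, "100" would sort before "20"). The function name `merge`, the `new_guid` callback, and the reply sent when the server wins are assumptions; the slides only show the `ok` and `update` replies.

```python
def merge(server_store, incoming, new_guid):
    """Last-write-wins merge; `server_store` maps guid -> record."""
    replies = []
    for record in incoming:
        guid = record.get("guid")
        stored = server_store.get(guid) if guid else None
        if stored is None:
            # No conflict: the server has never seen this record.
            guid = guid or new_guid()
            server_store[guid] = {
                "guid": guid,
                "data": record["data"],
                "updated": record["updated"],
            }
            replies.append({"update": {"lid": record.get("lid"), "guid": guid}})
        elif record["updated"] > stored["updated"]:
            # Remote wins: the client's copy is newer.
            stored.update(data=record["data"], updated=record["updated"])
            replies.append({"ok": {"guid": guid}})
        else:
            # Server wins: send the server's copy back (reply shape assumed).
            replies.append({"update": {"lid": record.get("lid"), **stored}})
    return replies
```

Run against the example data, the server ends up holding "AAA" for `af54d` and a new `e324f` record, and the replies match the two messages shown on the slide.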
conflict resolution algorithm (hierarchical data)
How to manage hierarchical data?
    {"lid": "123456", "type": "baby", …}
    {"lid": "123456", "type": "temperature", "baby_id": "123456"}
1) sync root records
2) update ids
3) sync child records

    function syncHierarchical($data) {
        // parent records first
        sortByHierarchy($data);

        foreach ($data as $newRecord) {
            if ($newRecord->isRoot()) {
                $s = findByGuid($newRecord->getGuid());

                // no conflict
                if (!$s) {
                    add($newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                    continue;
                }

                if ($newRecord->updated > $s->updated) {
                    // remote wins
                    update($s, $newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                } else {
                    // server wins
                    updateRecordIds($s, $data);
                    updateRemote($newRecord, $s);
                }
            } else {
                // child records go through the plain-data algorithm
                sync([$newRecord]);
            }
        }
    }
conflict resolution algorithm (hierarchical data): example
c1 -> server: POST /merge
    {"lid": "1", "data": "AAA", "updated": "100"}
    {"lid": "2", "parent": "1", "data": "hello!", "updated": "15"}

On the server:
1) the root record is added and gets a guid: {"lid": "1", "guid": "32ead", "data": "AAA", "updated": "100"}
2) the child's parent reference is rewritten from lid 1 to guid 32ead: {"lid": "2", "parent": "32ead", "data": "hello!", "updated": "15"}
3) the child record is synced

server -> c1:
    {"update": {"lid": "1", "guid": "32ead"}}
    {"update": {"lid": "2", "guid": "e324f"}}
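The "parents first, then rewrite ids" step can be sketched as below. This is a simplified illustration that only covers the no-conflict path and a two-level hierarchy; `merge_hierarchical`, the `parent` key and the `new_guid` callback are assumed names, and the timestamp comparison from the plain-data algorithm is left out.

```python
def merge_hierarchical(incoming, new_guid):
    """Assign GUIDs parents-first and rewrite child `parent` references."""
    # Roots (no "parent" key) sort before children; sorted() is stable.
    ordered = sorted(incoming, key=lambda record: "parent" in record)
    lid_to_guid = {}
    stored = []
    for record in ordered:
        saved = dict(record)
        saved["guid"] = saved.get("guid") or new_guid()
        lid_to_guid[record["lid"]] = saved["guid"]
        if "parent" in saved:
            # Replace the parent's local id with its GUID when it is known.
            saved["parent"] = lid_to_guid.get(saved["parent"], saved["parent"])
        stored.append(saved)
    return stored
```

Because children are processed after their parents, the lid-to-GUID mapping is already filled in by the time a child's `parent` field is rewritten.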
enforcing domain constraints
e.g. "only one temperature can be registered in a given day"
How do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints in the sync algorithm

- from findByGuid to findSimilar
- first look up by GUID, then by domain rules
- "two measures are similar if they refer to the same date"

Example:
server holds: {"guid": "af54d", "when": "20141005"}
c1 holds: {"lid": "1", "when": "20141005"}
c1 -> server: POST /merge
Result: c1's record is matched to the existing one by date: {"guid": "af54d", "when": "20141005"}
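A minimal sketch of the findByGuid-to-findSimilar change, assuming the server records are a list of dicts and the only domain rule is "same `when` date". The function name `find_similar` mirrors the slide; everything else (data layout, fallback order implementation) is an assumption for illustration.

```python
def find_similar(server_records, record):
    """Look up by GUID first, then fall back to a domain rule."""
    guid = record.get("guid")
    if guid:
        for candidate in server_records:
            if candidate["guid"] == guid:
                return candidate
    # Domain rule from the slides: two measures are similar
    # if they refer to the same date.
    when = record.get("when")
    if when is not None:
        for candidate in server_records:
            if candidate.get("when") == when:
                return candidate
    return None
```

With this lookup in place of the GUID-only one, a record created offline for a day that already has a measure on the server is treated as a conflict on that measure instead of being added as a duplicate.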
dealing with binary data
- Binary data are uploaded via a custom endpoint
- Sync data remain small
- Uploads can be resumed

Two steps*:
1) data are synced to the server
2) related images are uploaded
* this means a record can be without its file for a given time

c1 -> server: POST /merge
    {"lid": 1, "type": "baby", "image": "myimage.jpg"}
server -> c1:
    {"lid": 1, "guid": "ac435-f8345"}
c1 -> server: POST /upload/ac435-f8345/image
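The slides do not say how resumable uploads were implemented. One common approach, sketched here purely as an assumption, is for the server to track how many bytes it has received so that the client can continue from that offset after a dropped connection. `UploadSession`, `upload` and the `fail_after` parameter (which simulates the connection dropping) are hypothetical names.

```python
class UploadSession:
    """Hypothetical server-side state for one resumable image upload."""

    def __init__(self, total_size):
        self.total_size = total_size
        self.received = bytearray()

    def offset(self):
        # How many bytes the server already has.
        return len(self.received)

    def append(self, offset, chunk):
        if offset != len(self.received):
            raise ValueError("chunk does not start at the expected offset")
        self.received.extend(chunk)
        return len(self.received) == self.total_size


def upload(session, payload, chunk_size, fail_after=None):
    """Send `payload` starting from the session's current offset.

    `fail_after` stops after that many chunks to simulate a lost connection.
    Returns True when the whole payload has been received.
    """
    sent = 0
    while session.offset() < len(payload):
        if fail_after is not None and sent >= fail_after:
            return False
        start = session.offset()
        session.append(start, payload[start:start + chunk_size])
        sent += 1
    return True
```

Calling `upload` again on the same session after a failure picks up where the previous attempt stopped, since the starting offset always comes from the server side.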
What we learned
- Implementing this stuff is tricky
- Explore existing solutions if you can
- Understanding the domain is important
vector clocks
[Figure: vector clock diagram]
CRDT
Conflict-free Replicated Data Types (CRDTs)
Constraining the types of operations in order to:
- ensure convergence of changes to shared data by uncoordinated, concurrent actors
- eliminate network failure modes as a source of error

Math!!!
Bounded join-semilattices
- join operation defining a least upper bound
- partially ordered set
- always increasing
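As a concrete instance of the properties listed above, here is a grow-only counter (G-Counter), a standard textbook CRDT that is not taken from the talk. Each replica only increments its own slot, and `merge` takes the element-wise maximum, which is the join (least upper bound) of the two states; the class and method names are chosen for this example.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        # A replica only ever increases its own slot.
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def merge(self, other):
        # Join: commutative, associative and idempotent.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    def value(self):
        return sum(self.counts.values())
```

Because the merge is idempotent and order-independent, replicas can exchange states in any order, any number of times, and still converge on the same value.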
Couchbase Mobile
- Gateways handle sync
- Data flows through channels:
  - partition the data set
  - authorization
  - limit the data
- Use revision trees

Riak
- Distributed DB
- Eventual/strong consistency
- Data types
- Configurable conflict resolution:
  - db level for built-in data types
  - application level for custom data
Questions?
Please leave feedback! https://joind.in/11797
That's all folks!
Links
Vector Clocks
- http://basho.com/why-vector-clocks-are-easy/
- http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
- http://basho.com/why-vector-clocks-are-hard/
CRDTs
- http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
- http://www.infoq.com/presentations/problems-distributed-systems
- https://www.youtube.com/watch?v=qyVNG7fnubQ
Riak
- http://docs.basho.com/riak/latest/dev/using/conflict-resolution/
Couchbase Sync Gateway
- http://docs.couchbase.com/sync-gateway/
- http://www.infoq.com/presentations/sync-mobile-data
API
- http://developers.amiando.com/index.php/REST_API_DataSync
- https://login.syncano.com/docs/rest/index.html
Credits
- phones: https://www.flickr.com/photos/15216811@N06/14504964841
- wat: http://uturncrossfit.com/wp-content/uploads/2014/04/wait-what.jpg
- darth: http://www.listal.com/viewimage/3825918h
- blueprint: http://upload.wikimedia.org/wikipedia/commons/5/5e/Joy_Oil_gas_station_blueprints.jpg
- building: http://s0.geograph.org.uk/geophotos/02/42/74/2427436_96c4cd84.jpg
- brownfield: http://s0.geograph.org.uk/geophotos/02/04/54/2045448_03a2fb36.jpg
- no connection: https://www.flickr.com/photos/77018488@N03/9004800239
- no internet con: https://www.flickr.com/photos/roland/9681237793
- vector clocks: http://en.wikipedia.org/wiki/Vector_clock
- crdts: http://www.infoq.com/presentations/problems-distributed-systems