Prophet - Beijing Perl Workshop
jesse-vincent
You may know me from...
RT (Request Tracker)
Jifty
SVK
Hiveminder
Perl 6
T-shirts
I’ve been hacking on an open source database
called “Prophet”
It has an API like Amazon SimpleDB or Google App Engine’s...
It’s designed for “team-scale” apps
It’s built for P2P replication and
disconnected use
But first, a brief digression...
...about cloud computing
☁☔
Living in the cloud =
sharecropping (佃农)
(That’s bad)
☹
The bad old days:
Pic of sharecroppers
You farmed land you didn’t own...
...with tools you couldn’t really afford
You paid for it with part of your harvest...
It sounded like a pretty sweet deal...
...until things got bad
(Things always got bad)
(Internet)
(Cloud)
(Internet)
In a bad year, you got further in debt to the land owner
So, what does this have to do with software?
The (more recent) bad old days:
pic of mainframes
You ran code you didn’t own on hardware you
didn’t own
Things started to get better in the 1980s
Pic of PCs
Users started to be able to make choices about computing...
They weren’t all rosy
Pic of BSOD
Sometimes new versions of software
broke things...
...leaving you locked in to old versions
pic of win 31?
Things got ‘better’
rmsche
Now, things are getting worse again...
What happens when your favorite service
goes down?
pic of twitter being down
...or stops accepting new signups?
...or starts making arbitrary choices about what’s ‘safe’ content?
...or breaks
You don’t own the services you use
You probably don’t even have a contract
When a service provider cuts you off,
you lose
Not so secret shame:
I still need the cloud
My calendar lives at google.com
I make a Web 2.0 todo list service called Hiveminder.com
pic of hiveminder
☣ Using hosted apps is going to hurt you! ☣
What about Google Gears, Adobe Air, etc?
Great. Now you can use your word processor while you’re offline!
Pic of wordperfect
Real offline apps should not need servers
Real offline apps should sync like you do
Back to that database thing...
Jesse Vincent
Chia-liang Kao
We work together
CL lives in Taipei
Jesse lives in Boston
Sometimes we need to work face to face
TPE - BOS: 9,410 mi
TPE - HNL: 5,095 mi
BOS - HNL: 5,069 mi
Step 1: Go to Hawaii for “work”
Step 2: ???
Step 3: Prophet!
Our Plan
The Plan Backfired
We were there for 8 days
We wrote 8000 lines of Perl
We figured out step 2
Step 2:
Build a Disconnected Syncable Database
Fallacies of Distributed Computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
A grounded, peer to peer replicated, disconnected, versioned, property database with self-healing conflict resolution
Prophet
What do all those buzzwords mean?
grounded
Runs here
grounded
Not here
grounded
Runs at the edge
Doesn’t need to run in the cloud
Syncs with services you already use
(Adaptors talk to “Foreign Replicas”)
Update any replica
Pull from any replica
Push to any replica
Publish a replica
Changes will propagate
peer-to-peer replicated
Real-time replication is hard to scale
It only “works” with constant connectivity
I don’t have constant connectivity
Neither do you
Prophet sync can happen whenever
disconnected
Every update is recorded as a change set
Change sets don’t lose any data
(so you can use them to go backwards)
All history is introspectable
Replication just replays changesets
versioned
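A minimal sketch of why a change set that keeps both the old and the new value can be replayed forwards or backwards. This is illustrative Python, not Prophet's Perl API; `Change`, `apply`, and `invert` are hypothetical names, and the storage format is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Change:
    """One property change; None means 'property absent'."""
    record: str
    prop: str
    old: Optional[str]
    new: Optional[str]

def apply(db: dict, changeset: list) -> None:
    """Apply each change in order, mutating db in place."""
    for c in changeset:
        props = db.setdefault(c.record, {})
        if c.new is None:
            props.pop(c.prop, None)
        else:
            props[c.prop] = c.new

def invert(changeset: list) -> list:
    """Swap old/new and reverse the order to get the undo change set."""
    return [Change(c.record, c.prop, c.new, c.old) for c in reversed(changeset)]

db = {}
cs = [Change("jesse", "age", None, "31"), Change("jesse", "age", "31", "32")]
apply(db, cs)
after = {k: dict(v) for k, v in db.items()}    # state after the change set
apply(db, invert(cs))
before = {k: dict(v) for k, v in db.items()}   # state after undoing it
```

Because nothing is thrown away, the inverse needs no extra bookkeeping: it is derived from the change set itself.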
Atomic operations
CREATE, READ, UPDATE, DELETE, SEARCH
Record types can have optional validation and canonicalization
Records of the same type do not need to have the same properties
Add and remove properties at will
property database
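A rough sketch of what "same type, different properties" with optional validation and canonicalization could look like. It is Python for illustration only; `RecordStore` and its hooks are hypothetical and do not mirror Prophet's real classes.

```python
import uuid

class RecordStore:
    def __init__(self):
        self.records = {}         # record uuid -> (type, props)
        self.canonicalizers = {}  # (type, prop) -> callable returning cleaned value
        self.validators = {}      # (type, prop) -> callable returning bool

    def create(self, rtype, **props):
        """Canonicalize, then validate, each property that has a hook."""
        clean = {}
        for name, value in props.items():
            canon = self.canonicalizers.get((rtype, name))
            if canon:
                value = canon(value)
            check = self.validators.get((rtype, name))
            if check and not check(value):
                raise ValueError(f"invalid {rtype}.{name}: {value!r}")
            clean[name] = value
        rid = str(uuid.uuid4())
        self.records[rid] = (rtype, clean)
        return rid

    def search(self, rtype, predicate):
        return [p for t, p in self.records.values() if t == rtype and predicate(p)]

store = RecordStore()
store.canonicalizers[("Person", "email")] = str.lower
store.validators[("Person", "age")] = lambda v: isinstance(v, int) and v >= 0
store.create("Person", name="Jesse", age=31)
store.create("Person", name="CL", email="CL@Example.ORG")  # other properties, same type
with_email = store.search("Person", lambda p: "email" in p)
```

Properties without a hook pass straight through, which is what makes adding and removing them at will cheap.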
Remembers all conflict resolutions
Syncs all resolutions with your peers
Detects identical conflicts
Uses your peers’ resolutions to “vote” for the winner of a conflict
self-healing conflict resolution
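One way peers could recognize that they have hit the same conflict is to give each conflict a stable fingerprint. The slides do not say how Prophet does it; the sketch below is an assumption, in Python, with hypothetical field names.

```python
import hashlib
import json

def conflict_fingerprint(conflicting_changes):
    """Stable ID for a conflict: the same clashing values give the same digest."""
    canonical = sorted(
        (c["record"], c["prop"], c["source_value"], c["target_value"])
        for c in conflicting_changes
    )
    return hashlib.sha1(json.dumps(canonical).encode("utf-8")).hexdigest()

a = [
    {"record": "t5", "prop": "status", "source_value": "resolved", "target_value": "open"},
    {"record": "t5", "prop": "owner", "source_value": "cl", "target_value": "jesse"},
]
b = list(reversed(a))  # another peer sees the same clash in a different order
same = conflict_fingerprint(a) == conflict_fingerprint(b)
```

With a key like this, a stored resolution can be looked up and counted as a vote the next time any peer meets the identical conflict.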
Working with Prophet
RESTy API
GET /records.json
GET /records/Cars.json
GET /records/Cars/716499-5F9-4AC4-827.json
GET /records/Cars/716499-5F9-4AC4-827/wheels.json
POST /records/Cars.json
POST /records/Cars/716499-5F9-4AC4-827.json
POST /records/Cars/716499-5F9-4AC4-827/wheels.json
RESTy API
Yes, we should be using PUT and DELETE
(or just provide an OpenRESTY adaptor)
Yes, you can have a commit bit and help us fix it :)
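The URL shapes above nest as type, then record UUID, then property. A small Python sketch of building them; the base address is hypothetical, and the request is only constructed, never sent. The POST body format is not shown on the slides, so it is left out here.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://localhost:8008"  # hypothetical address of a running server

def record_path(rtype=None, uuid=None, prop=None):
    """Build a path following the URL shapes listed on the slide."""
    parts = ["records"]
    for part in (rtype, uuid, prop):
        if part is None:
            break
        parts.append(quote(part, safe=""))
    return "/" + "/".join(parts) + ".json"

def get_request(*args):
    """Prepare (but do not send) a GET for the given type/uuid/property."""
    return Request(BASE + record_path(*args), method="GET")

req = get_request("Cars", "716499-5F9-4AC4-827", "wheels")
```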
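Native API (Yes, the core is Perl.)
use strict;
use warnings;
use Prophet::CLI;
use Prophet::Record;
use Prophet::Collection;
# Get a handle on the replica
my $cli = Prophet::CLI->new();
my $cxn = $cli->app_handle->handle;
# Create a Person record, then update one of its properties
my $record = Prophet::Record->new( handle => $cxn, type => 'Person' );
my $uuid = $record->create( props => { name => 'Jesse', age => 31 } );
$record->set_prop( name => 'age', value => 32 );
# Collect every Person that is not a cat (records without a species count too)
my $people = Prophet::Collection->new( handle => $cxn, type => 'Person' );
$people->matching( sub { ( shift->prop('species') // '' ) ne 'cat' } );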
What could you build with Prophet?
A bug tracker: “simple defects”
• ID, Status, Summary
• (Arbitrary other properties too)
• History
• Comments
• Attachments
sd
Initialize
./bin/sd init
./bin/sd ticket create -- summary="Can't sync sd with Google Code" status=new
Created ticket 5 (93BF979E-08C1-11DD-94C3-D4B1FCEE7EC4)
Create
./bin/sd ticket search --regex publish
29 } new the online help doesn't describe publish
34 } new publish a static html view of records
35 } new publish should create a static rss file
List and Search
./bin/sd ticket update --uuid 93BF979E-08C1-11DD-94C3-D4B1FCEE7EC4 -- status=resolved
Updates
Bugs on my laptop aren’t interesting.
Jesse
sd publish --to fsck.com:public_html/sd/
CL
sd clone --from http://my.com/~jesse/sd
Sync!
Jesse
sd server
CL
sd pull --local
Hackathon mode!
My project has a bug tracker
Actually, mine uses two:
• RT
• hiveminder.com
My project has a bug tracker
Foreign Replicas
Prophet makes Foreign Replicas easy
SD gets them "for free"
(Using only the public REST API)
It took an afternoon
Mirror an RT instance into SD
Share it with your peers using Prophet
Sync changes back from your peers to RT
Supports Comments and Attachments
Wrote an RT Replica for SD
(Using only the public REST API)
...and one for Hiveminder
I can sync my bugs with RT or Hiveminder
Actually, it’s better
I can sync between RT and Hiveminder
I can sync between two different RTs, too
• Trac
• Launchpad
• Google Code
• SourceForge
• Bugzilla
• Jira
• GForge
• debbugs
• GNATS
• todo.txt
• Lighthouse
• Redmine
• FogBugz
• What else?
We need more replica definitions:
What else can you use Prophet for?
• CRM
• Bug tracking
• Sales orders
• Phone book
• Blog
• Trading Card Database
• Ideas?
All the databases you want while offline.
How about a P2P BBS?
Prophet doesn’t need a server.
You can sync over sneakernet.
“Private” Social Networks
A look inside Prophet
Anatomy of a Prophet Replica
The bits and pieces
Database UUID
Replica UUID
Record Store
Changeset Store
Resolution Database
Configuration metadata
The Record Store
Stores individual records by type
Not guaranteed to have all old versions
The Changeset Store
Stores every change to a set of records
Guaranteed to have all old changesets
Replaying all changesets will create an exact clone of the replica
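A toy model of the two stores, showing why replaying the full changeset log reproduces the records. Python for illustration; `Replica`, `commit`, and `clone` are hypothetical names, and real changesets carry more than this.

```python
class Replica:
    def __init__(self):
        self.records = {}     # record store: record id -> {prop: value}
        self.changesets = []  # changeset store: append-only log

    def commit(self, changeset):
        """changeset: list of (record_id, prop, new_value); None deletes the prop."""
        for record_id, prop, value in changeset:
            props = self.records.setdefault(record_id, {})
            if value is None:
                props.pop(prop, None)
            else:
                props[prop] = value
        self.changesets.append(list(changeset))

def clone(source):
    """Build a new replica purely by replaying the source's changesets."""
    copy = Replica()
    for cs in source.changesets:
        copy.commit(cs)
    return copy

origin = Replica()
origin.commit([("car1", "wheels", 4), ("car1", "color", "red")])
origin.commit([("car1", "color", None)])
mirror = clone(origin)
```

The record store is just a cache of the latest state, which is why it does not have to keep old versions.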
Replica Backends
Filesystem
Readable
Flat files
Compact
Fast
(Not yet fully atomic)
HTTP
Designed to let you “publish” databases
Flat files, currently read-only.
Same format as the filesystem replica type.
Backends are pluggable!
The filesystem is cheap and easy
The filesystem is portable
Help us write new backends:
CouchDB, Postgres, SQLite, MySQL, S3, AppEngine, $YOUR_FAVORITE_DB
Prophet is designed to sync with “other” databases and systems
They don’t need to support all of Prophet’s features - Prophet knows how to interpret mumbo-jumbo from the Cloud
Foreign Replicas will usually be app specific
All current examples are for SD
Foreign Replicas
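A sketch of one trick an adaptor could use when a foreign system exposes only its current state: diff two snapshots into property changes. This is an assumption for illustration, written in Python; it is not how the SD adaptors are described on the slides, and the `rt-42` style IDs are made up.

```python
def diff_snapshot(old, new):
    """Return (record, prop, old_value, new_value) tuples turning old into new."""
    changes = []
    for rec in sorted(set(old) | set(new)):
        before, after = old.get(rec, {}), new.get(rec, {})
        for prop in sorted(set(before) | set(after)):
            if before.get(prop) != after.get(prop):
                changes.append((rec, prop, before.get(prop), after.get(prop)))
    return changes

last_sync = {"rt-42": {"status": "new", "owner": "jesse"}}
now = {
    "rt-42": {"status": "resolved", "owner": "jesse"},
    "rt-43": {"status": "new"},
}
changes = diff_snapshot(last_sync, now)
```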
Synchronization
Publish
Serialize and export all of a replica's resolutions and changesets
Pull
Integrate unseen resolutions and then unseen changesets from a replica
Push
Integrate new resolutions and changesets into a replica
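A minimal sketch of "integrate only what is unseen", assuming each changeset is tagged with its origin replica and a per-origin sequence number. That bookkeeping scheme is an assumption, resolutions are left out, and `Peer` is a hypothetical Python class.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.log = []   # changesets in the order they were integrated
        self.seen = {}  # origin name -> highest sequence number integrated

    def commit(self, payload):
        """Record a local change as the next changeset from this peer."""
        seq = self.seen.get(self.name, 0) + 1
        self._integrate({"origin": self.name, "seq": seq, "payload": payload})

    def _integrate(self, cs):
        self.log.append(cs)
        self.seen[cs["origin"]] = cs["seq"]

    def pull(self, other):
        """Integrate changesets from other that this peer lacks; return the count."""
        count = 0
        for cs in list(other.log):
            if cs["seq"] > self.seen.get(cs["origin"], 0):
                self._integrate(cs)
                count += 1
        return count

jesse, cl = Peer("jesse"), Peer("cl")
jesse.commit("create ticket 5")
cl.commit("create ticket 6")
first = cl.pull(jesse)    # CL picks up Jesse's changeset
second = cl.pull(jesse)   # nothing new the second time
back = jesse.pull(cl)     # Jesse gets CL's changeset, skips his own
```

In this model a push is just the other side's pull, so `cl.pull(jesse)` and "Jesse pushes to CL" do the same work.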
Conflicts
Figures out the best resolution
“Nullifies” the conflict so the changeset can be cleanly integrated
Integrates the conflicting changeset
Records the resolution as a new changeset
Records the resolution decision in the resolution database
Resolving Conflicts
Prophet has clever ways to figure out the best resolution.
If there are previous resolutions for the same conflict and a majority agree, use that
If the merger has specified a “prefer this side” choice, use that
Prompt the user to make a decision, giving them info about previous decisions for this conflict
“The Best Resolution”
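The three rules above read as a fallback chain. A Python sketch of that order; the function name, the `'source'`/`'target'` labels, and the strict-majority test are assumptions for illustration.

```python
from collections import Counter

def choose_resolution(previous, prefer=None, prompt=None):
    """Pick a side for a conflict.

    previous: earlier choices recorded for this same conflict.
    prefer:   the merger's "prefer this side" choice, if any.
    prompt:   callable that asks the user, given the previous choices.
    """
    if previous:
        choice, votes = Counter(previous).most_common(1)[0]
        if votes * 2 > len(previous):  # strict majority agrees
            return choice
    if prefer is not None:
        return prefer
    if prompt is not None:
        return prompt(previous)
    raise RuntimeError("no way to resolve this conflict")

by_vote = choose_resolution(["source", "source", "target"])
by_preference = choose_resolution(["source", "target"], prefer="target")
by_user = choose_resolution([], prompt=lambda prev: "source")
```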
Scaling
Scaling to giant clusters is boring
(Can I play the “They’re not Green” card here?)
Scales to many weakly connected peers
You are not Google
Does anyone here work for Google?
Current target is databases of O(50k) records
How does it scale?
We have a political agenda.
Cloud computing is not Open.
APIs for “export” are not good enough.
You should always have full control.
You probably don’t need to store 10 billion records in one database.
Why not, then?
Do you have 10 billion bugs, customer contacts
or sales orders?
Can I come work for you?
We would love a scalable, high performance Prophet backend
Getting Involved
Project Status
Simple, well-defined Perl API
RESTy web API (with microserver)
Fast, lightweight backend
Small, active dev community
Great test coverage
...less than great documentation coverage
Better ergonomics
Improved search and indexing
(Including full-text indexing)
Client libraries for other languages
Proper security model
More apps
Our Plans
Prophet
8225 lines of code and doc
2120 lines of tests
sd
2751 lines of code and doc
1121 lines of tests
Codebase
Prophet is very young
Prophet designed in April
Prophet core implemented in April
SD designed in April
SD built in June and July
We need your help!
Kick-ass functional and text indexing
Backend data store improvements
Slick GUIs for syncing
More Foreign Replicas for SD
Documentation improvements
A clever logo
New applications
Prophet
http://syncwith.us/prophet/download
SD
http://syncwith.us/sd/download
Getting Prophet