© 2011 Geeknet Inc
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rick Copeland @rick446
Getting Acquainted
http://www.flickr.com/photos/fazen/9079179/
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won’t do
PyMongo: Getting Started
>>> import pymongo
>>> conn = pymongo.Connection()
>>> conn
Connection('localhost', 27017)
>>> conn.test
Database(Connection('localhost', 27017), u'test')
>>> conn.test.foo
Collection(Database(Connection('localhost', 27017), u'test'), u'foo')
>>> conn['test-db']
Database(Connection('localhost', 27017), u'test-db')
>>> conn['test-db']['foo-collection']
Collection(Database(Connection('localhost', 27017), u'test-db'), u'foo-collection')
>>> conn.test.foo.bar.baz
Collection(Database(Connection('localhost', 27017), u'test'), u'foo.bar.baz')
PyMongo: Insert / Update / Delete
>>> db = conn.test
>>> id = db.foo.insert({'bar': 1, 'baz': [1, 2, {'k': 5}]})
>>> id
ObjectId('4e712e21eb033009fa000000')
>>> db.foo.find()
<pymongo.cursor.Cursor object at 0x29c7d50>
>>> list(db.foo.find())
[{u'bar': 1, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}]
>>> db.foo.update({'_id': id}, {'$set': {'bar': 2}})
>>> db.foo.find().next()
{u'bar': 2, u'_id': ObjectId('4e712e21eb033009fa000000'), u'baz': [1, 2, {u'k': 5}]}
>>> db.foo.remove({'_id': id})
>>> list(db.foo.find())
[]
PyMongo: Queries, Indexes
>>> db.foo.insert([dict(x=x) for x in range(10)])
[ObjectId('4e71313aeb033009fa00000b'), … ]
>>> list(db.foo.find({'x': {'$gt': 3}}))
[{u'x': 4, u'_id': ObjectId('4e71313aeb033009fa00000f')},
 {u'x': 5, u'_id': ObjectId('4e71313aeb033009fa000010')},
 {u'x': 6, u'_id': ObjectId('4e71313aeb033009fa000011')}, …]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}))
[{u'x': 4}, {u'x': 5}, {u'x': 6}, {u'x': 7}, {u'x': 8}, {u'x': 9}]
>>> list(db.foo.find({'x': {'$gt': 3}}, {'_id': 0}).skip(1).limit(2))
[{u'x': 5}, {u'x': 6}]
>>> db.foo.ensure_index([('x', pymongo.ASCENDING), ('y', pymongo.DESCENDING)])
u'x_1_y_-1'
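To make the query semantics above concrete, here is a minimal pure-Python sketch of what the filter, the field exclusion, and skip/limit do to a list of dicts. The `find` helper and its parameters are hypothetical stand-ins written for illustration; this is not how PyMongo or the server implements queries, and it handles only equality and `$gt`.

```python
def find(docs, spec, exclude=(), skip=0, limit=None):
    """Filter docs by spec, drop excluded fields, then apply skip/limit.

    Assumption: spec values are either plain values (equality) or
    {'$gt': value}; any other operator simply fails to match.
    """
    def matches(doc):
        for field, cond in spec.items():
            if isinstance(cond, dict):
                if not ('$gt' in cond and field in doc
                        and doc[field] > cond['$gt']):
                    return False
            elif doc.get(field) != cond:
                return False
        return True

    hits = [dict((k, v) for k, v in doc.items() if k not in exclude)
            for doc in docs if matches(doc)]
    end = None if limit is None else skip + limit
    return hits[skip:end]


docs = [dict(_id=i, x=i) for i in range(10)]
page = find(docs, {'x': {'$gt': 3}}, exclude=('_id',), skip=1, limit=2)
print(page)
```

As in the slide, skipping one match and limiting to two leaves the documents with x equal to 5 and 6.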
PyMongo and Locking
One Rule (for now): Avoid JavaScript
http://www.flickr.com/photos/lizjones/295567490/
PyMongo: Aggregation et al.
- You gotta write JavaScript (for now)
- It's pretty slow (single-threaded JS engine)
- JavaScript is used by
  - $where in a query
  - .group(key, condition, initial, reduce, finalize=None)
  - .map_reduce(map, reduce, out, finalize=None, …)
- If you shard, you can get some parallelism across multiple mongod instances with .map_reduce() (and possibly '$where'). Otherwise you're single-threaded.
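The shape of a map/reduce job can be sketched in plain Python. This is only an illustration of the data flow (emit key/value pairs, group by key, reduce each group); MongoDB itself takes the map and reduce functions as JavaScript, and the helper names below are made up for this example.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict((key, reduce_fn(key, values)) for key, values in groups.items())


def emit_parity(doc):
    """Hypothetical map step: emit one count per document, keyed by parity of x."""
    yield ('even' if doc['x'] % 2 == 0 else 'odd', 1)


def count(key, values):
    """Hypothetical reduce step: sum the emitted counts for one key."""
    return sum(values)


docs = [dict(x=x) for x in range(10)]
result = map_reduce(docs, emit_parity, count)
print(result)
```

Because each key's group is reduced independently, the work can in principle be split across machines, which is where the sharded parallelism mentioned above comes from.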
PyMongo: GridFS
>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> with fs.new_file() as fp:
...     fp.write('The file')
...
>>> fp
<gridfs.grid_file.GridIn object at 0x2cae910>
>>> fp._id
ObjectId('4e727f64eb03300c0b000003')
>>> fs.get(fp._id).read()
'The file'
- Arbitrary data can be stored in the ‘fp’ object; it’s just a document (but please put it in ‘fp.metadata’)
  - MIME type
  - Filename
PyMongo: GridFS Versioning
>>> file_id = fs.put('Moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Moar data!'
>>> file_id = fs.put('Even moar data!', filename='foo.txt')
>>> fs.get_last_version('foo.txt').read()
'Even moar data!'
>>> fs.get_version('foo.txt', -2).read()
'Moar data!'
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[u'foo.txt']
>>> fs.delete(fs.get_last_version('foo.txt')._id)
>>> fs.list()
[]
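The versioning behaviour in the session above (several files sharing one filename, negative version numbers counting back from the newest, and the name disappearing only once every version is deleted) can be modelled with a toy in-memory store. `VersionedStore` is a hypothetical class written for this illustration; it is not GridFS and makes no claim about how GridFS stores data.

```python
class VersionedStore(object):
    """Toy in-memory model of filename-based versioning (not GridFS)."""

    def __init__(self):
        self._files = {}   # file_id -> (filename, data)
        self._next_id = 0

    def put(self, data, filename):
        """Store a new version under filename and return its id."""
        file_id = self._next_id
        self._next_id += 1
        self._files[file_id] = (filename, data)
        return file_id

    def _ids(self, filename):
        # Ids are handed out in increasing order, so sorting gives oldest first.
        return sorted(i for i, (name, _) in self._files.items()
                      if name == filename)

    def get_version(self, filename, version=-1):
        """Return the data of one version; -1 is the newest, -2 the one before."""
        return self._files[self._ids(filename)[version]][1]

    def delete(self, file_id):
        del self._files[file_id]

    def list(self):
        """Return the distinct filenames that still have at least one version."""
        return sorted(set(name for name, _ in self._files.values()))


store = VersionedStore()
first = store.put('Moar data!', 'foo.txt')
second = store.put('Even moar data!', 'foo.txt')
latest = store.get_version('foo.txt')
previous = store.get_version('foo.txt', -2)
store.delete(second)
names_after_one_delete = store.list()
store.delete(first)
names_after_two_deletes = store.list()
```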
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won’t do
Why Ming?
- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - Nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- “Unit of work”: queuing up all your updates can be handy
- Python dicts are nice; objects are nicer
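The difference between a lazy and an eager migration can be sketched without any database at all. The field names (`version`, `name`, `first`, `last`) and the helper functions are hypothetical, chosen only to illustrate the idea; this is not Ming's migration API.

```python
def migrate_v1_to_v2(doc):
    """Hypothetical structural change: split 'name' into 'first' and 'last'."""
    first, _, last = doc['name'].partition(' ')
    return {'_id': doc['_id'], 'version': 2, 'first': first, 'last': last}


def load(doc):
    """Lazy migration: upgrade an old document only at the moment it is read."""
    if doc.get('version', 1) < 2:
        doc = migrate_v1_to_v2(doc)
    return doc


def migrate_all(docs):
    """Eager migration: upgrade every stored document up front."""
    return [load(doc) for doc in docs]


old = {'_id': 1, 'name': 'Rick Copeland'}
new = load(old)
everything = migrate_all([old, new])
```

The lazy form spreads the cost over normal reads and tolerates mixed versions in storage; the eager form pays once and lets later code assume the new shape.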
Ming: Engines & Sessions
>>> import ming.datastore
>>> ds = ming.datastore.DataStore('mongodb://localhost:27017', database='test')
>>> ds.db
Database(Connection('localhost', 27017), u'test')
>>> session = ming.Session(ds)
>>> session.db
Database(Connection('localhost', 27017), u'test')
>>> ming.configure(**{'ming.main.master': 'mongodb://localhost:27017', 'ming.main.database': 'test'})
>>> ming.Session.by_name('main').db
Database(Connection(u'localhost', 27017), u'test')
Surprising Data
http://www.flickr.com/photos/pictureclara/5333266789/
Ming: Define Your Schema
from ming import collection, schema, Field

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))
Ming: Define Your Schema… Once more, with feeling
from ming import Document, Session, Field

class WikiDoc(Document):
    class __mongometa__:
        session = Session.by_name('main')
        name = 'wiki_page'
        indexes = [('title')]
    title = Field(str)
    text = Field(str)
    …

- The old declarative syntax still exists and is supported, but it’s not being actively improved
- Sometimes nice when you want additional methods/attrs on your document class
Ming: Use Your Schema
>>> doc = WikiDoc(dict(title='Cats', text='I can haz cheezburger?'))
>>> doc.m.save()
>>> WikiDoc.m.find()
<ming.base.Cursor object at 0x2c2cd90>
>>> WikiDoc.m.find().all()
[{'text': u'I can haz cheezburger?', '_id': ObjectId('4e727163eb03300c0b000001'), 'title': u'Cats'}]
>>> WikiDoc.m.find().one().text
u'I can haz cheezburger?'
>>> doc = WikiDoc(dict(tietul='LOL', text='Invisible bicycle'))
>>> doc.m.save()
Traceback (most recent call last):
  File "<stdin>", line 1, …
ming.schema.Invalid: <class 'ming.metadata.Document<wiki_page>'>: Extra keys: set(['tietul'])
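The kind of check that catches the misspelled `tietul` key can be sketched in a few lines of plain Python. The `validate` function, the `Invalid` class, and `WIKI_SCHEMA` below are hypothetical stand-ins for illustration; Ming's real validation is richer and this is not its implementation.

```python
class Invalid(Exception):
    """Raised when a document does not match its schema (illustrative only)."""


def validate(schema, doc):
    """Check doc against schema, a dict mapping field names to expected types."""
    extra = set(doc) - set(schema)
    if extra:
        raise Invalid('Extra keys: %r' % sorted(extra))
    for name, expected in schema.items():
        if name in doc and not isinstance(doc[name], expected):
            raise Invalid('%s: expected %s' % (name, expected.__name__))
    return doc


WIKI_SCHEMA = {'title': str, 'text': str}

good = validate(WIKI_SCHEMA, dict(title='Cats', text='I can haz cheezburger?'))

try:
    validate(WIKI_SCHEMA, dict(tietul='LOL', text='Invisible bicycle'))
    error = None
except Invalid as exc:
    error = str(exc)
```

Failing at save time means the typo surfaces where it was made, rather than later as a document that silently lacks a title.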
Ming Bonus: Mongo-in-Memory
>>> ming.datastore.DataStore('mim://', database='test').db
mim.Database(test)
- MongoDB is (generally) fast
  - … except when creating databases
  - … particularly when you preallocate
- Unit tests like things to be isolated
- MIM gives you isolation at the expense of speed & scaling
- Get started with PyMongo
- Sprinkle in some Ming schemas
- ORM: When a dict just won’t do
Ming ORM: Classes and Collections
from ming import collection, schema, Field
from ming.orm import (mapper, Mapper, RelationProperty, ForeignIdProperty)

WikiDoc = collection(
    'wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))

CommentDoc = collection(
    'comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))
Ming ORM: Classes and Collections (declarative)
class WikiPage(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'wiki_page'
        indexes = ['title']
    _id = FieldProperty(S.ObjectId)
    title = FieldProperty(str)
    text = FieldProperty(str)
    comments = RelationProperty('Comment')

class Comment(MappedClass):
    class __mongometa__:
        session = main_orm_session
        name = 'comment'
        indexes = ['page_id']
    _id = FieldProperty(S.ObjectId)
    page_id = ForeignIdProperty(WikiPage)
    page = RelationProperty(WikiPage)
    text = FieldProperty(str)
Ming ORM: Sessions and Queries
- Session → ORMSession
- My_collection.m… → My_mapped_class.query…
- ORMSession actually does stuff
  - Track object identity
  - Track object modifications
  - Unit of work: flushing all changes at once
>>> pg = WikiPage(title='MyPage', text='is here')
>>> session.db.wiki_page.count()
0
>>> main_orm_session.flush()
>>> session.db.wiki_page.count()
1
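The unit-of-work behaviour shown in the session above (nothing is written until flush) can be sketched with a toy class. `UnitOfWork` and the plain list standing in for a collection are assumptions made for this illustration; Ming's ORMSession also tracks identity and modifications, which this sketch leaves out.

```python
class UnitOfWork(object):
    """Toy session: remembers new objects and writes them all on flush()."""

    def __init__(self, storage):
        self.storage = storage   # stands in for a collection; a plain list here
        self.new = []

    def add(self, obj):
        """Queue an object; nothing reaches storage yet."""
        self.new.append(obj)

    def flush(self):
        """Write every pending object in one go and return how many were written."""
        written = len(self.new)
        self.storage.extend(self.new)
        self.new = []
        return written


stored = []
uow = UnitOfWork(stored)
uow.add({'title': 'MyPage', 'text': 'is here'})
count_before = len(stored)
flushed = uow.flush()
count_after = len(stored)
```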
Ming Plugins
http://www.flickr.com/photos/39747297@N05/5229733647/
Ming ORM: Extending the Session
- Various plug points in the session
  - before_flush
  - after_flush
- Some uses
  - Logging changes to sensitive data or for analytics purposes
  - Full-text search indexing
  - “last modified” fields
  - Performance instrumentation
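A rough sketch of how before_flush and after_flush plug points could maintain a "last modified" field and record simple instrumentation. Everything here (`HookedSession`, `LastModifiedExtension`, the `clock` parameter, the hook signatures) is hypothetical and written for illustration; it borrows only the hook names from the slide and is not Ming's SessionExtension API.

```python
import datetime


class LastModifiedExtension(object):
    """Hypothetical extension: stamps documents before a flush, counts after."""

    def __init__(self, clock=datetime.datetime.now):
        self.clock = clock
        self.flushed_counts = []

    def before_flush(self, pending):
        now = self.clock()
        for doc in pending:
            doc['last_modified'] = now

    def after_flush(self, written):
        self.flushed_counts.append(len(written))


class HookedSession(object):
    """Toy session that calls its extensions around every flush."""

    def __init__(self, storage, extensions=()):
        self.storage = storage
        self.extensions = list(extensions)
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)

    def flush(self):
        for ext in self.extensions:
            ext.before_flush(self.pending)
        written, self.pending = self.pending, []
        self.storage.extend(written)
        for ext in self.extensions:
            ext.after_flush(written)


# A fixed clock keeps the example deterministic.
fixed_time = datetime.datetime(2011, 9, 15)
ext = LastModifiedExtension(clock=lambda: fixed_time)
storage = []
session = HookedSession(storage, extensions=[ext])
session.add({'title': 'MyPage'})
session.flush()
```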
Ming ORM: Extending the Mapper
- Various plug points in the mapper
  - before_/after_:
    - Insert
    - Update
    - Delete
    - Remove
- Some uses
  - Collection/model-specific logging (user creation, etc.)
  - Anything you might want a SessionExtension for but would rather do per-model
Related Projects
- Ming: http://sf.net/projects/merciless/ (MIT License)
- Zarkov: http://sf.net/p/zarkov/ (Apache License)
- Allura: http://sf.net/p/allura/ (Apache License)
- PyMongo: http://api.mongodb.org/python (Apache License)
Rick Copeland @rick446
http://www.flickr.com/photos/f-oxymoron/5005673112/