8/7/2019 Introducing-MongoDB1
Introducing: MongoDB
David J. C. Beach
Sunday, August 1, 2010
-
David Beach
Software Consultant (past 6 years)
Python since v1.4 (late 90s)
Design, Algorithms, Data Structures
Sometimes Database stuff
not a frameworks guy
Organizer: Front Range Pythoneers
-
Outline
Part I: Trends in Databases
Part II: Mongo Basic Usage
Part III: Advanced Features
-
Part I: Trends in Databases
-
Database Trends
Past: Relational (RDBMS)
Data stored in Tables, Rows, Columns
Relationships designated by Primary, Foreign keys
Data is controlled & queried via SQL
-
Trends: Criticisms of RDBMS
Rigid data model
Hard to scale / distribute
Slow (transactions, disk seeks)
SQL not well standardized
Awkward for modern/dynamic languages
-
Trends: Fragmentation
Relational with ORM (Hibernate, SQLAlchemy)
ODBMS / ORDBMS (push OO-concepts into database)
Key-Value Stores (MemcacheDB, Redis, Cassandra)
Graph (neo4j)
Document Oriented (Mongo, Couch, etc...)
-
Where Mongo Fits
The Best Features of
Document Databases,
Key-Value Stores,
and RDBMSes.
-
What is Mongo
Document-Oriented Database
Produced by 10gen / Implemented in C++
Source Code Available
Runs on Linux, Mac, Windows, Solaris
Database: GNU AGPL v3.0 License
Drivers: Apache License v2.0
-
Mongo Advantages
json-style documents (dynamic schemas)
flexible indexing (B-Tree)
replication and high-availability (HA)
automatic sharding support (v1.6)*
easy-to-use API
fast queries (auto-tuning planner)
fast insert & deletes (sometimes trade-offs)
-
Mongo
Language Bindings
C, C++, Java
Python, Ruby, Perl
PHP, JavaScript
(many more community supported ones)
-
Mongo
Disadvantages
No Relational Model / SQL
No Explicit Transactions / ACID
Limited Query API
-
When to use Mongo
Rich semistructured records (Documents)
Transaction isolation not essential
Humongous amounts of data
Need for extreme speed
You hate schema migrations
-
Part II: Mongo Basic Usage
-
Installing Mongo
Use a 64-bit OS (Linux, Mac, Windows)
Get Binaries: www.mongodb.org
Run mongod process
-
Installing PyMongo
Download: http://pypi.python.org/pypi/pymongo/1.7
Build with setuptools
(includes C extension for speed)
# python setup.py install
# python setup.py --no-ext install
-
Mongo Anatomy
Database
Collection
Document
Mongo Server
-
Getting a Connection
Connection required for using Mongo
>>> import pymongo
>>> connection = pymongo.Connection("localhost")
-
Finding a Database
Databases = logically separate stores
Navigation using properties
Will create DB if not found
>>> db = connection.mydatabase
-
Using a Collection
Collection is analogous to Table
Contains documents
Will create collection if not found
>>> blog = db.blog
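The property-style navigation and create-if-missing behaviour can be imitated in plain Python 3. This is only a toy sketch, not PyMongo: `ToyConnection` and `ToyDatabase` are hypothetical names, and it assumes a name comes into existence the first time it is accessed.

```python
class ToyDatabase:
    """Hypothetical stand-in for a database: a dict of named collections."""

    def __init__(self, name):
        self.name = name
        self.collections = {}

    def __getattr__(self, collection_name):
        # Only called when normal attribute lookup fails, so real
        # attributes such as `name` are unaffected.
        if collection_name.startswith("_"):
            raise AttributeError(collection_name)
        # Create the collection (a plain list here) on first use.
        return self.collections.setdefault(collection_name, [])


class ToyConnection:
    """Hypothetical stand-in for a connection: a dict of named databases."""

    def __init__(self):
        self.databases = {}

    def __getattr__(self, db_name):
        if db_name.startswith("_"):
            raise AttributeError(db_name)
        if db_name not in self.databases:
            self.databases[db_name] = ToyDatabase(db_name)
        return self.databases[db_name]


connection = ToyConnection()
db = connection.mydatabase      # created on first access
blog = db.blog                  # likewise
blog.append({"title": "Mongo Tutorial"})
```

Repeated access returns the same object, so `connection.mydatabase.blog` sees the document appended above.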
-
Inserting
collection.insert(document) => document_id
>>> entry1 = {"title": "Mongo Tutorial",
              "body": "Here's a document to insert."}   # document
>>> blog.insert(entry1)
ObjectId('4c3a12eb1d41c82762000001')
-
Inserting (contd.)
Documents must have _id field
Automatically generated unless assigned
12-byte unique binary value
>>> entry1
{'_id': ObjectId('4c3a12eb1d41c82762000001'),
 'body': "Here's a document to insert.",
 'title': 'Mongo Tutorial'}
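To make the 12-byte value less opaque, the sketch below (Python 3, standard library only) splits the id shown above into parts. It assumes the commonly documented layout: a 4-byte big-endian timestamp first and a 3-byte counter last. What the middle 5 bytes mean has varied between driver versions, so they are returned uninterpreted. `split_object_id` is a hypothetical helper, not a PyMongo function.

```python
import datetime


def split_object_id(hex_string):
    """Split a 24-character hex id into (created, middle, counter)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: seconds since the Unix epoch in the first 4 bytes.
    seconds = int.from_bytes(raw[0:4], "big")
    created = datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)
    middle = raw[4:9]                          # left uninterpreted
    counter = int.from_bytes(raw[9:12], "big")  # assumed 3-byte counter
    return created, middle, counter


created, middle, counter = split_object_id("4c3a12eb1d41c82762000001")
print(created.date(), middle.hex(), counter)
```

Under that assumption the id from the slide decodes to a July 2010 creation time and counter value 1, and the second inserted id (ending in `000002`) simply has the next counter value.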
-
Inserting (contd.)
Documents may have different properties
Properties may be atomic, lists, dictionaries
>>> entry2 = {"title": "Another Post",
              "body": "Mongo is powerful",
              "author": "David",
              "tags": ["Mongo", "Power"]}   # another document
>>> blog.insert(entry2)
ObjectId('4c3a1a501d41c82762000002')
-
Indexing
May create index on any field
If field is list => index associates all values
>>> blog.ensure_index("author")   # index by single value
>>> blog.ensure_index("tags")     # by multiple values
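What "index associates all values" means for a list field can be sketched in a few lines of Python 3. This is an illustration only: a dict of sets stands in for the B-Tree, `build_index` is a hypothetical helper, and documents that lack the field are simply skipped.

```python
from collections import defaultdict


def build_index(documents, field):
    """Map each value of `field` to the ids of the documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        if field not in doc:
            continue  # simplification: missing fields are not indexed
        value = doc[field]
        # A list field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(doc_id)
    return index


documents = {
    1: {"title": "Mongo Tutorial"},
    2: {"title": "Another Post", "author": "David",
        "tags": ["Mongo", "Power"]},
}
by_author = build_index(documents, "author")   # single value per document
by_tag = build_index(documents, "tags")        # several values per document
```

Document 2 ends up reachable under both `"Mongo"` and `"Power"`, which is why a query on a single tag can use the index.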
-
Bulk Insert
Let's produce 100,000 fake posts
import random

bulk_entries = [ ]
for i in range(100000):
    entry = { "title": "Bulk Entry #%i" % (i+1),
              "body": "What Content!",
              "author": random.choice(["David", "Robot"]),
              "tags": ["bulk",
                       random.choice(["Red", "Blue", "Green"])]
            }
    bulk_entries.append(entry)
-
Bulk Insert (contd.)
collection.insert(list_of_documents)
Inserts 100,000 entries into blog
Returns in 2.11 seconds
>>> blog.insert(bulk_entries)
[ObjectId(...), ObjectId(...), ...]
-
Bulk Insert (contd.)
>>> blog.remove() # clear everything
>>> blog.insert(bulk_entries, safe=True)
returns in 7.90 seconds (vs. 2.11 seconds)
driver returns early; DB is still working...
unless you specify safe=True
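The difference between returning early and waiting can be pictured with a small thread-based analogy in Python 3. This is not how the driver or server is implemented: `ToyServer` is a hypothetical class in which a background worker plays the database and `safe=True` simply blocks until the queued writes have been applied.

```python
import queue
import threading


class ToyServer:
    """Hypothetical analogy: writes are queued and applied by a worker."""

    def __init__(self):
        self.stored = []
        self._inbox = queue.Queue()
        worker = threading.Thread(target=self._work, daemon=True)
        worker.start()

    def _work(self):
        while True:
            doc = self._inbox.get()
            self.stored.append(doc)      # the "database" doing its work
            self._inbox.task_done()

    def insert(self, docs, safe=False):
        for doc in docs:
            self._inbox.put(doc)
        if safe:
            # Block until every queued write has been applied.
            self._inbox.join()
        # With safe=False we return immediately; the worker may still
        # be busy, so errors and completion are not observed here.


server = ToyServer()
server.insert([{"n": i} for i in range(1000)], safe=True)
print(len(server.stored))  # 1000, because safe=True waited
```

With `safe=False` the call would return sooner, but nothing guarantees that `server.stored` is complete at that moment, which mirrors the 2.11 s versus 7.90 s timings above.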
-
Querying
collection.find_one(spec) => document
spec = document of query parameters
>>> blog.find_one({"title": "Bulk Entry #99999"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'),
 u'author': u'Robot',
 u'body': u'What Content!',
 u'tags': [u'bulk', u'Green'],
 u'title': u'Bulk Entry #99999'}
-
Querying (Specs)
Multiple conditions on document => AND
Value for tags is an ANY match
>>> blog.find_one({"title": "Bulk Entry #99999",
                   "tags": "Green"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'),
 u'author': u'Robot',
 u'body': u'What Content!',
 u'tags': [u'bulk', u'Green'],
 u'title': u'Bulk Entry #99999'}
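The AND and ANY rules can be written out as a tiny matcher in Python 3. This is a sketch of the semantics described on the slide, not the server's implementation; `matches` is a hypothetical helper that handles exact values only (no operators, no nested documents).

```python
def matches(document, spec):
    """Return True if `document` satisfies every condition in `spec`."""
    for field, wanted in spec.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list) and not isinstance(wanted, list):
            # ANY match: one element of the list must equal the value.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True   # every condition held, i.e. AND


post = {"title": "Bulk Entry #99999", "author": "Robot",
        "tags": ["bulk", "Green"]}

print(matches(post, {"title": "Bulk Entry #99999", "tags": "Green"}))  # True
print(matches(post, {"title": "Bulk Entry #99999", "tags": "Red"}))    # False
```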
-
Querying (Multiple)
collection.find(spec) => cursor
new items are fetched in bulk (behind the scenes)
>>> green_items = [ ]
>>> for item in blog.find({"tags": "Green"}):
...     green_items.append(item)
- or -
>>> green_items = list(blog.find({"tags": "Green"}))
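Fetching "in bulk behind the scenes" can be pictured as a generator that asks for one batch at a time and hands out documents one by one. The Python 3 sketch below is an analogy, not the driver's code: `toy_cursor` and `fetch_batch` are hypothetical names and the batch size of 3 is arbitrary.

```python
def toy_cursor(fetch_batch, batch_size=3):
    """Yield documents one at a time, fetching them a batch at a time."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)   # one "round trip"
        if not batch:
            return
        for document in batch:
            yield document
        offset += len(batch)


data = [{"n": i} for i in range(7)]
round_trips = []


def fetch_batch(offset, size):
    round_trips.append(offset)        # record each simulated round trip
    return data[offset:offset + size]


items = list(toy_cursor(fetch_batch))
print(len(items), round_trips)  # 7 [0, 3, 6, 7]
```

Seven documents cost four round trips here (the last one comes back empty), while the calling loop never sees the batching.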
-
Querying (Counting)
Use the find() method + count()
Returns number of matches found
>>> blog.find({"tags": "Green"}).count()
16646
-
Updating
collection.update(spec, document)
updates single document matching spec
multi=True => updates all matching docs
>>> item = blog.find_one({"title": "Bulk Entry #12253"})
>>> item["tags"].append("New")
>>> blog.update({"_id": item["_id"]}, item)
-
Deleting
use remove(...)
it works like find(...)
>>> blog.remove({"author":"Robot"}, safe=True)
-
Part III: Advanced Features
-
Advanced Querying
Regular Expressions
{"tag": re.compile(r"^Green|Blue$")}
Nested Values {"foo.bar.x": 3}
$where Clause (JavaScript)
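Two details of these specs are easy to check with the Python 3 standard library alone. First, in `^Green|Blue$` the alternation binds more loosely than the anchors, so the pattern reads "starts with Green, or ends with Blue"; a grouped variant is shown for comparison. Second, a dotted key walks into nested documents. `get_path` is a hypothetical helper that handles nested dicts only, assumed here purely for illustration.

```python
import re

loose = re.compile(r"^Green|Blue$")      # as written on the slide
strict = re.compile(r"^(Green|Blue)$")   # exactly "Green" or "Blue"

print(bool(loose.search("Greenish")))    # True: starts with "Green"
print(bool(loose.search("Navy Blue")))   # True: ends with "Blue"
print(bool(strict.search("Greenish")))   # False
print(bool(strict.search("Blue")))       # True


def get_path(document, dotted):
    """Follow a dotted key such as "foo.bar.x" through nested dicts."""
    current = document
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


doc = {"foo": {"bar": {"x": 3}}}
print(get_path(doc, "foo.bar.x") == 3)   # True
```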
-
Advanced Querying
$lt, $gt, $lte, $gte, $ne
$in, $nin, $mod, $all, $size, $exists, $type
$or, $not
$elemMatch
>>> blog.find({"$or": [{"tags": "Green"}, {"tags": "Blue"}]})
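How a handful of these operators combine can be sketched as a small evaluator in Python 3. It covers only the comparison operators, `$in`, `$nin`, and `$or`, works on scalar fields, and treats a missing field as a non-match; the real server handles many more cases. `check_condition` and `matches_ops` are hypothetical names.

```python
import operator

COMPARISONS = {
    "$lt": operator.lt, "$gt": operator.gt,
    "$lte": operator.le, "$gte": operator.ge,
    "$ne": operator.ne,
}


def check_condition(actual, condition):
    """Test one field value against a plain value or an operator dict."""
    if not isinstance(condition, dict):
        return actual == condition
    for op, operand in condition.items():
        if op in COMPARISONS:
            if not COMPARISONS[op](actual, operand):
                return False
        elif op == "$in":
            if actual not in operand:
                return False
        elif op == "$nin":
            if actual in operand:
                return False
        else:
            raise ValueError("operator not covered by this sketch: " + op)
    return True


def matches_ops(document, spec):
    """AND together all conditions; "$or" takes a list of sub-specs."""
    for key, condition in spec.items():
        if key == "$or":
            if not any(matches_ops(document, sub) for sub in condition):
                return False
        elif key not in document:
            return False   # simplification: missing fields never match
        elif not check_condition(document[key], condition):
            return False
    return True


post = {"author": "David", "views": 120}
print(matches_ops(post, {"views": {"$gt": 100, "$lte": 120}}))    # True
print(matches_ops(post, {"$or": [{"author": "Robot"},
                                 {"views": {"$lt": 200}}]}))      # True
print(matches_ops(post, {"author": {"$in": ["Robot"]}}))          # False
```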
-
Advanced Querying
collection.find(...)
sort(name) - sorting
limit(...) & skip(...) [like LIMIT & OFFSET]
distinct(...) [like SQL's DISTINCT]
collection.group(...) - like SQL's GROUP BY
>>> blog.find().limit(50) # find 50 articles
>>> blog.find().sort("title").limit(30) # 30 titles
>>> blog.find().distinct("author") # unique author names
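For readers who think in plain Python rather than SQL, the same four ideas look roughly like this on an in-memory list (Python 3, standard library only). The sample posts are made up for illustration, and the grouping shown is a simple count per author, which is only one of the things a GROUP BY can do.

```python
from collections import Counter

posts = [
    {"title": "C", "author": "David"},
    {"title": "A", "author": "Robot"},
    {"title": "B", "author": "David"},
    {"title": "D", "author": "Robot"},
]

# sort("title")
by_title = sorted(posts, key=lambda p: p["title"])

# skip(1).limit(2): drop the first result, keep the next two
page = by_title[1:1 + 2]

# distinct("author")
authors = sorted({p["author"] for p in posts})

# a GROUP BY author with COUNT(*)
per_author = Counter(p["author"] for p in posts)

print([p["title"] for p in page])   # ['B', 'C']
print(authors)                      # ['David', 'Robot']
```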
-
Map/Reduce
collection.map_reduce(mapper, reducer)
ultimate in querying power
distribute across multiple nodes
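The shape of a map/reduce job can be shown on a single machine in a few lines of Python 3. This is a sketch of the idea, not of Mongo's engine: the `map_reduce` function below is hypothetical, runs in one process, and calls the reducer exactly once per key, whereas a distributed engine may reduce partial results in several passes.

```python
from collections import defaultdict


def map_reduce(documents, mapper, reducer):
    """Run mapper over every document, group by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):      # mapper "emits" pairs
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def tag_mapper(doc):
    # Emit (tag, 1) once for every tag on the post.
    for tag in doc.get("tags", []):
        yield tag, 1


def count_reducer(key, values):
    return sum(values)


posts = [
    {"title": "Another Post", "tags": ["Mongo", "Power"]},
    {"title": "Bulk Entry #1", "tags": ["bulk", "Green"]},
    {"title": "Bulk Entry #2", "tags": ["bulk", "Red"]},
]
counts = map_reduce(posts, tag_mapper, count_reducer)
print(counts["bulk"])  # 2
```

Because summing is associative, this reducer would give the same answer if it were applied to partial sums, which is the property that lets the work be spread across nodes.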
-
Map/Reduce Visualized
Diagram Credit: by Tom White; O'Reilly Books, Chapter 2, page 20
also see: Map/Reduce: A Visual Explanation
http://ayende.com/Blog/archive/2010/03/14/map-reduce-ndash-a-visual-explanation.aspx
-
db.runCommand({
  mapreduce: "DenormAggCollection",
  query: {
    filter1: { '$in': [ 'A', 'B' ] },
    filter2: 'C',
    filter3: { '$gt': 123 }
  },
  map: function() {
    emit(
      { d1: this.Dim1, d2: this.Dim2 },
      { msum: this.measure1, recs: 1, mmin: this.measure1,
        mmax: this.measure2 < 100 ? this.measure2 : 0 }
    );
  },
  reduce: function(key, vals) {
    var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
    for(var i = 0; i < vals.length; i++) {
      ret.msum += vals[i].msum;
      ret.recs += vals[i].recs;
      if(vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
      if((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
        ret.mmax = vals[i].mmax;
    }
    return ret;
  },
  finalize: function(key, val) {
    val.mavg = val.msum / val.recs;
    return val;
  },
  out: 'result1',
  verbose: true
});
db.result1.
  find({ mmin: { '$gt': 0 } }).
  sort({ recs: -1 }).
  skip(4).
  limit(8);
SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin,
  MAX(CASE
    WHEN Measure2 < 100
    THEN Measure2
  END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A','B'))
  AND (Filter2 = 'C')
  AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8
Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.
Measures must be manually aggregated.
Aggregates depending on record counts must wait until finalization.
Measures can use procedural logic.
Filters have an ORM/ActiveRecord-looking style.
Aggregate filtering must be applied to the result set
-
Map/Reduce Examples
-
Health Clinic Example
Person registers with the Clinic
Weighs in on the scale
1 year => comes in 100 times
-
Health Clinic Example
person = {"name": "Bob",
          "weighings": [
              {"date": date(2009, 1, 15), "weight": 165.0},
              {"date": date(2009, 2, 12), "weight": 163.2},
              ... ]}
-
Map/Reduce Insert Script
import datetime
import random

all_people = [ ]
for i in range(N):    # N = number of people to generate
    person = { 'name': 'person%04i' % i }
    weighings = person['weighings'] = [ ]
    std_weight = random.uniform(100, 200)
    for w in range(100):
        date = (datetime.datetime(2009, 1, 1) +
                datetime.timedelta(
                    days=random.randint(0, 365)))
        weight = random.normalvariate(std_weight, 5.0)
        weighings.append({ 'date': date,
                           'weight': weight })
    weighings.sort(key=lambda x: x['date'])
    all_people.append(person)
-
Insert Data Performance
[Chart: Insert time in seconds (log scale, 1 to 1000) vs. N]
N      Insert
1k     3.14s
10k    29.5s
100k   292s
-
Map/Reduce Total Weight by Day
map_fn = Code("""function () {
    this.weighings.forEach(function(z) {
        emit(z.date, z.weight);
    });
}""")
reduce_fn = Code("""function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}""")
result = people.map_reduce(map_fn, reduce_fn)
-
Map/Reduce Total Weight by Day
>>> for doc in result.find():
...     print doc
{u'_id': datetime.datetime(2009, 1, 1, 0, 0), u'value': 39136.600753163315}
{u'_id': datetime.datetime(2009, 1, 2, 0, 0), u'value': 41685.341024046182}
{u'_id': datetime.datetime(2009, 1, 3, 0, 0), u'value': 38232.326554504165}
... lots more ...
-
Total Weight by Day Performance
[Chart: MapReduce time in seconds (log scale, 1 to 1000) vs. N]
N      MapReduce
1k     4.29s
10k    38.8s
100k   384s
-
Map/Reduce Weight on Day
map_fn = Code("""function () {
    var target_date = new Date(2009, 9, 5);
    var pos = bsearch(this.weighings, "date",
                      target_date);
    var recent = this.weighings[pos];
    emit(this._id, { name: this.name,
                     date: recent.date,
                     weight: recent.weight });
};""")
reduce_fn = Code("""function (key, values) {
    return values[0];
};""")
result = people.map_reduce(map_fn, reduce_fn,
                           scope={"bsearch": bsearch})
-
Map/Reduce bsearch() function
bsearch = Code("""function(array, prop, value) {
    var min, max, mid, midval;
    for(min = 0, max = array.length - 1; min midval) {
            min = mid + 1;
        } else {
            max = mid - 1;
        }
    }
    return (midval > value) ? mid - 1 : mid;
};""")
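The loop header of the JavaScript helper is incomplete as it appears here, so the following is only a Python 3 sketch of what such a helper presumably computes. It assumes, based on the final `return`, that the goal is the index of the last element whose `prop` is at or before `value`, and that the array is already sorted on `prop` (the insert script sorts weighings by date). The integer dates are made up to keep the example short.

```python
def bsearch(array, prop, value):
    """Index of the last item with item[prop] <= value, or -1 if none.

    Assumes `array` is sorted in ascending order of item[prop].
    """
    lo, hi = 0, len(array)
    while lo < hi:
        mid = (lo + hi) // 2
        if array[mid][prop] <= value:
            lo = mid + 1      # answer is at mid or to its right
        else:
            hi = mid          # answer is strictly left of mid
    return lo - 1


weighings = [
    {"date": 3, "weight": 165.0},
    {"date": 7, "weight": 163.2},
    {"date": 12, "weight": 164.1},
]
print(bsearch(weighings, "date", 8))    # 1: the weighing on day 7
print(bsearch(weighings, "date", 1))    # -1: nothing that early
```

Each step halves the search range, so 100 weighings need about 7 comparisons instead of up to 100.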
-
Weight on Day Performance
[Chart: MapReduce time in seconds (log scale, 1 to 1000) vs. N]
N      MapReduce
1k     1.23s
10k    10s
100k   108s
-
Weight on Day (Python Version)
import bisect
import datetime

target_date = datetime.datetime(2009, 10, 5)
for person in people.find():
    dates = [ w['date'] for w in person['weighings'] ]
    # index of the last weighing on or before target_date (-1 if none)
    pos = bisect.bisect_right(dates, target_date) - 1
    val = person['weighings'][pos]
-
Map/Reduce Performance
[Chart: time in seconds (log scale, 0.1 to 1000) vs. N, MapReduce and Python]
N      MapReduce   Python
1k     1.23s       0.37s
10k    10s         2.2s
100k   108s        26s
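The gap between the two series is easier to read as a ratio. The short Python 3 calculation below uses only the six timings from the chart and makes no claim beyond them.

```python
sizes = ["1k", "10k", "100k"]
map_reduce_seconds = [1.23, 10.0, 108.0]
python_seconds = [0.37, 2.2, 26.0]

# How many times longer MapReduce took than the client-side Python loop.
ratios = [mr / py for mr, py in zip(map_reduce_seconds, python_seconds)]

for size, ratio in zip(sizes, ratios):
    print("%s: MapReduce took %.1fx as long as Python" % (size, ratio))
# 1k: MapReduce took 3.3x as long as Python
# 10k: MapReduce took 4.5x as long as Python
# 100k: MapReduce took 4.2x as long as Python
```

For this particular query, pulling the documents into Python was roughly three to five times faster at every size measured.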
-
Summary
-
Resources
www.10gen.com
www.mongodb.org
MongoDB: The Definitive Guide (O'Reilly)
PyMongo: api.mongodb.org/python
-
END OF SLIDES
-
Chalkboard is not Comic Sans
This is Chalkboard, not Comic Sans.
This isn't Chalkboard, it's Comic Sans.
does it matter, anyway?