Introduction to MongoDB
Pragati Software Pvt. Ltd., 207, Lok Center, Marol-Maroshi Road, Marol, Andheri (East), Mumbai 400 059. www.pragatisoftware.com 1
Introduction to MongoDB
Pradyumn Sharma Pragati Software Pvt. Ltd., India
[email protected] www.pragatisoftware.com
What is NoSQL?
• Generic term for various non-relational database alternatives
Modern Applications Require…
1. Storage and handling of huge volumes of data
  - Twitter (March 2013): 200 million active users, 400 million tweets per day
  - Facebook (Aug 2012): 100 PB of data as of that date; >500 TB of data per day, 2.5 billion pieces of content, 2.7 billion Like actions, 300 million photos
  - Wal-Mart: 1 million customer transactions per hour
  - And the grand-daddy of them all: Google
RDBMS: a big challenge
  - Scaling beyond a point is not practical or economical
Modern Applications Require…
1. Storage and handling of huge volumes of data
2. Very high level of performance
RDBMS: lack of linear performance
  - Normalized tables lead to many joins => drop in performance
Modern Applications Require…
1. Storage and handling of huge volumes of data
2. Very high level of performance
3. 100% uptime with no single point of failure
RDBMS: single point of failure
  - Typically master-slave architecture
  - Not designed for multi-DC, geo-clusters, cloud
Modern Applications Require…
1. Storage and handling of huge volumes of data
2. Very high level of performance
3. 100% uptime with no single point of failure
4. Ease of managing the database
RDBMS: complex to manage
  - Complex, old architecture, often requiring a lot of administration and tuning work
Modern Applications Require…
1. Storage and handling of huge volumes of data
2. Very high level of performance
3. 100% uptime with no single point of failure
4. Ease of managing the database
5. Flexibility in schema design
RDBMS: not easy to change schema online
  - Limited support for new data type needs
What is NoSQL?
• Architected from the ground up, with considerations for:
  - High performance
  - Huge data volumes => high, linear scalability
  - High availability
  - High flexibility of database structure
• Don’t use the relational model at all, or use very little of it
• Compromise on various features of RDBMS (including joins, normalization, ACID transactions in most cases)
• Schema-less, or flexible schemas
• Mostly open source
• Mostly distributed systems, run on clusters
Types of NoSQL Databases
• Column family (Apache Cassandra, HBase)
• Document (MongoDB, CouchDB)
• Graph (Neo4J, Titan)
• Key value (Riak, DynamoDB)
MongoDB
Introducing MongoDB
• A document database
• Stores JSON documents (internally in a binary form, BSON)
• Developed by MongoDB Inc
• Latest version at the time of this presentation: 2.6.1, released on May 5, 2014
MongoDB Terminology
RDBMS           MongoDB
Database        Database
Table           Collection
Row / Record    Document

Example tables (collections in MongoDB):

Employees
ID  Name         Gender  Dept
1   Ahmad        M       Fin
2   Bajrang      M       Sales
3   Catherine    F       HR
4   Dostoyevski  M       Prod

Products
ID  Name      Unit
1   Pens      NO
2   Biscuits  KG
JSON Documents
{ _id: 1, name: 'Ahmad', gender: 'M', dept: 'Fin' }
{ _id: 2, name: 'Bajrang', gender: 'M', dept: 'Sales' }
{ _id: 3, name: 'Catherine', gender: 'F', dept: 'HR' }
{ _id: 4, name: 'Dostoyevski', gender: 'M', dept: 'Prod' }
Replication
[Figure: a replica set with one Primary and two Secondary members, each holding a full copy of the same documents]
{ _id: 1, name: 'Ahmad', ... }
{ _id: 2, name: 'Bajrang', ... }
{ _id: 3, name: 'Catherine', ... }
{ _id: 4, name: 'Dostoyevski', ... }
Sharding and Replication
[Figure: the documents split across two shards, each shard stored as three identical copies]
One shard:
{ _id: 1, name: 'Ahmad', … }
{ _id: 4, name: 'Dostoyevski', … }
The other shard:
{ _id: 2, name: 'Bajrang', … }
{ _id: 3, name: 'Catherine', … }
Some Prominent Users of MongoDB
• Aadhaar project of UIDAI
• MTV Networks
• Craigslist
• Sourceforge
• SAP
Inserting Simple Documents
db.persons.insert( {
  name: 'Narayan Subramanian',
  gender: 'M',
  currentCity: 'Jaipur'
} )
db.persons.insert( {
  name: 'Pushpa Maheshwari',
  gender: 'F',
  email: '[email protected]',
  worksAt: 'Indian Railways'
} )
Persons Collection
[Figure: class diagram. A Person has many Countries visited (*) and many LanguagesKnown (*)]
Person
- FirstName
- LastName
- Gender
- YearOfBirth
- LivesIn
- Married
- CountriesVisited: List <Country>
- LanguagesKnown: List <LanguageKnown>
Country
- CountryName
LanguageKnown
- LanguageName
- Proficiency
Persons Documents
db.persons.insert( {
  name: { first: 'Harish', last: 'Chandra' },
  gender: 'M',
  yearOfBirth: 1962,
  livesIn: 'Mumbai',
  countriesVisited: [ 'India', 'Singapore', 'Thailand', 'United Kingdom', 'Spain', 'Denmark', 'United States of America' ],
  languages: [
    { name: 'Hindi', proficiency: 'Fluent' },
    { name: 'English', proficiency: 'Fluent' },
    { name: 'Sanskrit', proficiency: 'Intermediate' }
  ]
} )
Querying Data
db.persons.find( {gender: 'F'} )
db.persons.find( {gender: 'F'} ).pretty()
The _id Field
• Reserved for use as a primary key
• If you do not specify a value, the insert() method adds the field to the document with a unique ObjectId as its value
• ObjectId is a 12-byte unique identifier
• Value must be unique within a collection
• Is immutable
• May be of any type other than an array
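The 12-byte structure of an ObjectId can be made concrete with a small sketch. It relies on one fact: the first 4 bytes hold the creation time in seconds since the Unix epoch. The helper name `objectIdTimestamp` and the sample id are illustrative assumptions, and the code is plain JavaScript so that it runs outside the mongo shell.

```javascript
// Sketch: decode the creation time embedded in an ObjectId's hex string.
// Assumption: the id is given in its 24-character hex form (12 bytes).
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected 24 hex characters (12 bytes)');
  }
  // The first 4 bytes (8 hex characters) are a big-endian count of seconds.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const sampleId = '507f1f77bcf86cd799439011'; // illustrative value
console.log(sampleId.length / 2);                       // 12 (bytes)
console.log(objectIdTimestamp(sampleId).toISOString()); // 2012-10-17T21:13:27.000Z
```

In the mongo shell, an ObjectId value exposes the same information through its getTimestamp() method.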
Querying Data
db.persons.find( {gender: 'F'} )
db.persons.find( {gender: 'F'} ).pretty()
db.persons.find( {gender: 'F'} ).count()
db.persons.find( {gender: 'F'}, {name: 1} )
db.persons.find( {gender: 'F'}, {name: 1, _id: 0} )
db.persons.find( {gender: 'F'}, {name: 1, yearOfBirth: 1} )
db.persons.find( {gender: 'F'}, {name: 1, yearOfBirth: 1, _id: 0} )
Operators
db.persons.find( {livesIn: {$in: ['Mumbai', 'Jaipur']}} )
db.persons.find( {countriesVisited: {$all: ['India', 'United States of America', 'Singapore']}} )
Operators
// All persons born before 1980
db.persons.find( {yearOfBirth: {$lt: 1980}} )
// All persons not living in Jaipur
db.persons.find( {livesIn: {$ne: 'Jaipur'}} )
Operators
// Find all persons who either live in Mumbai or have visited India
db.persons.find( {$or: [ {livesIn: 'Mumbai'}, {countriesVisited: 'India'} ]} )
// Find all persons who have visited India or know Hindi
db.persons.find( {$or: [ {countriesVisited: 'India'}, {'languages.name': 'Hindi'} ]} )
Operators
// Find all persons who have visited India and know Hindi
db.persons.find( {$and: [ {countriesVisited: 'India'}, {'languages.name': 'Hindi'} ]} )
Querying on Subdocument Fields
db.persons.find( {'name.first': 'Sapna'} )
db.persons.find( {'name.first': 'Jenny', 'name.last': 'Jones'} )
A Complex Query
// All males born before 1970 and all females born before 1980
db.persons.find( {
  $or: [
    {$and: [ {gender: 'M'}, {yearOfBirth: {$lt: 1970}} ]},
    {$and: [ {gender: 'F'}, {yearOfBirth: {$lt: 1980}} ]}
  ]
} )
Sorting the Documents
// Show all persons, sorted in ascending order of their year of birth
db.persons.find().sort( {yearOfBirth: 1} )
// ...and further, in descending order of their last name
// (a dotted field name must be quoted)
db.persons.find( {}, {yearOfBirth: 1, name: 1, _id: 0} ).sort( {yearOfBirth: 1, 'name.last': -1} )
Aggregation Framework
• All cities with a population of 10,000 or more:
db.zips.find( {pop: {$gte: 10000}} ).pretty()
• State-wise population:
db.zips.aggregate(
  {$group: {_id: "$state", totalpop: {$sum: "$pop"}}}
)
• All states with a population of 10 million or more:
db.zips.aggregate(
  {$group: {_id: "$state", totalpop: {$sum: "$pop"}}},
  {$match: {totalpop: {$gte: 10*1000*1000}}}
)
Aggregation Framework
• Sort by total population:
db.zips.aggregate(
  {$group: {_id: "$state", totalpop: {$sum: "$pop"}}},
  {$match: {totalpop: {$gte: 10*1000*1000}}},
  {$sort: {totalpop: -1}}
)
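To see what the $group, $match and $sort stages compute, the same steps can be traced over an in-memory array in plain JavaScript. This is only a sketch: the sample documents and their figures are made up, only the field names `state` and `pop` come from the queries on these slides, and a lower threshold is used so that the tiny sample produces output.

```javascript
// Hypothetical sample documents shaped like those in the zips collection.
const zips = [
  { city: 'A', state: 'XX', pop: 6000 },
  { city: 'B', state: 'XX', pop: 7000 },
  { city: 'C', state: 'YY', pop: 4000 },
  { city: 'D', state: 'ZZ', pop: 20000 },
];

// Equivalent of $group: {_id: "$state", totalpop: {$sum: "$pop"}}
function groupByState(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.state, (totals.get(doc.state) || 0) + doc.pop);
  }
  return [...totals].map(([state, totalpop]) => ({ _id: state, totalpop }));
}

const result = groupByState(zips)
  .filter((g) => g.totalpop >= 10000)       // $match: {totalpop: {$gte: 10000}}
  .sort((a, b) => b.totalpop - a.totalpop); // $sort: {totalpop: -1}

// ZZ (20000) comes first, then XX (13000); YY (4000) is filtered out.
console.log(result);
```

Each stage consumes the output of the previous one, which is why the $match here filters on `totalpop`, a field that exists only after grouping.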
Update
db.countries.insert( {name: 'Denmark', capital: 'unknown', currency: 'unknown'} )
// An update document without operators replaces the whole matched document:
// here 'currency' is dropped and 'continent' is added.
db.countries.update(
  {name: 'Denmark'},
  {name: 'Denmark', capital: 'Copenhagen', continent: 'Europe'}
)
Upsert
db.persons.update(
  {name: {first: 'Merilyn', last: 'Holmes'}},
  {$set: {gender: 'F', yearOfBirth: 1997, married: 'N'}},
  {upsert: true}
)
FindAndModify
db.books.findAndModify( {
  query: { _id: 123456789, available: { $gt: 0 } },
  update: {
    $inc: { available: -1 },
    $push: { checkout: { by: "abc", date: new Date() } }
  }
} )
Indexing
db.persons.ensureIndex( {yearOfMarriage: 1} )
db.persons.ensureIndex( {yearOfMarriage: 1, country: -1} )
db.persons.ensureIndex( {name: 1} )
db.persons.ensureIndex( {"name.last": 1} )
Time To Live (TTL)
db.persons.ensureIndex( {tempPassword: 1}, {expireAfterSeconds: 7200} )
• The indexed field must hold a date value; a document expires the given number of seconds after that date.
• A background task that runs once a minute removes the expired documents.
• The TTL background thread runs only on the primary member of a replica set. Secondaries replicate the deletions from the primary.
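The once-a-minute removal pass can be pictured with a small plain-JavaScript sketch. It assumes, as TTL indexes do, that the indexed field holds a date. The function name `ttlPass`, the `createdAt` field, and the sample documents are hypothetical; this models the behaviour and is not MongoDB's implementation.

```javascript
// Sketch of one TTL pass: keep only the documents that have not yet expired.
// A document expires once now >= (date in the field) + expireAfterSeconds.
function ttlPass(docs, field, expireAfterSeconds, now) {
  return docs.filter((doc) => {
    const stamp = doc[field];
    if (!(stamp instanceof Date)) return true; // non-date values never expire
    return now.getTime() < stamp.getTime() + expireAfterSeconds * 1000;
  });
}

const now = new Date('2014-05-05T12:00:00Z');
const docs = [
  { _id: 1, createdAt: new Date('2014-05-05T09:00:00Z') }, // 3 hours old
  { _id: 2, createdAt: new Date('2014-05-05T11:00:00Z') }, // 1 hour old
  { _id: 3, createdAt: 'not a date' },
];

const remaining = ttlPass(docs, 'createdAt', 7200, now);
console.log(remaining.map((d) => d._id)); // [ 2, 3 ]
```

Because the pass runs periodically, an expired document can remain visible for a short while after its expiry time.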
Text Indexing and Search
db.persons.ensureIndex( {"qualifications.institute": "text"} )
db.collection.runCommand( "text", { search: <string> } )
Geospatial Indexing and Search
• Indexes and a query mechanism to handle geospatial information.
• Spherical (earth-like) surfaces as well as flat surfaces (Euclidean planes) are supported.
• You can query for things like:
  - Locations contained entirely within a specified polygon
  - Locations that intersect with a given geometry
  - Points nearest to another point
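The three kinds of query listed on this slide correspond to the $geoWithin, $geoIntersects and $near operators applied to GeoJSON geometries. The sketch below only builds the query documents in plain JavaScript; the `places` collection, the `location` field, and the coordinates are hypothetical, and the shell calls appear as comments.

```javascript
// GeoJSON positions use [longitude, latitude] order.
const point = { type: 'Point', coordinates: [72.8777, 19.076] };

// A closed polygon ring: the first and last positions are identical.
const area = {
  type: 'Polygon',
  coordinates: [[[72.8, 19.0], [73.0, 19.0], [73.0, 19.2], [72.8, 19.2], [72.8, 19.0]]],
};

// Locations contained entirely within the polygon.
const withinQuery = { location: { $geoWithin: { $geometry: area } } };

// Locations that intersect the polygon.
const intersectsQuery = { location: { $geoIntersects: { $geometry: area } } };

// Points nearest to a point, at most 5000 metres away.
const nearQuery = {
  location: { $near: { $geometry: point, $maxDistance: 5000 } },
};

// In the mongo shell, assuming a 2dsphere index on the hypothetical field:
//   db.places.ensureIndex({ location: '2dsphere' })
//   db.places.find(withinQuery)
//   db.places.find(intersectsQuery)
//   db.places.find(nearQuery)
console.log(JSON.stringify(nearQuery));
```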
Capped Collections
• Example: you need to consider only recently added documents; older ones can be safely discarded.
• Capped collections: fixed-size collections
• Guarantee preservation of the insertion order.
• Order of documents on disk is identical to the insertion order.
• Oldest documents get automatically removed.
• An update that causes the document size to grow will fail.
• Deletion of selected documents is not possible.
• Cannot shard a capped collection.
• You can create a tailable cursor, which tails the end of a capped collection. You can continue retrieving documents using this.
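The insertion-order and automatic-removal behaviour can be modelled in a few lines of plain JavaScript. This is a sketch under a stated simplification: the hypothetical `CappedLog` class is capped by document count, whereas a real capped collection is sized in bytes (with an optional maximum document count).

```javascript
// Sketch: a fixed-size, insertion-ordered store that evicts its oldest entry.
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // the oldest document is removed automatically
    }
  }
  // Documents come back in insertion order.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedLog(3);
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
console.log(log.find().map((d) => d.seq)); // [ 3, 4, 5 ]
```

In the mongo shell, a capped collection is created with db.createCollection, for example db.createCollection('log', {capped: true, size: 1048576}), where the collection name and size are illustrative.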
Replication
• Replica Set: cluster of mongod instances that replicate amongst one another and ensure automated failover.
• Up to 12 members in a replica set, of which at most 7 can vote.
• Master-slave replication, with one primary and the rest as secondary members.
Replication
• Clients direct writes to the primary; the secondary members replicate from the primary asynchronously.
• The oplog is a special capped collection that keeps a rolling record of all changes to the database.
• MongoDB applies changes to the primary and then records the operations in the primary’s oplog.
• Secondary members then replicate the oplog and apply the operations to themselves asynchronously.
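The flow of a write through the oplog can be sketched in plain JavaScript. The `Member` and `Primary` classes and the `syncFrom` function are hypothetical names for illustration, and the oplog here is an unbounded array rather than a capped collection.

```javascript
// Sketch: a primary records each write in an oplog; secondaries replay it later.
class Member {
  constructor() {
    this.data = new Map(); // _id -> document
    this.applied = 0;      // number of oplog entries applied so far
  }
  apply(op) {
    if (op.type === 'insert') this.data.set(op.doc._id, op.doc);
    if (op.type === 'remove') this.data.delete(op._id);
    this.applied++;
  }
}

class Primary extends Member {
  constructor() {
    super();
    this.oplog = [];
  }
  write(op) {
    this.apply(op);      // apply the change to the primary...
    this.oplog.push(op); // ...then record it in the oplog
  }
}

// A secondary pulls the entries it has not seen yet and applies them in order.
function syncFrom(primary, secondary) {
  for (const op of primary.oplog.slice(secondary.applied)) secondary.apply(op);
}

const primary = new Primary();
const secondary = new Member();
primary.write({ type: 'insert', doc: { _id: 1, name: 'Ahmad' } });
primary.write({ type: 'insert', doc: { _id: 2, name: 'Bajrang' } });

console.log(secondary.data.size); // 0: replication is asynchronous, so it lags
syncFrom(primary, secondary);
console.log(secondary.data.size); // 2: caught up after replaying the oplog
```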
Failover
• Members are interconnected with each other to exchange heartbeat messages.
• A crashed server with missing heartbeat is detected by other members and is removed from the replica set membership.
• When a crashed server later recovers, it can rejoin the cluster by connecting to the primary and replicating the changes made since it crashed.
• If a primary member fails, the remaining members automatically try to elect a new primary, without human intervention.
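An election can succeed only if a candidate gathers votes from a majority of the voting members, so not every failure leaves a set able to elect a new primary. The check below is a minimal plain-JavaScript sketch of that majority rule; the function name and the member objects are hypothetical, and real elections also weigh priorities and how up to date each member is.

```javascript
// Sketch: can the reachable members elect a primary?
// Assumption: each member is described as { name, votes, up }.
function canElectPrimary(members) {
  const totalVotes = members.reduce((sum, m) => sum + m.votes, 0);
  const reachableVotes = members
    .filter((m) => m.up)
    .reduce((sum, m) => sum + m.votes, 0);
  // A strict majority of all votes is required, not just of the reachable ones.
  return reachableVotes > totalVotes / 2;
}

const members = [
  { name: 'a', votes: 1, up: false }, // the crashed primary
  { name: 'b', votes: 1, up: true },
  { name: 'c', votes: 1, up: true },
];
console.log(canElectPrimary(members)); // true: 2 of 3 votes

members[1].up = false;
console.log(canElectPrimary(members)); // false: 1 of 3 votes
```

This is why replica sets are usually given an odd number of voting members.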
Failover and Rollback
• A possible scenario:
  - The primary has completed a write request
  - None of the secondaries have replicated it
  - The primary crashes
  - The remaining secondaries elect a new primary and continue operating, unaware that they have lost a write request already acknowledged by the former primary
  - When the former primary rejoins as a secondary, it has to roll back the write operation to maintain database consistency across the replica set
• MongoDB writes the rollback data to a BSON file. If the rolled-back writes are still wanted, you have to intervene manually to apply the rollback data.
Failover and Rollback
• Alternatively, you can specify the number of secondaries that must receive the modification before the primary acknowledges it to the client (the write concern).
• Tradeoff: between latency and reliability.
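The latency side of this trade-off can be illustrated with a small plain-JavaScript model. It is a sketch only: the function name and the delay figures are hypothetical, and `w` is taken to count the members that must hold the write with the primary included, as MongoDB's write concern does.

```javascript
// Sketch: how long a client waits for an acknowledgement.
// w: number of members (primary included) that must hold the write.
// secondaryDelaysMs: hypothetical time for each secondary to replicate it.
function ackLatencyMs(w, secondaryDelaysMs) {
  if (w < 1 || w > secondaryDelaysMs.length + 1) {
    throw new Error('w must be between 1 and the number of members');
  }
  if (w === 1) return 0; // the primary alone is enough
  const sorted = [...secondaryDelaysMs].sort((a, b) => a - b);
  return sorted[w - 2]; // wait for the (w - 1)-th fastest secondary
}

const delays = [40, 15]; // two secondaries
console.log(ackLatencyMs(1, delays)); // 0
console.log(ackLatencyMs(2, delays)); // 15
console.log(ackLatencyMs(3, delays)); // 40
```

A higher `w` means an acknowledged write survives more failures, at the cost of waiting for slower members. In the 2.6 shell the setting is passed as an option, for example db.persons.insert(doc, {writeConcern: {w: 2, wtimeout: 5000}}), where the values are illustrative.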
Read Operations
• By default all read operations against a replica set are returned from the primary. Users may configure on a per-connection basis to prefer read operations from a secondary.
• All read operations issued to the primary of a replica set are consistent with the last write operation.
• Strict consistency for read operations from secondary members cannot be guaranteed.
Replication
• Members can be designated as:
  - Secondary-only: for dedicated backup, for off-site data centers
  - Hidden: invisible to client applications; an isolated member for reporting and monitoring; cannot become primary, but votes in elections
  - Delayed: replicates after a specified delay; a form of rolling backup, protection against human errors and change control; must also be hidden and secondary-only
  - Arbiters: hold no data and only participate in elections; cannot become primary
  - Non-voting
  - Default: can become primary, hold data, replicate immediately, have a vote
Chained Replication
• A secondary replicating from another secondary
• Reduces the load on the primary, but can increase replication lag.
Journaling
• Write-ahead logging to an on-disk journal, to guarantee write operation durability and to provide crash resiliency.
• When a write operation occurs:
  - MongoDB writes data about the write to the private view in RAM and then copies it to the journal on disk, in batches called group commits, by default every 100 milliseconds
  - It then applies the changes to the shared view, which now becomes inconsistent with the data files
  - At default intervals of 60 seconds, MongoDB flushes the shared view to disk and removes the write operations from the journal
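The reason a journal gives crash resiliency can be shown with a short plain-JavaScript model. It is a sketch under simplifying assumptions: the hypothetical `Store` class journals each write immediately instead of batching group commits, and it keeps a single in-memory view rather than separate private and shared views.

```javascript
// Sketch: write-ahead journaling with crash recovery.
// dataFiles and journal survive a crash; the in-memory view does not.
class Store {
  constructor() {
    this.dataFiles = new Map(); // durable, updated only on flush
    this.journal = [];          // durable, appended on every write
    this.view = new Map();      // in memory
  }
  write(key, value) {
    this.journal.push({ key, value }); // journal first (write-ahead)
    this.view.set(key, value);         // then the in-memory view
  }
  flush() {
    this.dataFiles = new Map(this.view); // the data files catch up
    this.journal = [];                   // journal entries are no longer needed
  }
  crashAndRecover() {
    this.view = new Map(this.dataFiles); // memory contents are lost
    for (const { key, value } of this.journal) this.view.set(key, value);
  }
}

const store = new Store();
store.write('a', 1);
store.flush();
store.write('b', 2); // journaled but not yet flushed to the data files
store.crashAndRecover();
console.log([...store.view.keys()]); // [ 'a', 'b' ]
```

Without the journal replay in crashAndRecover, the write of 'b' would be lost, because the data files were last updated before it happened.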
Sharding
• Partitions a collection and stores different portions on different machines.
• To run sharding you set up a sharded cluster.
• Within a cluster, sharding is enabled on a per collection basis.
Sharding
• Typically each shard is a replica set, though this is not mandatory.
• Sharding options:
  - Using a shard key, a field that exists in every document of a collection
  - Hash-based sharding
• Documents are distributed according to the range of values in the shard key.
• A shard key can be a single field or a composite.
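The difference between distributing by key range and by hash can be sketched in plain JavaScript. Everything named here is an assumption for illustration: the shard names, the use of a year of birth as the shard key, and the FNV-1a hash, which stands in for MongoDB's own hash function. Real hashed sharding also assigns ranges of hash values to shards rather than taking a simple modulo.

```javascript
// Range-based: each shard owns a contiguous range [min, max) of key values.
function routeByRange(key, ranges) {
  const hit = ranges.find((r) => key >= r.min && key < r.max);
  if (!hit) throw new Error('no range covers key ' + key);
  return hit.shard;
}

// Hash-based: hash the key, then map the hash onto the shards.
// FNV-1a is a stand-in; it is NOT the hash function MongoDB uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}
function routeByHash(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

const ranges = [
  { min: -Infinity, max: 1970, shard: 'shard0' },
  { min: 1970, max: Infinity, shard: 'shard1' },
];
console.log(routeByRange(1962, ranges)); // shard0
console.log(routeByRange(1997, ranges)); // shard1
console.log(routeByHash(1962, 2) === routeByHash(1962, 2)); // true: deterministic
```

Range-based distribution keeps neighbouring key values on the same shard, which suits range queries; hashing scatters them, which tends to spread inserts of ever-increasing keys more evenly.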
Security
• Role-based access control
• Auditing
• Encryption in flight: SSL
• Encryption at rest: Gazzang
• Supports Kerberos authentication
Thank You!
[email protected] [email protected]
www.pragatisoftware.com www.twitter.com/PradyumnSharma