Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node •...

67
Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik

Transcript of Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node •...

Page 1: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila Prakash, Dhanusha Varik

Page 2: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

mongoDB (humongous)

Page 3: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Introduction

• What is MongoDB?

• Why MongoDB?

• MongoDB Terminology

• Why Not MongoDB?

Page 4: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

What is MongoDB?DOCUMENT STORE

Page 5: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Why MongoDB?

Page 6: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Popularity of MongoDB

Page 7: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

MongoDB Terminology

Page 8: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

MongoDB is not all good...

It does not support

• Joins

• Transactions across multiple collections

• Data size in mongoDB is higher document

store field names

Atomic transactions supported at single

document level

REASON???

Page 9: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Document Stores

• Data Model

• Storage Model

• Collections

• Capped Collections

Page 10: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Data Model Design

• Embedded data Models

• Normalized data Models

Page 11: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Embedded Data Model

Page 12: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Normalized Data Model

Page 13: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

How is the data stored?

BSON

• Lightweight

• Traversable

• Efficient

Page 14: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 15: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sample JSON Data

Page 16: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

MongoDB Object Id format

Page 17: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

MongoDB Collections

• MongoDB stores documents in collections

Page 18: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Capped Collections

Page 19: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

• Circular buffers

FRONT A[2]

REAR A[0]

Capped Collections

Page 20: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Humongous data: 2 main needs/issues?

1. Data backup

2. Scaling

Page 21: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication and Sharding in mongoDB

SHARDING

REPLICATION

Page 22: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication

Replica set

• 1 primary node

• 1+ secondary nodes

• Optional Arbiter node

Minimum replica set configuration

Page 23: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication

Replica set

• 1 primary node

• 1+ secondary nodes

• Optional Arbiter node

Replica set with 1 primary and 2 secondaries

Page 24: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication

Data synchronization• oplog: capped collection

Write acknowledgement• primary only (default)• custom

Read concern• local (default)• majority

Page 25: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication

Automatic failover

• Heartbeats

• Elections• Priorities

• Rollbacks

Page 26: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Replication

Replica set secondary members

• Priority 0 - can’t be primary. Use: standbys• Hidden• Delayed

Page 27: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

ReplicationReplica set secondary members

• Priority 0 - can’t be primary. Use: standbys• Hidden - priority 0 + invisible to client• Delayed

Page 28: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

ReplicationReplica set secondary members

• Priority 0 - can’t be primary. Use: standbys• Hidden - Priority 0 + invisible to client• Delayed - Hidden. Historical snapshot. Use: error

recovery

Page 29: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sharding

• Horizontal portioning

• Distributes data over

multiple servers/shards

• Done at Collection level

Page 30: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sharding

Sharded cluster

• Shard

• “mongos” query

router

• Config server

Page 31: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sharding

• Shard key - used to partition collection

• Range based partitioning

• Hash based partitioning

Page 32: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sharding

• Shard key - indexed field

• Range based partitioning

• Hash based partitioning

Page 33: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Sharding

Data balancing

• Splitting

• Balancing

Page 34: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Querying in MongoDB

• Uses mongo shell for querying data

• DB and Collections created automatically when first

referenced

• show dbs list of dbs

• show collections collections in db

Page 35: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

CRUD operations

• To insert document in a database:

e.g. db.gryffindor.insert({name: "Harry Potter",age: 11

})

db.collection.insert(document)

Collection

Document

Page 36: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

CRUD operations

• To search in collection:

• Filter => Boolean expression

e.g {‘age’ : 14} or {‘age’ : { $lt : 18}}

• $lt, $gt, $in, $nin, $all etc..

• Projection => Fields to display

e.g. db.gryffindor.find( { name: "Harry Potter" } )

db.collection.find(<filter>,<projection>)

Page 37: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

CRUD operations

• To update a document :

• Applies update operation to matches

• $set, $unset, $inc, $dec, $rename

e.g. db.gryffindor.update( { name: "Harry Potter" },

{ age: 19 } )

db.collection.update(<filter>,<upd_oper>)

Update operation

filter

Page 38: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

CRUD operations

• To delete a document:

• To delete all documents in collection:

e.g. db.gryffindor.remove({age: { $gt: 18 }})

db.collection.remove(<filter>)

db.collection.drop()

filter

Page 39: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Document Relationships

• 1-to-many relationships allowed

• Embedding or Referencing

• Embed one document inside

another

• Link two docs by using their ids

Page 40: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Document Relationships

id : 14

id : 14

Embedding Referencing

{ "_id" : “896”,"name" : “The Goblet of Fire”, “author" :

{“name” : “J.K. Rowling”,“age” : 51

} }

db.collection.find(“author.name” : “J.K. Rowling”)

{ “id” : “886”,“name” : “The Goblet Of Fire”,“author_id” : “896” }

{ “id” : “896”,“name” : “J. K. Rowling”,“age” : 51 }

Page 41: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Advanced features

• Geospatial queries :

• Objects with GeoJSON format

• $near, $geoWithin, $geoIntersects

• Text search:

• Using text indexes

• $text => Full-text search

• Indexing:

• On single, compound, embedded, arrayed obj

Page 42: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Aggregation in MongoDB

• 3 ways to aggregate !!

• group(), count(), distinct() etc…

• Simple grouping of documents

• Map/Reduce framework

• Runs inside MongoDB

• Outputs to document or collection

Page 43: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

MapReduce in MongoDB

• Two functions: Map and Reduce

• Map => emits a key-value pair from processed document

• Reduce => Reduce all key-value pairs to single object

Page 44: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Aggregation pipeline

• Multi-stage pipeline

• Faster than MapReduce

• More flexible

• Support for sharded

clusters

Page 45: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Aggregation framework

• You can

• reshape document structure

• filter documents

• remove embedded documents

• even ( kind of ) create joins on

docs

All in aggregate!

Page 46: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Where is MongoDB used?

Page 47: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 48: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Schema design - Real world use case

• Message Inbox

• History

• Multiple Identities

Page 49: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Message Inbox

Design Goals :

• Efficiently send new messages to recipients

• Efficiently read inbox

Page 50: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Considerations

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inboxes

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox

Page 51: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 52: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

History

Page 53: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Design Goals

• Need to retain a limited amount of history

• e.g. Hours, Days, Weeks

• Need to query efficiently by

• match

• ranges

Page 54: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Considerations

• TTL

Page 55: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Multiple Identities

Design Goals :

• Ability to look up by a number of different

identities

• Username

• Email address

• FB Handle

• LinkedIn URL

Page 56: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Approaches

• Identifiers in a single document

• Separate Identifiers from Content

Page 57: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Single Document by User

Page 58: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 59: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 60: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Document per Identity

Page 61: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 62: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Real World Use Cases

Page 63: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration
Page 64: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Reasons

Their arguments center around a few core themes:

• Product Maturity

• Design Decisions

• Wrong Trade-Offs

Page 65: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Less Suited Applications

• Complex transactions such as

banking systems and accounting.

• Traditional relational data

warehouses

• Problem requiring SQL

Page 66: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

Best Suited Applications

• Archiving and Event Logging

• Content Management System

• Gaming

• Mobile

• Real time stats/Analytics

Page 67: Group13: Siddhant Deshmukh, Sudeep Rege, Sharmila …Replication Replica set • 1 primary node • 1+ secondary nodes • Optional Arbiter node Minimum replica set configuration

To mongoDB or not to mongoDB?