MongoDB SUNNIE CHUNG CIS 612

MongoDB

• MongoDB is an open-source database classified as a NoSQL database.

• MongoDB was developed primarily to make scaling easier and to handle semi-structured data.

• MongoDB is a document-oriented database: data is organized as JSON-like documents, which are stored in collections.

Architecture

• MongoDB is a NoSQL database, which means its mechanism for storing and retrieving data is modeled on something other than the tabular relations used in relational databases.

• It supports rich data structures with dynamic attributes, mixed structure, text, media, arrays, and other complex types.

• Its schema is flexible and can evolve over time to accommodate new features and requirements.

• Object-oriented programming languages interact with data in structures that are dramatically different from the way data is stored in a relational database.

Features

• Data is stored in a structure that maps to objects in modern programming languages.

• Rich index and query support, including secondary, geospatial, and text search indexes, and native MapReduce…

• System capacity can be increased dynamically.

• Supports data replication and fault tolerance.

• Data is read and written through RAM (via memory-mapped files), which provides fast performance.

Data Model

• MongoDB stores data as documents in a binary representation called BSON (Binary JSON), which is why it is called a document-oriented database.

• BSON extends the JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point.

• BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data, and sub-documents.

• A document and a collection are roughly equivalent to a record (row) and a table in a relational database system.

• A document is an ordered set of keys with associated values. The values can be of several different data types: string, integer, etc. The keys are strings, and a document in MongoDB cannot contain duplicate keys.

{"greeting" : "Hello, world!", "foo" : 3}

• A collection is a group of documents and has a dynamic schema.
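
As a rough illustration of the points on this slide, the following plain JavaScript sketch (no MongoDB server involved) models a collection as an array of documents. The second document and its fields `author`, `langs`, and `tags` are made up for illustration only.

```javascript
// A "collection" modeled as a plain array of documents (illustration only).
const collection = [
  // The example document from the slide.
  { greeting: "Hello, world!", foo: 3 },
  // A document with a different shape in the same collection (dynamic schema).
  // The fields below are hypothetical.
  {
    greeting: "Hi",
    author: { name: "Ann", langs: ["en", "fr"] }, // sub-document containing an array
    tags: ["demo", "nosql"],                      // array value
  },
];

// Each document can carry its own set of keys.
const keySets = collection.map((doc) => Object.keys(doc));
console.log(keySets);
```

Both documents live side by side even though their keys differ, which is what a dynamic schema allows.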


Storage Model

• MongoDB uses memory-mapped files, which map a data file on disk directly to a byte array in memory, where data access is implemented using pointer arithmetic.

• Each database is stored as one namespace file plus multiple data files that hold the extents of its collections.

• Each collection is organized as a linked list of extents, each of which represents a contiguous region of disk space. Each document contains links to other documents (a linked list) as well as the actual data encoded in BSON format.

• MongoDB achieves high availability through replica sets, which provide data redundancy across multiple physical servers: a single primary DB and multiple secondary DBs.

• All modification requests go to the primary DB. Each modification is applied there and then replicated asynchronously to the secondary DBs.

ACID in MongoDB

• Data that has been read is treated as a snapshot, which means it may have since been changed in the database.

• To maintain consistency, a condition is attached to the modification request so that the DB server can validate the condition before applying the modification.

• One way to achieve this isolation is to use the findAndModify operation, which atomically modifies a document and returns either its previous or its updated value.

• MongoDB 2.4 also lacks multi-document transactions (they were added in version 4.0), so there is no guarantee for updates that span multiple documents. In this case, developers are responsible for implementing multi-document updates themselves.

• A common approach is to create a separate document that links all the documents that need to be modified. The modifications are then applied to each document in sequence.
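
The approach in the last bullet can be sketched in plain JavaScript, with in-memory objects standing in for MongoDB documents. The names `accounts`, `transfer`, `applyTransfer`, the `pending` field, and the `state` values are assumptions made for this illustration. They are not taken from the lecture or from any MongoDB API.

```javascript
// In-memory stand-ins for two documents that must be updated together.
const accounts = {
  A: { _id: "A", balance: 100, pending: [] },
  B: { _id: "B", balance: 50, pending: [] },
};

// The separate "transaction" document that links the documents to modify.
const transfer = { _id: "t1", from: "A", to: "B", amount: 30, state: "initial" };

function applyTransfer(txn, docs) {
  txn.state = "pending";
  // Apply the modifications one document at a time, recording the
  // transaction id so an interrupted run could be detected and resumed.
  for (const [id, delta] of [[txn.from, -txn.amount], [txn.to, txn.amount]]) {
    const doc = docs[id];
    if (!doc.pending.includes(txn._id)) {
      doc.balance += delta;
      doc.pending.push(txn._id);
    }
  }
  txn.state = "applied";
  // Remove the markers once every linked document has been changed.
  for (const id of [txn.from, txn.to]) {
    docs[id].pending = docs[id].pending.filter((t) => t !== txn._id);
  }
  txn.state = "done";
}

applyTransfer(transfer, accounts);
console.log(accounts.A.balance, accounts.B.balance, transfer.state);
```

Against a real server, each step inside the function would be its own single-document update, which is the only level at which atomicity is guaranteed.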


Major Differences from RDBMS

• An RDBMS has a fixed set of data types, while a MongoDB document can contain multi-valued fields because it has a nested structure.

• Documents of any structure can be stored in the same collection without a predefined schema.

• MongoDB has no join operations or transactions, and atomicity is guaranteed only at the document level.

• There is also no concept of isolation, which means that data read by one client may be modified by another concurrent client.
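
Because a value read by one client may be changed by another before it is written back, a modification can carry a condition that must still hold when it is applied. The following in-memory sketch shows that idea in plain JavaScript. The helper `findAndModifyIf` and the `version` and `views` fields are hypothetical names chosen for this example. The helper only imitates the behavior and is not the MongoDB findAndModify command itself.

```javascript
// In-memory stand-in for a collection; `version` is a hypothetical field
// used only to detect concurrent changes.
const docs = [{ _id: 1, title: "My Blog Post", views: 10, version: 1 }];

// Apply `changes` only if a document still matches `condition`.
// Returns the document as it was before the change, or null if nothing matched.
function findAndModifyIf(collection, condition, changes) {
  const doc = collection.find((d) =>
    Object.entries(condition).every(([key, value]) => d[key] === value)
  );
  if (!doc) return null;
  const previous = { ...doc };
  Object.assign(doc, changes);
  return previous;
}

// Two clients both read the same snapshot (version 1) and try to write.
const first = findAndModifyIf(docs, { _id: 1, version: 1 }, { views: 11, version: 2 });
const second = findAndModifyIf(docs, { _id: 1, version: 1 }, { views: 99, version: 2 });

console.log(first.views, second, docs[0].views);
```

The first write succeeds and returns the previous values. The second finds no document that still matches its stale condition, so it changes nothing and the client can re-read and retry.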


Installation

MongoDB 2.4.9 (mongodb-osx-x86_64-2.4.9)

To start a MongoDB instance:

$ mongod

mongod --help for help and startup options

Tue Apr 1 15:19:17.445 [initandlisten] MongoDB starting : pid=616 port=27017 dbpath=/data/db/ 64-bit host=Thuats-MacBook-Pro.local

Tue Apr 1 15:19:17.445 [initandlisten]

Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000

Tue Apr 1 15:19:17.445 [initandlisten] db version v2.4.9

Tue Apr 1 15:19:17.445 [initandlisten] git version:


MongoDB Shell

• MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

• The shell is a full-featured JavaScript interpreter, capable of running JavaScript programs.

• To start the shell:

$ mongo

MongoDB shell version: 2.4.9

connecting to: test

Welcome to the MongoDB shell.

For interactive help, type "help".

For more comprehensive documentation, see

http://docs.mongodb.org/

Questions? Try the support group

http://groups.google.com/group/mongodb-user

Server has startup warnings:

Tue Apr 1 15:19:17.445 [initandlisten]

Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000

>


MongoDB Command

• To show the current databases:

> show dbs

local 0.078125GB

• To create a new database:

> use blog

If the database already exists, the shell switches to it. Otherwise the database is created once data is first stored in it.

MongoDB CRUD Query

• CRUD operations are used to manipulate and view data in the shell.

• Create a new document:

> post = {"title": "My Blog Post",

"content" : "This is a blog post.",

"data" : new Date()}

{

"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z")

}

• ‘post’ is a JavaScript object that represents the document. It has three keys: ‘title’, ‘content’, and ‘data’ (which holds the creation date).

MongoDB CRUD Query

• Insert into the collection:

> db.blog.insert(post)

• To see the collection:

> db.blog.find()

{ "_id" : ObjectId("533b16898bce20d2fd851cfc"), "title" : "My Blog Post", "content" : "This is a blog post.", "data" : ISODate("2014-04-01T19:39:36.521Z") }

> db.blog.findOne()

{

"_id" : ObjectId("533b16898bce20d2fd851cfc"),

"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z")

}

MongoDB CRUD Query

• To see how MongoDB executed that query:

> db.blog.find().explain()

{

"cursor" : "BasicCursor",

"isMultiKey" : false,

"n" : 1,

"nscannedObjects" : 1,

"nscanned" : 1,

"nscannedObjectsAllPlans" : 1,

"nscannedAllPlans" : 1,

"scanAndOrder" : false,

"indexOnly" : false,

"nYields" : 0,

"nChunkSkips" : 0,

"millis" : 0,

"indexBounds" : {

},

"server" : "Thuats-MacBook-Pro.local:27017"

}

MongoDB CRUD Query

• To update (passing a whole document as the second argument replaces the matched document):

> post.comments = []

[ ]

> db.blog.update({title: "My Blog Post"}, post)

> db.blog.findOne()

{

"_id" : ObjectId("533b16898bce20d2fd851cfc"),

"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z"),

"comments" : [ ]

}

MongoDB CRUD Query

• To delete:

> db.blog.remove({title : "My Blog Post"})

> db.blog.findOne()

null

• To build an index:

> db.blog.ensureIndex({title:1})

MongoDB CRUD Query

• To show all existing indexes:

> db.blog.getIndexes()

[

{

"v" : 1,

"key" : {

"_id" : 1

},

"ns" : "blog.blog",

"name" : "_id_"

},

{

"v" : 1,

"key" : {

"title" : 1

},

"ns" : "blog.blog",

"name" : "title_1"

}

]

MongoDB CRUD Query

• To remove an index:

> db.blog.dropIndex({title:1})

{ "nIndexesWas" : 2, "ok" : 1 }
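
What an index on `title` buys can be sketched without a server. The `BasicCursor` in the earlier explain output indicates that documents were scanned one by one, whereas an index lets the server go straight to the matching entries. The sketch below uses a JavaScript `Map` as a simplified stand-in for the index. That is a deliberate swap for illustration: MongoDB's real indexes are B-tree structures, and the sample documents and helper names here are made up.

```javascript
// In-memory stand-in for the blog collection (sample data is made up).
const blog = [
  { _id: 1, title: "My Blog Post" },
  { _id: 2, title: "Another Post" },
  { _id: 3, title: "My Blog Post" },
];

// Without an index: examine every document (a full collection scan).
function scanByTitle(docs, title) {
  let scanned = 0;
  const hits = docs.filter((d) => {
    scanned += 1;
    return d.title === title;
  });
  return { hits, scanned };
}

// Build a lookup from title to documents, a simplified stand-in
// for an index on { title: 1 }.
function buildTitleIndex(docs) {
  const index = new Map();
  for (const d of docs) {
    if (!index.has(d.title)) index.set(d.title, []);
    index.get(d.title).push(d);
  }
  return index;
}

const titleIndex = buildTitleIndex(blog);
const viaScan = scanByTitle(blog, "Another Post");
const viaIndex = titleIndex.get("Another Post") || [];
console.log(viaScan.scanned, viaScan.hits.length, viaIndex.length);
```

Both paths find the same single document, but the scan touches all three documents while the lookup goes directly to the match.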

MongoDB Application

• MongoDB Drivers and Client Libraries:

• MongoDB supports a variety of modern programming languages, including C, C++, C#, Java, Node.js, PHP, Python…

MongoDB Import/Export

• MongoDB can import input files in JSON, CSV, or TSV format using mongoimport, and can export a collection to JSON or CSV using mongoexport.

• Syntax:

mongoimport --collection collection --file collection.json

mongoexport --collection collection --out collection.json

MongoDB Import/Export

• Import a CSV file (NASDAQ_daily_prices_B.csv) into the collection nasdaq_daily_prices of the MongoDB database stocks:

$ cat NASDAQ_daily_prices_B.csv

exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close

NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96

NASDAQ,BBND,2010-02-05,2.85,2.94,2.79,2.93,884000,2.93

NASDAQ,BBND,2010-02-04,2.83,2.88,2.78,2.83,1333300,2.83

NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83

NASDAQ,BBND,2010-02-02,3.05,3.10,2.96,2.97,513100,2.97

NASDAQ,BBND,2010-02-01,3.11,3.13,3.00,3.04,997000,3.04

NASDAQ,BBND,2010-01-29,3.01,3.14,2.96,3.14,1132900,3.14

MongoDB Import/Export

$ mongoimport --db stocks --collection nasdaq_daily_prices --type csv --file /Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv --headerline

connected to: 127.0.0.1

Thu Apr 10 05:24:46.009 Progress: 780677/21998523 3%

Thu Apr 10 05:24:46.009 14000 4666/second

Thu Apr 10 05:24:49.004 Progress: 2011431/21998523 9%

Thu Apr 10 05:24:49.004 36200 6033/second

Thu Apr 10 05:24:52.004 Progress: 3300955/21998523 15%

Thu Apr 10 05:24:52.004 58600 6511/second

Thu Apr 10 05:24:55.005 Progress: 4575925/21998523 20%

Thu Apr 10 05:24:55.006 81300 6775/second

Thu Apr 10 05:24:58.009 Progress: 5845580/21998523 26%

Thu Apr 10 05:24:58.009 104000 6933/second

Thu Apr 10 05:25:34.005 374000 7333/second

Thu Apr 10 05:25:35.956 check 9 388777

Thu Apr 10 05:25:35.956 imported 388776 objects
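
The effect of `--headerline` (the first CSV line supplies the field names) can be approximated in a few lines of plain JavaScript. This is only a sketch: the function name `csvToDocuments` is made up, the header is shortened to a subset of the real columns, quoted fields are not handled, and the number-detection rule is an assumption chosen to agree with the imported documents shown in these slides (2.80 stored as 2.8, dates kept as strings). How mongoimport actually infers types is not described here.

```javascript
// Convert CSV text whose first line names the fields into an array of documents.
// Simplification: quoted fields and embedded commas are not handled.
function csvToDocuments(text) {
  const [headerLine, ...rows] = text.trim().split("\n");
  const fields = headerLine.split(",");
  return rows.map((row) => {
    const values = row.split(",");
    const doc = {};
    fields.forEach((field, i) => {
      const raw = values[i];
      // Store values that look like plain numbers as numbers; keep the rest as strings.
      doc[field] = /^-?\d+(\.\d+)?$/.test(raw) ? Number(raw) : raw;
    });
    return doc;
  });
}

// A shortened version of the sample file (subset of columns, one data row).
const sample = [
  "exchange,stock_symbol,date,stock_price_open,stock_price_low,stock_volume",
  "NASDAQ,BBND,2010-02-03,2.98,2.80,1015800",
].join("\n");

const imported = csvToDocuments(sample);
console.log(imported[0]);
```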

MongoDB Import/Export

• Check the resulting collection in the shell:

> show dbs

blog 0.203125GB

local 0.078125GB

stocks 0.453125GB

> use stocks

switched to db stocks

> show tables

nasdaq_daily_prices

system.indexes


MongoDB Import/Export

> db.nasdaq_daily_prices.find().limit(5)

{ "_id" : ObjectId("5346635c6857e587111a2466"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-08", "stock_price_open" : 2.92, "stock_price_high" : 2.98, "stock_price_low" : 2.86, "stock_price_close" : 2.96, "stock_volume" : 483800, "stock_price_adj_close" : 2.96 }

{ "_id" : ObjectId("5346635c6857e587111a2467"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-05", "stock_price_open" : 2.85, "stock_price_high" : 2.94, "stock_price_low" : 2.79, "stock_price_close" : 2.93, "stock_volume" : 884000, "stock_price_adj_close" : 2.93 }

{ "_id" : ObjectId("5346635c6857e587111a2468"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-04", "stock_price_open" : 2.83, "stock_price_high" : 2.88, "stock_price_low" : 2.78, "stock_price_close" : 2.83, "stock_volume" : 1333300, "stock_price_adj_close" : 2.83 }

{ "_id" : ObjectId("5346635c6857e587111a2469"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-03", "stock_price_open" : 2.98, "stock_price_high" : 3.03, "stock_price_low" : 2.8, "stock_price_close" : 2.83, "stock_volume" : 1015800, "stock_price_adj_close" : 2.83 }

{ "_id" : ObjectId("5346635c6857e587111a246a"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-02", "stock_price_open" : 3.05, "stock_price_high" : 3.1, "stock_price_low" : 2.96, "stock_price_close" : 2.97, "stock_volume" : 513100, "stock_price_adj_close" : 2.97 }

MongoDB Import/Export

• Export the documents in that collection whose stock_price_open is at least 50 to JSON format:

$ mongoexport -d stocks -c nasdaq_daily_prices -q "{stock_price_open: { \$gte: 50 }}" --out /Users/nqt289/Desktop/gte50.json

connected to: 127.0.0.1

MongoDB Import/Export

exported 9911 records

$ cat gte50.json

{ "_id" : { "$oid" : "5346635d6857e587111a4cda" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-25", "stock_price_open" : 51, "stock_price_high" : 51.47, "stock_price_low" : 44.1, "stock_price_close" : 47.04, "stock_volume" : 1109600, "stock_price_adj_close" : 31.36 }

{ "_id" : { "$oid" : "5346635d6857e587111a4cdb" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-24", "stock_price_open" : 52.4, "stock_price_high" : 52.4, "stock_price_low" : 48.55, "stock_price_close" : 49.43, "stock_volume" : 650600, "stock_price_adj_close" : 32.95 }
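
The selection made by the `-q` query can be imitated in plain JavaScript to show what ends up in the exported file. The array `prices` is a tiny hand-picked subset of values that appear in these slides, and `matchesGte` is a hypothetical helper that understands only the `$gte` operator. It is not how mongoexport is implemented.

```javascript
// Hypothetical in-memory subset of nasdaq_daily_prices (values taken from the slides).
const prices = [
  { stock_symbol: "BBND", date: "2010-02-08", stock_price_open: 2.92 },
  { stock_symbol: "BOLT", date: "2007-07-25", stock_price_open: 51 },
  { stock_symbol: "BOLT", date: "2007-07-24", stock_price_open: 52.4 },
];

// Minimal matcher supporting only { field: { $gte: number } } conditions.
function matchesGte(doc, query) {
  return Object.entries(query).every(
    ([field, cond]) => doc[field] >= cond.$gte
  );
}

const query = { stock_price_open: { $gte: 50 } };

// One JSON document per line, as in the exported file.
const exported = prices
  .filter((doc) => matchesGte(doc, query))
  .map((doc) => JSON.stringify(doc));
console.log(exported.join("\n"));
```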
