NoSQL continued CMSC 461 Michael Wilson


MongoDB
- MongoDB is another NoSQL solution
- Provides a bit more structure than a solution like Accumulo
- Data is stored as BSON (Binary JSON)
  - Binary-encoded JSON, extends JSON
- Allows storage of large amounts of data


SQL vs. MongoDB
- SQL has databases, tables, rows, columns
- Mongo has databases, collections, documents, fields
- Both have primary keys, indexes
- Collection structures are not enforced heavily
  - Inserts automatically create schemas
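
To illustrate how loosely collection structure is enforced, here is a minimal in-memory sketch in plain JavaScript. It is not the MongoDB API: the `collections` object and the `insert` helper are hypothetical names made up for this illustration. They mimic a collection coming into existence on first insert and documents in one collection carrying different fields.

```javascript
// Hypothetical in-memory stand-in for a database; NOT the MongoDB API.
const collections = {};

// Insert a document, creating the collection on first use.
function insert(collectionName, doc) {
  if (!(collectionName in collections)) {
    collections[collectionName] = []; // implicit creation on insert
  }
  collections[collectionName].push(doc);
  return collections[collectionName].length;
}

// Two documents with different fields live in the same collection.
insert("students", { name: "Ada", year: 2 });
insert("students", { name: "Grace", major: "CMSC" });

// Collect every field name seen across the collection.
const fieldsSeen = new Set(collections.students.flatMap((doc) => Object.keys(doc)));
console.log([...fieldsSeen]);
```

A SQL table would reject the second row unless it had a `major` column; here the "schema" is simply whatever fields the inserted documents happen to have.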


Interacting with MongoDB
- Multiple databases within MongoDB
- Switch databases
  - use newDb
- New databases will be stored after an insert
- Create collection
  - db.createCollection("collectionName")
  - Not necessary, collections are implicitly created on insert


BSON
- MongoDB uses BSON very heavily
- Binary JSON
  - Like JSON with a binary serialization method
  - Has extensions so that it can represent data types that JSON cannot
- Used to represent documents, provide input to queries
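
As a small illustration of "data types that JSON cannot represent", the sketch below uses only plain JavaScript and the built-in `JSON` object (no BSON library is involved). It shows two values that do not survive a JSON round trip: a date and a large integer. BSON defines dedicated types for both (a date type and a 64-bit integer type), which is part of what the extensions buy.

```javascript
// A document holding a date and a large integer.
const doc = { created: new Date(0), big: 2n ** 60n };

// JSON has no date type: a round trip turns the Date into a plain string.
const roundTripped = JSON.parse(JSON.stringify({ created: doc.created }));
console.log(typeof roundTripped.created);          // "string"
console.log(roundTripped.created instanceof Date); // false

// JSON has no 64-bit integer type either: JSON.stringify rejects a BigInt.
let bigIntRejected = false;
try {
  JSON.stringify({ big: doc.big });
} catch (err) {
  bigIntRejected = err instanceof TypeError;
}
console.log(bigIntRejected); // true
```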


Selects/queries
- In MongoDB, querying typically consists of providing an appropriately crafted BSON
- SELECT * FROM collectionName
  - db.collectionName.find()
- SELECT * FROM collectionName WHERE field = value
  - db.collectionName.find( {field: value} )
- SELECT * FROM collectionName WHERE field > 5
  - db.collectionName.find( {field: {$gt: 5} } )
- Other functions that take a query argument have queries that are formatted this way
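
The three query shapes above can be sketched as a tiny in-memory matcher. This is only an illustration of how a query document is read, not how MongoDB evaluates queries: `matches` and `find` are hypothetical helpers written for this example, and they support nothing beyond direct equality and `$gt`.

```javascript
// Hypothetical helper: does `doc` satisfy the query document `query`?
// Supports only direct equality and the $gt operator.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isGtCondition =
      condition !== null && typeof condition === "object" && "$gt" in condition;
    if (isGtCondition) {
      return doc[field] > condition.$gt; // {field: {$gt: n}}
    }
    return doc[field] === condition; // {field: value}
  });
}

// Hypothetical stand-in for db.collectionName.find(query).
function find(collection, query = {}) {
  return collection.filter((doc) => matches(doc, query));
}

const collection = [
  { name: "a", field: 3 },
  { name: "b", field: 5 },
  { name: "c", field: 8 },
];

console.log(find(collection));                        // SELECT * FROM collection
console.log(find(collection, { field: 5 }));          // ... WHERE field = 5
console.log(find(collection, { field: { $gt: 5 } })); // ... WHERE field > 5
```

An empty query document matches everything, which is why `find()` with no argument behaves like `SELECT *`.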


Interacting with MongoDB
- Insert
  - db.collectionName.insert( {documentBSON} )
- Update
  - db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} )
  - updateBSON
    - Set field to 5: {$set: {field: 5}}
    - Increment field by 1: {$inc: {field: 1}}
  - optionBSON
    - Options that determine whether or not to create new documents, update more than one document, write concerns
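
To make the two update operators concrete, the sketch below applies an update document to a single in-memory object. `applyUpdate` is a hypothetical helper written for this example, not part of MongoDB, and it handles only `$set` and `$inc`; the sample document is likewise made up.

```javascript
// Hypothetical helper: apply an update document to one document.
// Supports only $set and $inc, and returns a modified copy.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value; // overwrite (or create) the field
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    result[field] = (result[field] ?? 0) + amount; // a missing field starts at 0
  }
  return result;
}

const original = { name: "a", field: 3, visits: 1 };
const afterSet = applyUpdate(original, { $set: { field: 5 } });
const afterInc = applyUpdate(afterSet, { $inc: { visits: 1 } });
console.log(afterInc); // field is now 5, visits is now 2
```

In the mongo shell the options document for `update` uses the keys `upsert`, `multi`, and `writeConcern`; the sketch ignores them because it only ever touches one document.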


Interacting with MongoDB
- Delete
  - db.collectionName.remove( {queryBSON} )


Apache Hive
- Also runs on Hadoop, uses HDFS as a data store
- Queryable like SQL
  - Using an SQL-inspired language, HiveQL


Hive data organization
- Databases
- Tables
- Partitions
  - Tables are broken down into partitions
  - Partition keys allow data to be stored into separate data files on HDFS
  - Can query on particular partitions
- Buckets
  - Can bucket by column to sample data
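
The sketch below illustrates the two ideas in plain JavaScript. Hive lays out each partition as a directory whose name is built from `key=value` segments, and assigns a row to a bucket by hashing the bucketing column modulo the number of buckets. Everything named here is an assumption for illustration: `partitionPath`, `toyHash`, and `bucketFor` are hypothetical helpers, the `/warehouse/logs` path is made up, and `toyHash` is not Hive's actual hash function.

```javascript
// Hypothetical helper: directory for one partition, with one
// key=value path segment per partition key.
function partitionPath(tableDir, partitionValues) {
  const segments = Object.entries(partitionValues).map(
    ([key, value]) => `${key}=${value}`
  );
  return [tableDir, ...segments].join("/");
}

// Toy string hash (NOT Hive's hash function), used only to pick a bucket.
function toyHash(text) {
  let hash = 0;
  for (const ch of String(text)) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash;
}

// A row lands in bucket hash(column value) mod number of buckets.
function bucketFor(columnValue, numBuckets) {
  return toyHash(columnValue) % numBuckets;
}

console.log(partitionPath("/warehouse/logs", { year: 2014, month: 5 }));
// /warehouse/logs/year=2014/month=5
console.log(bucketFor("user42", 4)); // a bucket index between 0 and 3
```

A query restricted to `year = 2014` only needs to read files under the matching directory, and sampling one bucket out of four reads roughly a quarter of the data.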


Purpose of Hive
- Provide analytics, query large volumes of data
- NOT to be used for real time queries like Postgres or Oracle
  - Hive queries take forever
  - Partitions and buckets can help reduce this amount of time


Hive queries
- Hive queries actually generate MapReduce jobs
- MapReduce jobs take a while to set up and run
- MapReduce jobs can be run manually, but for structured data and analytics, Hive can be used
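
As a rough picture of what "generating a MapReduce job" means, the sketch below runs the three phases in memory for a query along the lines of `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The table, its rows, and the function names are all hypothetical, and the plan Hive actually produces is more involved; this only shows the shape of the computation that Hive saves you from writing by hand.

```javascript
// Rows of a hypothetical employees table.
const rows = [
  { dept: "cs", name: "a" },
  { dept: "math", name: "b" },
  { dept: "cs", name: "c" },
];

// Map phase: emit one (key, 1) pair per row.
function mapPhase(inputRows) {
  return inputRows.map((row) => [row.dept, 1]);
}

// Shuffle phase: group the emitted values by key.
function shufflePhase(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce phase: sum the values collected for each key.
function reducePhase(groups) {
  const counts = {};
  for (const [key, values] of groups) {
    counts[key] = values.reduce((sum, v) => sum + v, 0);
  }
  return counts;
}

const counts = reducePhase(shufflePhase(mapPhase(rows)));
console.log(counts); // cs: 2, math: 1
```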