MongoDB and Spring Data - Meetup
Source: files.meetup.com/4247302/RIC Mongo Meetup.2014-02-20.pdf (2014/02/20)

MongoDB and Spring Data

By Jimmy Ray

© 2014 CloudBees, Inc. All Rights Reserved

Who Am I

•  Solutions Architect with CloudBees (www.cloudbees.com)
•  Blog: www.techsand.com
•  LinkedIn: www.linkedin.com/in/iamjimmyray
•  I spend my time with Java, Jenkins, CI/CD, Cloud Computing and … MongoDB

Tonight’s Agenda

•  Quick introduction to MongoDB and related tools
•  Introduction to Spring Data
    –  Configuration (Spring and MongoDB)
    –  Templates and Repositories
•  Metadata Mapping
•  Finder Methods
•  Custom Repos

Tonight’s Agenda (continued)

•  Spring Data
    –  Fluent API
    –  Aggregation Functions
•  Indexes
•  GridFS
•  MongoDB in the cloud

Why MongoDB

•  Multiple platforms (Linux, Win, Solaris, Apple)
•  Language Drivers (C, C++, C#, Java, Erlang, JS, Ruby, etc.)
•  Explicitly de-normalized (schema-less)
•  Document-centric
•  Easy for developers and admins to get started.
    –  Because the schema-less approach is more flexible, MongoDB is intrinsically ready for iterative (Agile) projects

Why MongoDB (continued)

•  Ease of scalability (replica sets), auto-sharding
•  Manages complex and polymorphic data
•  Great for CDN and document-based SOA solutions
•  Great for location-based and geospatial data solutions
•  Fast
    –  (low latency)
    –  Fast access to data
    –  Low CPU overhead

Schema-less, Schema-free, Flexible-schema

•  It means that MongoDB does not enforce a column data type on the fields within your document, nor does it confine your document to specific columns defined in a table definition.

•  The schema is actually controlled via the application API layers and is implied by the “shape” (content) of your documents.

•  This means that different documents in the same collection can have different fields.

•  Only the _id field is mandatory in all documents.

Is MongoDB Really Schema-free

•  Technically no.
•  There is the System Catalog of system collections
    –  <database>.system.namespaces
    –  <database>.system.indexes
    –  <database>.system.profile
    –  <database>.system.users

•  And…because of the nature of how docs are stored in collections (JSON/BSON), field labels are stored in every doc*

MongoDB Schema Tips

•  MongoDB has ObjectID, can be placed in _id
    –  If you have a natural unique ID, use that instead
•  De-normalize when needed
    –  For example: Compound indexes cannot contain parallel arrays
•  Create indexes that cover queries
    –  Mongo only uses one index at a time for a query
    –  Watch out for sorts
    –  Watch out for field sequence in compound indexes.
•  Reduce size of collections (watch out for label sizes)

MongoDB Data Modeling

•  Understand your concerns
•  Document embedding (fastest and atomic) vs. references (normalized)
•  Atomicity – Document Level Only
•  Data Durability

Why not MongoDB

•  High speed and deterministic transactions (FIN):
•  Where SQL or joins are absolutely required
•  If your organization lacks the controls and rigor to place schema and document definition at the application level without compromising data integrity

My Favorite MongoDB Design Features

•  Fast Querying (atomic operations, embedded data)
•  In place updates (physical writes lag in-memory changes)
•  Full Index support (including compound indexes)
•  Replication/High Availability (see CAP Theorem)
•  Auto Sharding (range-based partitioning, based on shard key) for scalability
•  BSON
•  GridFS

In-place Updates

•  Physical disk writes lag in-memory changes.
•  MongoDB uses an adaptive allocation algorithm for storing its objects.

“Keys” to Sharding

•  Need to choose the right key
    –  Easily divisible (“splittable” – see cardinality) so that Mongo can distribute data among shards
•  Enable distributed write operations between cluster nodes
    –  Prevents single-shard bottle-necking

Cardinality

•  Higher cardinality is preferred (usually, except for range queries)
    –  Example: Address data components
        •  State – Low Cardinality
        •  Zip Code – Potentially low or high, depending on population
        •  Phone Number – High Cardinality
•  High cardinality is a good start for sharding, but…
    –  …it does not guarantee query isolation
    –  …it does not guarantee write scaling
•  Consider computed keys (MD5, etc.)
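As a rough illustration of the cardinality point, the plain JavaScript (Node.js) sketch below counts the distinct values of each address field and derives an MD5-based computed key. The sample records and the `cardinality` and `hashedKey` helpers are hypothetical and only model the idea; they are not part of MongoDB or of the talk's demo code.

```javascript
const crypto = require("crypto");

// Hypothetical sample records, made up for this sketch.
const addresses = [
  { state: "MN", zip: "55401", phone: "612-555-0101" },
  { state: "MN", zip: "55401", phone: "612-555-0102" },
  { state: "MN", zip: "55102", phone: "651-555-0103" },
  { state: "VA", zip: "23219", phone: "804-555-0104" },
];

// Cardinality here means the number of distinct values a field takes.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// A computed key: the MD5 hex digest of a field value.
function hashedKey(value) {
  return crypto.createHash("md5").update(String(value)).digest("hex");
}

console.log(cardinality(addresses, "state")); // 2
console.log(cardinality(addresses, "zip"));   // 3
console.log(cardinality(addresses, "phone")); // 4
console.log(hashedKey(addresses[0].phone).length); // 32 hex characters
```

A hashed key spreads otherwise sequential values evenly, which helps write distribution, but it gives up the ordering that range queries rely on.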

Container Model (RDBMS vs. MongoDB)

•  RDBMS: Servers > Databases > Schemas > Tables > Rows
    –  Joins, Group By, ACID
•  MongoDB: Servers > Databases > Collections > Documents
    –  No Joins
    –  Instead: Db References (Linking) and Nested Documents (Embedding)

CAP Theorem

•  Consistency – all nodes see the same data at the same time
•  Availability – all requests receive responses, guaranteed
•  Partition Tolerance (network partition tolerance)
•  The theorem states that you can never have all three, so you plan for two and make the best of the third.

Example MongoDB Setup

MongoDB Collections

•  Schema-less
•  Can have up to roughly 24,000 namespaces (collections plus indexes) per database by default
    –  100 nesting levels (version 2.2)
•  Are namespaces, like indexes
•  Can be “Capped”
    –  Limited in max size with rotating overwrites of oldest entries
•  TTL Collections
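The "rotating overwrites" behavior of a capped collection can be pictured with a small in-memory model. This is only a sketch in plain JavaScript: the `CappedCollection` class is hypothetical, and it caps by document count for simplicity, whereas a real capped collection is primarily bounded by size in bytes.

```javascript
// Hypothetical in-memory model of a capped collection (not a MongoDB API).
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }

  // Insert a document; once the cap is exceeded, the oldest entry is dropped.
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return documents in insertion order.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedCollection(3);
["a", "b", "c", "d"].forEach((msg) => log.insert({ msg }));
console.log(log.find().map((doc) => doc.msg)); // [ 'b', 'c', 'd' ]
```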

MongoDB Documents

•  JSON (what you see)
    –  Actually BSON (Internal - Binary JSON - http://bsonspec.org/)
•  Elements are name/value pairs
•  16 MB maximum size
•  What you see is what is stored
    –  No default fields (columns)

JSON Syntax

•  Curly braces are used for documents/objects – {…}
•  Square brackets are used for arrays – […]
•  Colons are used to link keys to values – key:value
•  Commas are used to separate multiple objects or elements or key/value pairs – {key1:value1, key2:value2…}
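The syntax rules above can be seen together in one small document. The sketch below uses plain JavaScript; the document and its field names are made up for illustration.

```javascript
// Hypothetical document; field names are illustrative only.
const doc = {
  city: "Richmond",
  state: "VA",
  loc: [-77.43, 37.54],       // square brackets: an array
  contact: { name: "jimmy" }, // curly braces: a nested document
};

// Serialize to JSON text, then parse it back into an object.
const text = JSON.stringify(doc);
console.log(text);

const parsed = JSON.parse(text);
console.log(Array.isArray(parsed.loc)); // true
console.log(parsed.contact.name);       // jimmy
```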

BSON

•  Adds data types that JSON did not support
•  Optimized for performance
•  Adds compression
•  http://bsonspec.org/#/specification

MongoDB Shell

•  Interactive JavaScript shell to mongod
•  Command-line interface to MongoDB (sort of like SQL*Plus for Oracle)
•  JavaScript Interpreter, behaves like a read-eval-print loop
•  Can be run without database connection (use --nodb)
•  Uses a fluent API with lazy cursor evaluation
    –  db.locations.find({state:'MN'},{city:1,state:1,_id:0}).sort({city:-1}).limit(5).toArray();

MongoDB and Mac OS X

•  Installed/upgraded with HomeBrew
    –  brew install/upgrade mongodb
    –  http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
•  Run with shell command: exec mongod --port 29009 --rest
•  Run MongoDB Shell: mongo --port 29009

MongoDB Tools for Mac OS X

•  Install and run Genghis PHP application on Apache
•  Install MongoHub for Mac
    –  https://code.google.com/p/mongohub/
•  Shutdown server from inside Mongo Shell
    –  use admin
    –  db.shutdownServer()

Other MongoDB Tools

•  Edda – Log Visualizer
    –  Requires Python
•  MongoDB Monitoring Service
    –  Cloud (or on premise) based service that monitors MongoDB instances via configured agents.
    –  Requires Python

MongoImport

•  Syntax: mongoimport --stopOnError --port 29009 --db geo --collection geos --file C:\UserData\Docs\JUGs\TwinCities\zips.json

•  Don’t use for backup or restore in production
    –  Use mongodump and mongorestore

MongoDB Web Admin Interface

•  Enabled with REST switch in startup config.
•  Port location is main mongod port + 1000
•  Quick stats viewer
•  Run commands

Project Configuration

•  MongoDB 2.4.9*
•  Java 1.6
•  Maven 3
•  Jackson JSON Processor 1.9.4
•  Spring Framework 3.2.1.RELEASE
•  Spring Data 1.3.2.RELEASE
•  MongoDB Java API 2.11.3

Project Code Location

•  GitHub
    –  https://github.com/jimmyraywv/mongodb-spring-data

Spring Data

•  Large Spring project with many subprojects
    –  Category: Document Stores, Subproject MongoDB
•  “…aims to provide a familiar and consistent Spring-based programming model…”
•  Like other Spring projects, Data is POJO Oriented
    –  BEANS
•  Provides access to high-level and low-level APIs for managing MongoDB documents.

Spring Data (continued)

•  Provides annotation-driven meta-mapping
•  Will allow you into bowels of API if you choose to hang out there

Spring Framework Configuration Profiles

•  Uses a system level property to choose the profile defined in the Spring Configuration
    –  -Dspring.profiles.active=local
    –  … <beans profile="local"> …

Spring Data – The Logical Stack

Spring Data Templates

•  Main purpose is resource allocation and exception translation

•  Implements MongoOperations (mongoOps) interface
    –  mongoOps defines the basic set of MongoDB operations for the Spring Data API.
•  Wraps the lower-level MongoDB API
    –  Provides access to the lower-level API
    –  Provides foundation for upper-level Repository API.

Spring Data Repositories

•  Convenience for data access
•  Spring does ALL the work (unless you customize)
•  Convention over configuration
•  Uses a method-naming convention that Spring interprets during implementation
•  Hides complexities of Spring Data templates and underlying API
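To give a feel for how a method-naming convention can be turned into a query, here is a deliberately simplified model in plain JavaScript. It is not Spring Data's parser: the `deriveQuery` helper is hypothetical, supports only the `findBy<Prop>And<Prop>` shape with equality matching, and would mis-split property names that themselves contain "And".

```javascript
// Simplified, hypothetical model of query derivation from a finder method name.
// Supports only "findBy<Prop>[And<Prop>...]" with equality conditions.
function deriveQuery(methodName, args) {
  const prefix = "findBy";
  if (!methodName.startsWith(prefix)) {
    throw new Error("unsupported method name: " + methodName);
  }
  // "LastNameAndTitle" -> ["lastName", "title"]
  const properties = methodName
    .slice(prefix.length)
    .split("And")
    .map((part) => part.charAt(0).toLowerCase() + part.slice(1));
  if (properties.length !== args.length) {
    throw new Error("expected " + properties.length + " arguments");
  }
  // Pair each property with its argument to form a query document.
  const query = {};
  properties.forEach((prop, i) => {
    query[prop] = args[i];
  });
  return query;
}

console.log(deriveQuery("findByLastNameAndTitle", ["Bashian", "Senior Engineer"]));
// { lastName: 'Bashian', title: 'Senior Engineer' }
```

The resulting object has the same shape as the query document passed to `find()` in the shell, which is why the repository interface alone is enough to describe simple finders.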

Spring Data Repositories (continued)

•  Builds implementation for you based on interface design
    –  Implementation is built during Spring container load.
•  Is typed (parameterized via generics) to the model objects you want to store.
    –  When extending MongoRepository
•  Otherwise uses @RepositoryDefinition annotation

Custom Repositories

•  Hooks into Spring Data bean type hierarchy that allows you to add functionality to repositories

•  Important: You must write the implementation for part of this custom repository

•  And…your Spring Data repository interface must extend this custom interface, along with the appropriate Spring Data repository

Creating a Custom Repository

Spring Data Metadata Mapping

•  Annotation-driven mapping of model object fields to Spring Data elements in specific database dialect.

•  Maps Java POJOs to MongoDB documents
    –  Controls how POJO fields are mapped to MongoDB document fields
    –  Maps document index settings
    –  Maps Java types to MongoDB collections

•  Handy when you consider that MongoDB field labels are stored in each document.*

Bulk Inserts

•  All things being equal, bulk inserts in MongoDB are faster than inserting one record at a time.

•  As of MongoDB 1.8, the max BSON size of a batch insert was increased from 4MB to 16MB
    –  You can check this with the shell command: db.isMaster() or mongo.getMaxBsonObjectSize() in the Java API

•  Batch sizes can be tuned for performance

Transformers

•  Does the “heavy lifting” by preparing MongoDB objects for insertion

•  Transforms Java domain objects into MongoDB DBObjects.

Converters

•  For read and write, overrides default mapping of Java objects to MongoDB documents

•  Implements the Spring…Converter interface
•  Registered with MongoDB configuration in Spring context
•  Handy when integrating MongoDB to existing application.
•  Can be used to manipulate fields inline with reads/writes

MongoDB DBRef

•  Optional
•  Instead of nesting documents
•  Have to save the “referenced” document first, so that DBRef exists before adding it to the “parent” document
•  Know the tradeoffs

MongoDB Queries

•  In the mongo shell using JS: db.collection.find( <query>, <projection> )
•  Use the projection to limit fields returned, and therefore network traffic
•  Examples:
    –  db["employees"].find({"title":"Senior Engineer"})
    –  db.employees.find({"title":"Senior Engineer"},{"_id":0})
    –  db.employees.find({"title":"Senior Engineer"},{"_id":0,"title":1})

MongoDB Queries (continued)

•  In Java use DBObject or Spring Data Query for mapping queries.

•  You can include and exclude fields in the projection argument.
    –  You either include (1) or exclude (0)
    –  You cannot include and exclude in the same projection, except for the “_id” field.
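The include/exclude rule can be modeled in a few lines of plain JavaScript. This is a sketch that handles top-level fields only; the `project` function and the sample `employee` document are hypothetical and merely imitate how the server treats a projection.

```javascript
// Hypothetical model of projection rules for top-level fields only.
function project(doc, projection) {
  const keys = Object.keys(projection).filter((key) => key !== "_id");
  const modes = new Set(keys.map((key) => (projection[key] ? 1 : 0)));
  if (modes.size > 1) {
    throw new Error("cannot mix include and exclude (except for _id)");
  }
  // A projection of only {_id: 1} also counts as an inclusion projection.
  const including = modes.has(1) || (keys.length === 0 && Boolean(projection._id));
  // _id is returned unless it is explicitly excluded.
  const keepId = projection._id === undefined ? true : Boolean(projection._id);

  const result = {};
  for (const [field, value] of Object.entries(doc)) {
    if (field === "_id") {
      if (keepId) result._id = value;
    } else if (including ? keys.includes(field) : !keys.includes(field)) {
      result[field] = value;
    }
  }
  return result;
}

const employee = { _id: 1, title: "Senior Engineer", lastName: "Bashian" };
console.log(project(employee, { _id: 0, title: 1 })); // { title: 'Senior Engineer' }
console.log(project(employee, { lastName: 0 }));      // { _id: 1, title: 'Senior Engineer' }
```

Calling `project(employee, { title: 1, lastName: 0 })` throws in this model, mirroring the rule that include and exclude cannot be mixed.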

DBObject and BasicDBObject

•  For the Mongo Java driver, DBObject is the Interface, BasicDBObject is the class
    –  This is essentially a map with additional Mongo functionality
    –  See partial objects when up-serting

•  DBObject is used to build commands, queries, projections, and documents

•  DBObjects are used to build out the JS queries that would normally run in the shell. Each {…} is a potential DBObject.

MongoDB Advanced Queries

•  http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all

•  May use Mongo Java driver and BasicDBObjectBuilder
•  Spring Data fluent API is much easier
•  Demo - $in, $nin, $gt ($gte), $lt ($lte), $all, ranges

Logical Queries Using and/or

•  Comma denotes “and”, and you can use $and
    –  db.employees.find({"title":"Senior Engineer","lastName":"Bashian"},{"_id":0,"title":1})
•  For Or, you must use the $or operator
    –  db.employees.find({$or:[{"lastName":"Bashian"},{"lastName":"Baik"}]}, {"_id":0,"title":1,"lastName":1})
•  In Java, use DBObjects and ArrayLists…
    –  Nest or/and ArrayLists for compound queries
    –  Or use the Spring Data Query and Criteria classes with “or” criteria
•  Also see QueryBuilder class
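The way comma-separated conditions, `$and`, and `$or` combine can be sketched with a tiny matcher in plain JavaScript. The `matches` function and the sample `employees` array are hypothetical; the model covers equality conditions only and is not how the server evaluates queries internally.

```javascript
// Hypothetical matcher: implicit "and" across keys, plus $and / $or arrays.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    return doc[key] === condition; // equality only in this sketch
  });
}

// Made-up sample data for illustration.
const employees = [
  { lastName: "Bashian", title: "Senior Engineer" },
  { lastName: "Baik", title: "Engineer" },
  { lastName: "Vural", title: "Senior Engineer" },
];

const both = employees.filter((e) =>
  matches(e, { title: "Senior Engineer", lastName: "Bashian" })
);
const either = employees.filter((e) =>
  matches(e, { $or: [{ lastName: "Bashian" }, { lastName: "Baik" }] })
);

console.log(both.length);                   // 1
console.log(either.map((e) => e.lastName)); // [ 'Bashian', 'Baik' ]
```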

Array Queries

db.misc.insert({users:["jimmy", "griffin"]})
db.misc.find({users:"griffin"})
{ "_id" : ObjectId("518a5b7e18aa54b5cf8fc333"), "users" : [ "jimmy", "griffin" ]}
db.misc.find({users:{$elemMatch:{name:"jimmy",gender:"male"}}})
{ "_id" : ObjectId("518a599818aa54b5cf8fc332"), "users" : [ { "name" : "jimmy", "gender" : "male" }, { "name" : "griffin", "gender": "male" } ] }

Array Updates

db.misc.insert({"users":[{"name":"jimmy","gender":"male"},{"name":"griffin","gender":"male"}]})
db.misc.update({"_id":ObjectId("518276054e094734807395b6"),"users.name":"jimmy"}, {$set:{"users.$.name":"george"}})
db.employees.update({products:"Softball"}, {$pull:{products:"Softball" }},false,true)
db.employees.find({products:"Softball"}).count()
0

Does Field Exist

•  $exists: db.locations.find({user:{$exists:false}})
•  Type “it” for more – iterates over documents - paging

RegEx Queries

•  In JS: db.employees.find({ "title" : { "$regex" : "seNior EngIneer" , "$options" : "i"}})

•  In Java use java.util.regex.Pattern

Optimizing Queries

•  Use $hint or hint() in JS to tell MongoDB to use specific index

•  Use hint() in Java API with fluent API
•  Use $explain or explain() to see MongoDB query explain plan
    –  Number of scanned objects should be close to the number of returned objects

Aggregation Queries

•  Aggregation Framework
•  Map/Reduce - Demo
•  Distinct - Demo
•  Group - Demo
    –  Similar to SQL Group By function
•  Count

Collection Callbacks

•  MongoDB Java API provides callback functionality
    –  This is implemented in Java via anonymous inner classes
    –  Accessible via the Spring Data Template (MongoOperations)
•  Can be used in lieu of converters for inline DBObject conversion.

Unwind

•  $unwind
•  Useful command to convert arrays of objects, within documents, into sub-documents that are then searchable by query.

db.depts.aggregate({"$project":{"employees":"$employees"}},{"$unwind":"$employees"},{"$match":{"employees.lname":"Vural"}});
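What `$unwind` does to each document can be pictured with the plain JavaScript sketch below. The `unwind` helper and the `depts` data are hypothetical; the helper simply skips documents whose field is not an array, which is a simplification of the server's actual rules.

```javascript
// Hypothetical model of $unwind: one output document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) continue; // simplification; the server has stricter rules
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Made-up departments for illustration.
const depts = [
  { name: "Engineering", employees: [{ lname: "Vural" }, { lname: "Baik" }] },
  { name: "Sales", employees: [{ lname: "Bashian" }] },
];

const unwound = unwind(depts, "employees");
console.log(unwound.length); // 3

// After unwinding, each employee is a sub-document that a match stage can test.
const found = unwound.filter((doc) => doc.employees.lname === "Vural");
console.log(found[0].name); // Engineering
```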

GridFS

•  “…specification for storing large files in MongoDB.”
•  As the name implies, “Grid” allows the storage of very large files divided across multiple MongoDB documents.
•  Uses native BSON binary formats
•  16MB per document
•  Large files added to GridFS get chunked and spread across multiple documents.
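The chunking idea can be sketched in plain JavaScript with a Node.js `Buffer`. The `toChunks` helper and the `demo-file` id are hypothetical, and the 4-byte chunk size is far smaller than a real GridFS chunk; the `files_id`, `n`, and `data` field names follow the layout of GridFS chunk documents.

```javascript
// Hypothetical sketch: split a file's bytes into ordered chunk documents.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId,                                // which file the chunk belongs to
      n,                                               // sequence number of the chunk
      data: data.subarray(offset, offset + chunkSize), // the chunk's bytes
    });
  }
  return chunks;
}

const file = Buffer.from("0123456789");
const chunks = toChunks("demo-file", file, 4); // tiny chunk size, for illustration only

console.log(chunks.length);                          // 3
console.log(chunks.map((c) => c.data.toString()));   // [ '0123', '4567', '89' ]

// Reading the file back means concatenating the chunks in order of n.
const restored = Buffer.concat(chunks.map((c) => c.data));
console.log(restored.equals(file)); // true
```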

Indexes

•  Similar to RDBMS Indexes, Btree (support range queries)
•  Can have many and can be compound
•  Including indexes of array fields in document
•  Makes searches, aggregates, and group functions faster
•  Makes writes slower
    –  Sparse = true
        •  Only include documents in this index that actually contain a value in the indexed field.
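The effect of the sparse option can be illustrated with a toy index built in plain JavaScript. The `buildIndex` and `entryCount` helpers and the `locations` data are hypothetical; a real index is a B-tree, while this sketch uses a `Map` from key to document positions.

```javascript
// Hypothetical toy index: maps each key to the positions of matching documents.
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !hasField) return;        // sparse: skip documents without the field
    const key = hasField ? doc[field] : null; // non-sparse: missing fields index as null
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// Count how many document entries the index holds.
function entryCount(index) {
  return [...index.values()].reduce((total, positions) => total + positions.length, 0);
}

// Made-up documents; the second one has no "user" field.
const locations = [
  { city: "Richmond", user: "jimmy" },
  { city: "Minneapolis" },
  { city: "Duluth", user: "griffin" },
];

console.log(entryCount(buildIndex(locations, "user")));                   // 3
console.log(entryCount(buildIndex(locations, "user", { sparse: true }))); // 2
```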

Text Indexes

•  Introduced in 2.4
•  Must be enabled in mongod
    –  --setParameter textSearchEnabled=true
•  In mongo (shell)
    –  db["employees"].ensureIndex({"title":"text"})
        •  Index “title” field with text index
•  At least 2x the storage space

GEO Spatial Ops

•  One of MongoDB’s sweet spots
•  Used to store, index, search on geo-spatial data for GIS operations.
•  Requires special indexes, 2d and 2dsphere (new with 2.4)
•  Requires Longitude and Latitude (in that order) coordinates contained in double precision array within documents.

Query Pagination

•  Use Spring Data and QueryDSL - http://www.querydsl.com/
•  Modify Spring Data repo to extend QueryDslPredicateExecutor
•  Add appropriate Maven POM entries for QueryDSL
•  Use Page and PageRequest objects to page through result sets
•  QueryDSL will create Q<MODEL> Java classes
•  Precludes developers from writing pagination code
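The bookkeeping that a page request saves you from writing looks roughly like the plain JavaScript sketch below. The `getPage` helper is hypothetical; its property names loosely echo Spring Data's Page object but are assumptions for illustration, and the page number is treated as zero-based.

```javascript
// Hypothetical pagination helper over an in-memory list (zero-based page number).
function getPage(items, pageNumber, pageSize) {
  const start = pageNumber * pageSize;
  return {
    content: items.slice(start, start + pageSize),
    number: pageNumber,
    totalElements: items.length,
    totalPages: Math.ceil(items.length / pageSize),
    hasNext: start + pageSize < items.length,
  };
}

const names = ["a", "b", "c", "d", "e"];

const first = getPage(names, 0, 2);
console.log(first.content, first.hasNext); // [ 'a', 'b' ] true

const last = getPage(names, 2, 2);
console.log(last.content, last.hasNext);   // [ 'e' ] false
console.log(last.totalPages);              // 3
```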

Save vs. Update

•  Java driver save() saves entire document.
•  Use “update” to save time and bandwidth, and possibly indexing.
•  Spring Data is slightly slower than lower level mongo Java driver
•  Spring data fluent API is very helpful.

MongoDB Security

•  Default is trusted mode, no security
    –  --auth
    –  --keyFile
        •  Replica sets require this option
•  New with 2.4:
    –  Kerberos Support

MongoDB Auth Security

•  Use --auth switch to enable
•  Create users with roles
•  Use db.authenticate in the code (if need be)

MongoDB Write Concerns

•  Describes quality of writes (or write assurances)
•  Application (MongoDB client) is concerned with this quality
•  Write concerns describe the durability of a write, and can be tuned based on application and data needs
•  Adjusting write concerns can have an effect (maybe deleterious) on write performance.

Encryption

•  MongoDB does not support data encryption, per se
•  Or…use TDE (Transparent Data Encryption) from Gazzang
•  Use application-level encryption and store encrypted data in BSON fields
    –  **If you absolutely need encryption and you cannot get TDE**

New JavaScript Engine – V8

•  MongoDB 2.4 uses the Google V8 JavaScript Engine
    –  https://code.google.com/p/v8/
    –  Open source, written in C++
    –  High performance, with improved concurrency for multiple JavaScript operations in MongoDB at the same time.

Some Useful Commands

•  use <db> - connects to a DB
•  use admin; db.runCommand({top:1})
    –  Returns info about collection activity
•  db.currentOp() – returns info about operations currently running in mongo db
•  db.serverStatus()
•  use admin; db.shutdownServer();

More Useful Commands

•  db.hostInfo()
•  db.isMaster()
•  db.runCommand({"buildInfo":1})
•  it
•  db.runCommand({touch:"employees",data:true,index:true})
    { "ok" : 1 }

JS Benchmarking Harness

•  Benchrun command
•  http://www.mongodb.org/about/contributors/js-benchmarking-harness/
•  “QA baseline perf measurement tool”

MongoDB in the Cloud

•  Some of the top Service Providers
    –  MongoHQ – Integrated Partner with CloudBees
    –  Amazon (AWS) EC2
    –  MongoLab
    –  Object Rocket

•  REST APIs

•  http://www.mongodb.com/partners/cloud

MongoHQ Info

•  Integrated Partner with CloudBees
•  https://www.mongohq.com/home
•  http://www.cloudbees.com/platform-service-mongohq.cb
•  Works with CloudBees Weave@cloud services
•  Access it with DB URI just like other mongod
•  Has REST API
    –  Requires API key from account

MongoLab Info

•  https://mongolab.com/welcome/
•  Access it with DB URI just like other mongod
•  Has REST API
    –  Requires API key from account

Questions?

•  Thank you for your attention!
