MongoDB 2.4 and Spring Data
Posted 18-Oct-2014, Category: Technology
Transcript of MongoDB 2.4 and spring data
1
MongoDB 2.4 and Spring Data
June 10th, 2013
2
Who Am I?
Solutions Architect with ICF Ironworks
Part-time Adjunct Professor
Started with HTML and Lotus Notes in 1992
• In the interim there was C, C++, VB, Lotus Script, PERL, LabVIEW, Oracle, MS SQL Server, etc.
Not so much an Early Adopter as a Fast Follower of Java Technologies
Alphabet Soup (MCSE, ICAAD, ICASA, SCJP, SCJD, PMP, CSM)
LinkedIn: http://www.linkedin.com/in/iamjimmyray
Blog: http://jimmyraywv.blogspot.com/ Avoiding Tech-sand
3
MongoDB 2.4 and Spring Data
4
Tonight’s Agenda
Quick introduction to NoSQL and MongoDB
• Configuration
• MongoVUE
Introduction to Spring Data and MongoDB support
• Spring Data and MongoDB configuration
• Templates
• Repositories
• Query Method Conventions
• Custom Finders
• Customizing Repositories
• Metadata Mapping (including nested docs and DBRef)
• Aggregation Functions
• GridFS File Storage
• Indexes
5
What is NoSQL?
Official: Not Only SQL
• In reality, it may or may not use SQL*, at least in its truest form
• Varies from the traditional RDBMS approach of the last few decades
• Not necessarily a replacement for RDBMS; more of a solution for more specific needs where RDBMS is not a great fit
• Content Management (including CDNs), document storage, object storage, graph, etc.
It means different things to different folks.
• It really comes down to a different way to view our data domains for more effective storage, retrieval, and analysis…albeit with tradeoffs that affect our design decisions.
6
From NoSQL-Database.org
“NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more.”
7
Some NoSQL Flavors
Document Centric
• MongoDB
• Couchbase
Wide Column/Column Families
• Cassandra
• Hadoop HBase
XML (JSON, etc.)
• MarkLogic
Graph
• Neo4J
Key/Value Stores
• Redis
Object
• DB4O
Other
• Lotus Notes/Domino
8
Why MongoDB
Open Source
Multiple platforms (Linux, Win, Solaris, Apple) and Language Drivers
Explicitly de-normalized
Document-centric and Schema-less (for the most part)
Fast (low latency)
• Fast access to data
• Low CPU overhead
Ease of scalability (replica sets), auto-sharding
Manages complex and polymorphic data
Great for CDN and document-based SOA solutions
Great for location-based and geospatial data solutions
9
Why MongoDB (more)
Because its schema-less approach is more flexible, MongoDB is intrinsically ready for iterative (Agile) projects.
Eliminates the “impedance mismatch” typical of RDBMS solutions
“How do I model my object/document based application in 3NF?”
If you are already familiar with JavaScript and JSON, MongoDB storage and document representation are easier to understand.
Near-real-time data aggregation support
10gen has been responsive to the MongoDB community
10
What is schema-less?
A.K.A. schema-free, 10gen says “flexible-schema”
It means that MongoDB does not enforce a column data type on the fields within your document, nor does it confine your document to specific columns defined in a table definition.
The schema can actually be controlled via the application API layers and is implied by the “shape” (content) of your documents.
This means that different documents in the same collection can have different fields.
• So the schema is flexible in that way
• Only the _id field is mandatory in all documents.
Requires more rigor on the application side.
11
Is MongoDB really schema-less?
Technically no.
There is the System Catalog of system collections
• <database>.system.namespaces
• <database>.system.indexes
• <database>.system.profile
• <database>.system.users
And…because of the nature of how docs are stored in collections (JSON/BSON), field labels are stored in every doc*
12
Schema tips
MongoDB provides ObjectId, which can be placed in _id
• If you have a natural unique ID, use that instead
De-normalize when needed (you must know MongoDB restrictions)
• For example: Compound indexes cannot contain parallel arrays
Create indexes that cover queries
• Mongo only uses one index at a time for a query
• Watch out for sorts
• Watch out for field sequence in compound indexes.
Reduce size of collections (watch out for label sizes)
13
MongoDB Data Modeling and Node Setups
Schema Design is still important
Understand your concerns
• Do you have read-intensive or write-intensive data?
• Document embedding (fastest and atomic) vs. references (normalized)
• Atomicity – Document Level Only
• Can use 2-Phase Commit Pattern
• Data Durability
• Not “truly” available in a single-server setup
• Requires write concern tuning
• Need sharding and/or replicas
10gen offers patterns and documentation:
• http://docs.mongodb.org/manual/core/data-modeling/
14
Why Not MongoDB
High speed and deterministic transactions:
• Banking and accounting
• See MongoDB Global Write Locking – Improved by better yielding in 2.0
Where SQL is absolutely required
• Where true Joins are needed*
Traditional non-real-time data warehousing ops*
If your organization lacks the controls and rigor to place schema and document definition at the application level without compromising data integrity**
15
MongoDB
Was designed to overcome some of the performance shortcomings of RDBMS
Some Features
• Memory Mapped IO (32bit vs. 64bit)
• Fast Querying (atomic operations, embedded data)
• In place updates (physical writes lag in-memory changes)
• Depends on Write Concern settings
• Full Index support (including compound indexes, text, spherical)
• Replication/High Availability (see CAP Theorem)
• Auto Sharding (range-based partitioning, based on shard key) for scalability
• Aggregation, MapReduce, geo-spatial
• GridFS
16
MongoDB – In Place Updates
No need to get document from the server, just send update
Physical disk writes lag in-memory changes.
• Lag depends on Write-Concerns (Write-through)
• Multiple writes in memory can occur before the object is updated on disk
MongoDB uses an adaptive allocation algorithm for storing its objects.
• If an object changes and fits in its current location, it stays there.
• However, if it is now larger, it is moved to a new location. This move is expensive because the indexes must be updated.
• MongoDB looks at collections and, based on how many times items grow within a collection, calculates a padding factor that tries to account for object growth
• This minimizes object relocation
17
MongoDB – A Word About Sharding…
Need to choose the right key
• Easily divisible (“splittable” – see cardinality) so that Mongo can distribute data among shards
• “all documents that have the same value in the state field must reside on the same shard” – 10Gen
• Enable distributed write operations between cluster nodes
• Prevents single-shard bottle-necking
• Make it possible for “Mongos” to return most query operations from multiple shards (or single shard if you can guarantee contiguous storage in that shard**)
• Distribute writes evenly among mongos
• Minimize disk seeks per mongos
• “users will generally have a unique value for this field (Phone) – MongoDB will be able to split as many chunks as needed” – 10Gen
Watch out for the need to perform range queries.
18
MongoDB – Cardinality…
In most cases, when sharding for performance, you want higher cardinality to allow chunks of data to be split among shards
• Example: Address data components
• State – Low Cardinality
• ZipCode – Potentially low or high, depending on population
• Phone Number – High Cardinality
High cardinality is a good start for sharding, but...
• …it does not guarantee query isolation
• …it does not guarantee write scaling
• Consider computed keys (Hashed, MD5, etc.)
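Cardinality can be measured on sample data before committing to a shard key. The Java sketch below is not a MongoDB or Spring Data API: it counts distinct values per candidate field (state vs. phone, the examples above) and shows one way to derive a hypothetical computed key by hashing a value with MD5 through the JDK's MessageDigest. The ShardKeyCheck name and its helpers are assumptions for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical offline helper for evaluating candidate shard keys on sample documents.
final class ShardKeyCheck {
    private ShardKeyCheck() {}

    // Cardinality = number of distinct non-null values of one field across the sample.
    static int cardinality(List<Map<String, String>> docs, String field) {
        Set<String> distinct = new HashSet<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                distinct.add(value);
            }
        }
        return distinct.size();
    }

    // Hypothetical computed key: the hex MD5 digest of a field value,
    // which the application would store in its own field and shard on.
    static String md5Key(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

On a sample where every document is in one of a few states but each has its own phone number, cardinality("state") stays small while cardinality("phone") grows with the data. Hashing spreads values evenly, but it does not by itself give query isolation, and range queries on the original field can no longer target a single shard.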
19
CAP Theorem
Consistency – all nodes see the same data at the same time
Availability – all requests receive responses, guaranteed
Partition Tolerance (network partition tolerance)
The theorem states that you can never have all three, so you plan for two and make the best of the third.
• For example: Perhaps “eventual consistency” is OK for a CDN application.
• For large scalability, you would need partitioning. That leaves C & A to choose from
• Would you ever choose consistency over availability?
How do cloud implementations change this?
20
Example MongoDB Isolated Setup
21
Container Models: RDBMS vs. MongoDB
RDBMS: Servers > Databases > Schemas > Tables > Rows
• Joins, Group By, ACID
MongoDB: Servers > Databases > Collections > Documents
• No Joins**
• Instead: Db References (Linking) and Nested Documents (Embedding)
22
MongoDB Collections
Schema-less
Can have up to 24000 (according to 10gen)
• Cheap to resource
Contain documents (…of varying shapes)
• 100 nesting levels (version 2.2)
Are namespaces, like indexes
Can be “Capped”
• Limited in max size with rotating overwrites of oldest entries
• Logging anyone?
• Example: MongoDB oplog
TTL Collections
23
MongoDB Documents
JSON (what you see)
• Actually BSON (Internal - Binary JSON - http://bsonspec.org/)
Elements are name/value pairs
16 MB maximum size
What you see is what is stored
• No default fields (columns)
24
MongoDB Documents
25
JSON Syntax
Curly braces are used for documents/objects – {…}
Square brackets are used for arrays – […]
Colons are used to link keys to values – key:value
Commas are used to separate multiple objects or elements or key/value pairs – {key1:value1, key2:value2…}
JSON has how many data types?
• 6 – String (text), Number, Array, Object, null, Boolean
26
JSON Syntax Example
{
  "application" : "HR System",
  "users" : [{"name" : "bill", "age" : 60},
             {"name" : "fred", "age" : 29}]
}
27
Why BSON?
Adds data types that JSON did not support – (ISO Dates, ObjectId, etc.)
Optimized for performance
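Efficient binary encoding (not compression: BSON is built for fast traversal and can be larger than the equivalent JSON)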
http://bsonspec.org/#/specification
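To make the BSON layout concrete, here is a minimal Java encoder for just two element types from the specification: UTF-8 string (type 0x02) and 32-bit integer (type 0x10). It is a teaching sketch, not a substitute for the driver's encoder, and the TinyBson name is hypothetical. A document is a little-endian int32 total length, a sequence of elements (type byte, NUL-terminated field name, value), and a trailing 0x00.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical minimal BSON encoder: flat documents with String or Integer values only.
final class TinyBson {
    private TinyBson() {}

    static byte[] encode(Map<String, Object> doc) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String) {
                body.write(0x02);                       // element type: UTF-8 string
                writeCString(body, e.getKey());         // field name, NUL-terminated
                byte[] s = ((String) v).getBytes(StandardCharsets.UTF_8);
                writeInt32(body, s.length + 1);         // string length includes trailing NUL
                body.write(s, 0, s.length);
                body.write(0x00);
            } else if (v instanceof Integer) {
                body.write(0x10);                       // element type: int32
                writeCString(body, e.getKey());
                writeInt32(body, (Integer) v);
            } else {
                throw new IllegalArgumentException("unsupported type for field " + e.getKey());
            }
        }
        body.write(0x00);                               // document terminator
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, body.size() + 4);               // total length includes this int32
        byte[] b = body.toByteArray();
        out.write(b, 0, b.length);
        return out.toByteArray();
    }

    private static void writeCString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        out.write(0x00);
    }

    private static void writeInt32(ByteArrayOutputStream out, int value) {
        byte[] le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
        out.write(le, 0, le.length);
    }
}
```

Encoding {"hello": "world"} yields the 22-byte example from the specification. The field name is written into every encoded document, which is why long labels cost space in large collections.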
28
MongoDB Install
Extract MongoDB
Build config file, or use startup script
• Need dbpath configured
• Need REST configured for Web Admin tool
Start Mongod (daemon) process
Use Shell (mongo) to access your database
Use MongoVUE (or other) for GUI access and to learn shell commands
29
MongoDB Install
30
Mongo Shell
In Windows, mongo.exe
Interactive JavaScript shell to mongod
Command-line interface to MongoDB (sort of like SQL*Plus for Oracle)
JavaScript Interpreter, behaves like a read-eval-print loop
Can be run without database connection (use --nodb)
Uses a fluent API with lazy cursor evaluation
• db.locations.find({state:'MN'},{city:1,state:1,_id:0}).sort({city:-1}).limit(5).toArray();
31
MongoVUE
GUI around MongoDB Shell
Current version 1.61 (May 2013)
Makes it easy to learn MongoDB Shell commands
• db.employee.find({ "lastName" : "Smith", "firstName" : "John" }).limit(50);
• show collections
Not sure if development is continuing, but very handy still.
Demo…
32
Mongo Explorer
Silverlight GUI
Current development has stopped – for now.
http://mongoexplorer.com/
Demo…
33
Web Admin Interface
Localhost:<mongod port + 1000>
Quick stats viewer
Run commands
Demo
There is also Sleepy Mongoose
• http://www.kchodorow.com/blog/2010/02/22/sleepy-mongoose-a-mongodb-rest-interface/
34
Web Admin Interface
35
Other MongoDB Tools
Edda – Log Visualizer
• http://blog.mongodb.org/post/28053108398/edda-a-log-visualizer-for-mongodb
• Requires Python
MongoDB Monitoring Service
• Free cloud-based service that monitors MongoDB instances via configured agents.
• Requires Python
• http://www.10gen.com/products/mongodb-monitoring-service
Splunk
• www.splunk.com
36
MongoImport
Binary mongoimport
Syntax: mongoimport --stopOnError --port 29009 --db geo --collection geos --file C:\UserData\Docs\JUGs\TwinCities\zips.json
Don’t use for backup or restore in production
• Use mongodump and mongorestore
37
Spring Data
Large Spring project with many subprojects
• Category: Document Stores, Subproject MongoDB
“…aims to provide a familiar and consistent Spring-based programming model…”
Like other Spring projects, Data is POJO Oriented
For MongoDB, provides high-level API and access to low-level API for managing MongoDB documents.
Provides annotation-driven meta-mapping
Will allow you into the bowels of the API if you choose to hang out there
38
Spring Data MongoDB Templates
Implements MongoOperations (mongoOps) interface
• mongoOps defines the basic set of MongoDB operations for the Spring Data API.
• Wraps the lower-level MongoDB API
Provides access to the lower-level API
Provides foundation for upper-level Repository API.
Demo
39
Spring Data MongoDB Templates - Configuration
See mongo-config.xml
40
Spring Data MongoDB Templates - Configuration
Or…see the config class
41
Spring Data MongoDB Templates - Configuration
42
Spring Data Repositories
Convenience for data access
• Spring does ALL the work (unless you customize)
Convention over configuration
• Uses a method-naming convention that Spring interprets during implementation
Hides complexities of Spring Data templates and underlying API
Builds implementation for you based on interface design
• Implementation is built during Spring container load.
Is typed (parameterized via generics) to the model objects you want to store.
• When extending MongoRepository
• Otherwise uses @RepositoryDefinition annotation
Demo
43
Spring Data Bulk Inserts
All things being equal, bulk inserts in MongoDB can be faster than inserting one record at a time, if you have batch inserts to perform.
As of MongoDB 1.8, the max BSON size of a batch insert was increased from 4MB to 16MB
• You can check this with the shell command db.isMaster() or mongo.getMaxBsonObjectSize() in the Java API
Batch sizes can be tuned for performance
Demo
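Tuning batch size usually means capping both the number of documents and the total bytes per batch. A hedged sketch in plain Java: it only groups documents, the actual insert call is omitted, and sizeOf stands in for whatever measures a document's encoded size (an assumption; the byte limit could be set from the 16MB figure above). The InsertBatcher name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper: groups documents into batches bounded by count and by total bytes.
final class InsertBatcher {
    private InsertBatcher() {}

    static <T> List<List<T>> batch(List<T> docs, ToIntFunction<T> sizeOf, int maxDocs, int maxBytes) {
        if (maxDocs < 1 || maxBytes < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long currentBytes = 0;
        for (T doc : docs) {
            int size = sizeOf.applyAsInt(doc);
            if (size > maxBytes) {
                throw new IllegalArgumentException("single document exceeds maxBytes: " + size);
            }
            // Close the current batch if adding this document would break either limit.
            if (!current.isEmpty() && (current.size() == maxDocs || currentBytes + size > maxBytes)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then be handed to one bulk insert call; varying maxDocs and timing the inserts is one way to find the batch size that performs best for a given workload.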
44
Transformers
Does the “heavy lifting” by preparing MongoDB objects for insertion
Transforms Java domain objects into MongoDB DBObjects.
Demo
45
Converters
For read and write, overrides default mapping of Java objects to MongoDB documents
Implements the Spring…Converter interface
Registered with MongoDB configuration in Spring context
Handy when integrating MongoDB into an existing application.
Can be used to remove “_class” field
46
Spring Data Meta Mapping
Annotation-driven mapping of model object fields to Spring Data elements in specific database dialect. – Demo
47
MongoDB DBRef
Optional
Instead of nesting documents
Have to save the “referenced” document first, so that DBRef exists before adding it to the “parent” document
48
MongoDB DBRef
49
MongoDB DBRef
50
MongoDB Custom Spring Data Repositories
Hooks into Spring Data bean type hierarchy that allows you to add functionality to repositories
Important: You must write the implementation for part of this custom repository
And…your Spring Data repository interface must extend this custom interface, along with the appropriate Spring Data repository
Demo
51
Creating a Custom Repository
Write an interface for the custom methods
Write the implementation for that interface
Write the traditional Spring Data Repository application interface, extending the appropriate Spring Data interface and the (above) custom interface
When Spring starts, it will implement the Spring Data Repository normally, and include the custom implementation as well.
52
MongoDB Queries
In the mongo shell using JS: db.collection.find( <query>, <projection> )
• Use the projection to limit fields returned, and therefore network traffic
Example: db["employees"].find({"title":"Senior Engineer"})
Or: db.employees.find({"title":"Senior Engineer"},{"_id":0})
Or: db.employees.find({"title":"Senior Engineer"},{"_id":0,"title":1})
In Java use DBObject or Spring Data Query for mapping queries.
You can include and exclude fields in the projection argument.
• You either include (1) or exclude (0)
• You cannot include and exclude in the same projection, except for the “_id” field.
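The include/exclude rule can be expressed as a small client-side check. The sketch below is hypothetical (the server performs its own validation) and models a projection as a map of field name to 1 or 0.

```java
import java.util.Map;

// Hypothetical client-side check of the projection rule: all 1s or all 0s, except "_id".
final class ProjectionCheck {
    private ProjectionCheck() {}

    static boolean isValid(Map<String, Integer> projection) {
        boolean sawInclude = false;
        boolean sawExclude = false;
        for (Map.Entry<String, Integer> e : projection.entrySet()) {
            if ("_id".equals(e.getKey())) {
                continue; // _id may be excluded alongside included fields
            }
            if (e.getValue() == 1) {
                sawInclude = true;
            } else if (e.getValue() == 0) {
                sawExclude = true;
            } else {
                throw new IllegalArgumentException("use 1 or 0 for " + e.getKey());
            }
        }
        return !(sawInclude && sawExclude);
    }
}
```

So {"_id":0,"title":1} from the examples above passes, while a projection such as {"title":1,"lastName":0} is rejected.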
53
DBObject and BasicDBObject
For the Mongo Java driver, DBObject is the Interface, BasicDBObject is the class
This is essentially a map with additional Mongo functionality
• See partial objects when upserting
DBObject is used to build commands, queries, projections, and documents
DBObjects are used to build out the JS queries that would normally run in the shell. Each {…} is a potential DBObject.
54
MongoDB Queries – And & Or
Comma denotes “and”, and you can use $and
• db.employees.find({"title":"Senior Engineer","lastName":"Bashian"},{"_id":0,"title":1})
For Or, you must use the $or operator
• db.employees.find({$or:[{"lastName":"Bashian"},{"lastName":"Baik"}]},{"_id":0,"title":1,"lastName":1})
In Java, use DBObjects and ArrayLists…
• Nest or/and ArrayLists for compound queries
Or use the Spring Data Query and Criteria classes with or criteria
Also see QueryBuilder class
Demo
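Since each {…} maps to a DBObject and each […] to a list, the $or query above can be assembled from nested maps and lists. This sketch uses only LinkedHashMap and ArrayList so it runs without the driver; the assumption is that with the 2.x Java driver you would use BasicDBObject in place of LinkedHashMap, since it behaves like an ordered map. With Spring Data, Criteria's orOperator(...) would be the analogous route (check it against the version you use). The OrQuerySketch name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder showing the nesting of the $or query and its projection.
final class OrQuerySketch {
    private OrQuerySketch() {}

    // {"$or": [{"lastName": <name1>}, {"lastName": <name2>}, ...]}
    static Map<String, Object> lastNameIn(String... lastNames) {
        List<Map<String, Object>> clauses = new ArrayList<>();
        for (String name : lastNames) {
            Map<String, Object> clause = new LinkedHashMap<>(); // one {...} per clause
            clause.put("lastName", name);
            clauses.add(clause);
        }
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("$or", clauses); // the [...] becomes a List
        return query;
    }

    // {"_id": 0, "title": 1, "lastName": 1}
    static Map<String, Object> projection() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("_id", 0);
        fields.put("title", 1);
        fields.put("lastName", 1);
        return fields;
    }
}
```

Compound and/or queries follow the same pattern: put further lists of clause maps under "$and" or "$or" inside a clause.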
55
MongoDB Array Queries
db.misc.insert({users:["jimmy", "griffin"]})
db.misc.find({users:"griffin"})
• { "_id" : ObjectId("518a5b7e18aa54b5cf8fc333"), "users" : [ "jimmy", "griffin" ]}
db.misc.find({users:{$elemMatch:{name:"jimmy",gender:"male"}}})
• { "_id" : ObjectId("518a599818aa54b5cf8fc332"), "users" : [ { "name" : "jimmy", "gender" : "male" }, { "name" : "griffin", "gender": "male" } ] }
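An in-memory model of the matching rules (equality only; the ArrayMatch helper names are hypothetical) shows why $elemMatch exists. A scalar query value matches if any array element equals it. $elemMatch requires one single element to satisfy every condition, whereas dotted-field conditions such as {"users.name": …, "users.gender": …} may each be met by a different element.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of three array-matching behaviors (equality conditions only).
final class ArrayMatch {
    private ArrayMatch() {}

    // {users: "griffin"}: any array element equal to the value matches.
    static boolean contains(List<?> array, Object value) {
        return array.contains(value);
    }

    // {users: {$elemMatch: {...}}}: a single element must satisfy every condition.
    static boolean elemMatch(List<Map<String, Object>> array, Map<String, Object> conditions) {
        for (Map<String, Object> element : array) {
            if (satisfiesAll(element, conditions)) {
                return true;
            }
        }
        return false;
    }

    // {"users.name": ..., "users.gender": ...}: each condition may be met by a different element.
    static boolean dottedMatch(List<Map<String, Object>> array, Map<String, Object> conditions) {
        for (Map.Entry<String, Object> c : conditions.entrySet()) {
            boolean met = false;
            for (Map<String, Object> element : array) {
                if (c.getValue().equals(element.get(c.getKey()))) {
                    met = true;
                    break;
                }
            }
            if (!met) {
                return false;
            }
        }
        return true;
    }

    private static boolean satisfiesAll(Map<String, Object> element, Map<String, Object> conditions) {
        for (Map.Entry<String, Object> c : conditions.entrySet()) {
            if (!c.getValue().equals(element.get(c.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

With users [{name: "jimmy", gender: "male"}, {name: "griffin", gender: "female"}], the conditions {name: "jimmy", gender: "female"} fail under $elemMatch but succeed as dotted conditions, because two different elements supply the matches.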
56
MongoDB Array Updates
db.misc.insert({"users":[{"name":"jimmy","gender":"male"},{"name":"griffin","gender":"male"}]})
db.misc.update({"_id":ObjectId("518276054e094734807395b6"),"users.name":"jimmy"}, {$set:{"users.$.name":"george"}})
db.employees.update({products:"Softball"}, {$pull:{products:"Softball" }},false,true)
db.employees.find({products:"Softball"}).count()
0
57
Does Field Exist
$exists
db.locations.find({user:{$exists:false}})
Type “it” for more – iterates over documents - paging
58
MongoDB Advanced Queries
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
May use Mongo Java driver and BasicDBObjectBuilder
Spring Data fluent API is much easier
Demo - $in, $nin, $gt ($gte), $lt ($lte), $all, ranges
59
MongoDB RegEx Queries
In JS:
db.employees.find({ "title" : { "$regex" : "seNior EngIneer" , "$options" : "i"}})
In Java use java.util.regex.Pattern
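In Java, the shell's "$options" : "i" corresponds to Pattern.CASE_INSENSITIVE. The sketch applies the same pattern to an in-memory list so it runs standalone; passing the compiled Pattern as the query value (for example new BasicDBObject("title", pattern) with the 2.x driver) is assumed here to produce the equivalent $regex query. Because $regex matches anywhere in the string unless the pattern uses ^ or $, the sketch uses Matcher.find() rather than matches(). The TitleRegex name is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of {"title": {"$regex": "seNior EngIneer", "$options": "i"}}.
final class TitleRegex {
    private TitleRegex() {}

    // Same pattern text as the shell query; CASE_INSENSITIVE plays the role of the "i" option.
    static final Pattern SENIOR_ENGINEER =
            Pattern.compile("seNior EngIneer", Pattern.CASE_INSENSITIVE);

    // Unanchored like $regex: a match anywhere in the title counts.
    static List<String> filter(List<String> titles) {
        return titles.stream()
                .filter(t -> SENIOR_ENGINEER.matcher(t).find())
                .collect(Collectors.toList());
    }
}
```

Anchoring the pattern with ^ also matters for performance: an unanchored, case-insensitive regex cannot use an index efficiently.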
60
Optimizing Queries
Use $hint or hint() in JS to tell MongoDB to use specific index
Use hint() in Java API with fluent API
Use $explain or explain() to see MongoDB query explain plan
• Number of scanned objects should be close to the number of returned objects
61
MongoDB Aggregation Functions
Aggregation Framework
Map/Reduce - Demo
Distinct - Demo
Group - Demo• Similar to SQL Group By function
Count
Demo #7
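Map/Reduce runs two user-supplied functions: map emits key/value pairs for each document, and reduce folds all values emitted for one key. Below is an in-memory Java analogy, not the MongoDB API (in the shell both functions are JavaScript passed to db.collection.mapReduce). It counts employees per title, which is also what a Group By on title would produce. The MapReduceSketch name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogy of map/reduce: count employees per title.
final class MapReduceSketch {
    private MapReduceSketch() {}

    static Map<String, Integer> countByTitle(List<Map<String, String>> employees) {
        // Map phase: emit(title, 1) for every employee that has a title.
        Map<String, List<Integer>> emitted = new TreeMap<>();
        for (Map<String, String> emp : employees) {
            String title = emp.get("title");
            if (title != null) {
                emitted.computeIfAbsent(title, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: sum all values emitted for each key.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : emitted.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }
}
```

One difference from this single-pass sketch: MongoDB may call reduce more than once for the same key on partial results, so a real reduce function must return a value of the same shape as the values it receives, as a sum does.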
62
More Aggregation
$unwind
• Useful command to convert arrays of objects, within documents, into sub-documents that are then searchable by query.
db.depts.aggregate({"$project":{"employees":"$employees"}},{"$unwind":"$employees"},{"$match":{"employees.lname":"Vural"}});
Demo
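The pipeline above can be traced in plain Java to see what each stage hands to the next. This is an in-memory sketch with hypothetical data, not the aggregation API; the field names (employees, lname) come from the shell example, and the UnwindSketch name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trace of $project {employees} -> $unwind "$employees" -> $match {"employees.lname": lname}.
final class UnwindSketch {
    private UnwindSketch() {}

    static List<Map<String, Object>> unwindAndMatch(List<Map<String, Object>> depts, String lname) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> dept : depts) {
            Object employees = dept.get("employees");
            if (!(employees instanceof List)) {
                continue; // this sketch simply skips documents without an employees array
            }
            for (Object emp : (List<?>) employees) {
                // $unwind: one output document per element, with the array replaced by that element.
                Map<String, Object> doc = new LinkedHashMap<>();
                doc.put("_id", dept.get("_id")); // $project keeps _id unless it is excluded
                doc.put("employees", emp);
                // $match: "employees.lname" now addresses a single embedded document.
                if (emp instanceof Map && lname.equals(((Map<?, ?>) emp).get("lname"))) {
                    out.add(doc);
                }
            }
        }
        return out;
    }
}
```

Each result carries the department _id plus exactly one employee sub-document, which is what makes the per-employee $match possible.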
64
MongoDB GridFS
“…specification for storing large files in MongoDB.”
As the name implies, “Grid” allows the storage of very large files divided across multiple MongoDB documents.
• Uses native BSON binary formats
16MB per document
• Will be higher in future
Large files added to GridFS get chunked and spread across multiple documents.
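The chunking step can be sketched in plain Java. The 256 KB chunk size is an assumption: it was the usual GridFS default around MongoDB 2.4, later versions use 255 KB, and drivers let you choose. Per the GridFS specification, each chunk is stored as a document in the chunks collection (fs.chunks by default) with files_id, n (the chunk's sequence number), and data fields, while the file's metadata goes in fs.files. Only the splitting is shown here, and the GridFsChunker name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of GridFS-style chunking; list index i corresponds to chunk field n = i.
final class GridFsChunker {
    private GridFsChunker() {}

    // Assumed default chunk size (256 KB); check your driver's actual default.
    static final int CHUNK_SIZE = 256 * 1024;

    static List<byte[]> split(byte[] content, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        // long offset avoids int overflow near the maximum array size
        for (long offset = 0; offset < content.length; offset += chunkSize) {
            int from = (int) offset;
            int to = (int) Math.min(content.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(content, from, to));
        }
        return chunks;
    }
}
```

A 600 KB file becomes three chunks (256 KB, 256 KB, and 88 KB), so no single document comes near the 16MB limit and a reader can fetch a byte range by chunk number.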
67
MongoDB Indexes
Similar to RDBMS indexes, B-tree (supports range queries)
Can have many
Can be compound
• Including indexes of array fields in document
Makes searches, aggregates, and group functions faster
Makes writes slower
Sparse = true
• Only include documents in this index that actually contain a value in the indexed field.
68
Text Indexes
Currently in BETA, as of 2.4, not recommended for production…yet
Must be enabled in mongod
• --setParameter textSearchEnabled=true
In mongo (shell)
• db["employees"].ensureIndex({"title":"text"})
• Index “title” field with text index
70
GEO Spatial Operations
One of MongoDB’s sweet spots
Used to store, index, search on geo-spatial data for GIS operations.
Requires special indexes, 2d and 2dsphere (new with 2.4)
Requires Longitude and Latitude (in that order) coordinates contained in double precision array within documents.
Demo
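A small Java helper makes the [longitude, latitude] ordering explicit and catches a common mistake: swapping the two. For places west of 90°W, such as Minneapolis, a swapped pair puts a value below -90 into the latitude slot. The distance shown is the haversine great-circle angle in radians; $centerSphere takes its radius in radians, and multiplying by an Earth radius converts to distance. The 6378.1 km value is assumed here (it is the equatorial figure used in MongoDB's documentation), and the GeoPoint name is hypothetical.

```java
// Hypothetical helper for building [lng, lat] pairs and computing spherical distances.
final class GeoPoint {
    private GeoPoint() {}

    // Assumed Earth radius (equatorial) for converting radians to kilometers.
    static final double EARTH_RADIUS_KM = 6378.1;

    // Returns {longitude, latitude} in the order MongoDB expects, rejecting out-of-range values.
    static double[] lngLat(double lng, double lat) {
        if (lng < -180 || lng > 180) {
            throw new IllegalArgumentException("longitude out of range: " + lng);
        }
        if (lat < -90 || lat > 90) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        return new double[] {lng, lat};
    }

    // Great-circle (central) angle in radians between two [lng, lat] points, via haversine.
    static double angleRadians(double[] a, double[] b) {
        double lat1 = Math.toRadians(a[1]);
        double lat2 = Math.toRadians(b[1]);
        double dLat = lat2 - lat1;
        double dLng = Math.toRadians(b[0] - a[0]);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    static double kilometers(double[] a, double[] b) {
        return angleRadians(a, b) * EARTH_RADIUS_KM;
    }
}
```

For example, lngLat(-93.26, 44.98) is accepted, while the swapped call lngLat(44.98, -93.26) throws, so the ordering bug surfaces before the document is written.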
72
Query Pagination
Use Spring Data and QueryDSL - http://www.querydsl.com/
Modify the Spring Data repository to extend QueryDslPredicateExecutor
Add appropriate Maven POM entries for QueryDSL
Use Page and PageRequest objects to page through result sets
QueryDSL will create Q<MODEL> Java classes
• Spares developers from writing pagination code
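Page and PageRequest hide the offset arithmetic. For reference, a plain Java sketch of what a zero-based page request translates to (Spring Data's PageRequest counts pages from 0). The PageMath names are hypothetical; with QueryDslPredicateExecutor, the actual call would be along the lines of findAll(predicate, new PageRequest(page, size)), or PageRequest.of(page, size) in later Spring Data versions.

```java
// Hypothetical helper showing the skip/limit arithmetic behind a zero-based page request.
final class PageMath {
    private PageMath() {}

    // Number of documents to skip before the requested page; the limit is simply size.
    static long skip(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page >= 0 and size >= 1 required");
        }
        return (long) page * size;
    }

    // Pages needed to show totalElements results, rounding the last partial page up.
    static int totalPages(long totalElements, int size) {
        if (totalElements < 0 || size < 1) {
            throw new IllegalArgumentException("totalElements >= 0 and size >= 1 required");
        }
        return (int) ((totalElements + size - 1) / size);
    }
}
```

Page 2 at 20 per page skips 40 documents, and 101 results need 6 pages. Large skips get slower as the offset grows, which is worth keeping in mind for deep pagination.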
73
Save vs. Update
Java driver save() saves entire document.
Use “update” to save time and bandwidth, and possibly indexing.
• Spring Data is slightly slower than lower level mongo Java driver
• Spring Data fluent API is very helpful.
74
MongoDB Security
http://www.mongodb.org/display/DOCS/Security+and+Authentication
Default is trusted mode, no security
--auth
--keyFile
• Replica sets require this option
New with 2.4:
• Kerberos Support
75
MongoDB Auth Security
Use --auth switch to enable
Create users with roles
Use db.authenticate in the code (if need be)
76
MongoDB Auth Security with Spring
May need to add credentials to Spring MongoDB config
Do not authenticate twice
java.lang.IllegalStateException: can't call authenticate twice on the same DBObject
at com.mongodb.DB.authenticate(DB.java:476)
77
MongoDB Write Concerns
Describes quality of writes (or write assurances) to MongoDB
Application (MongoDB client) is concerned with this quality
Write concerns describe the durability of a write, and can be tuned based on application and data needs
Adjusting write concerns can have an effect (maybe deleterious) on write performance.
78
MongoDB Encryption
MongoDB does not support data encryption, per se
Use application-level encryption and store encrypted data in BSON fields
Or…use TDE (Transparent Data Encryption) from Gazzang
• http://www.gazzang.com/encrypt-mongodb
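For the application-level option, a minimal Java sketch using only the JDK's javax.crypto: each sensitive value is encrypted with AES-GCM, and the resulting bytes (IV followed by ciphertext and tag) would be stored in a BSON binary field. Key storage and rotation are out of scope and assumed to be handled elsewhere, and the FieldCrypto name is hypothetical. One design consequence: each encryption uses a fresh random IV, so the same plaintext produces different stored bytes and the encrypted field cannot be queried by equality, range, or regex.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical field-level encryption helper (AES-GCM) for values stored in MongoDB documents.
final class FieldCrypto {
    private FieldCrypto() {}

    private static final int IV_BYTES = 12;  // recommended IV length for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext+tag; these bytes would go into a BSON binary field.
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV per value; never reuse an IV with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits off the IV, then decrypts and verifies the tag (throws if the data was altered).
    static String decrypt(SecretKey key, byte[] stored) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(stored);
            byte[] iv = new byte[IV_BYTES];
            buf.get(iv);
            byte[] ct = new byte[buf.remaining()];
            buf.get(ct);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return new String(cipher.doFinal(ct), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

An 11-character value encrypts to 39 stored bytes (12-byte IV, 11 bytes of ciphertext, 16-byte tag), so encrypted fields also add some per-document overhead.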
79
MongoDB Licensing
Database
• “Free Software Foundation's GNU AGPL v3.0.” – 10gen
• “Commercial licenses are also available from 10gen, including free evaluation licenses.” – 10gen
Drivers (API):
• “mongodb.org supported drivers: Apache License v2.0.” – 10gen
• “Third parties have created drivers too; licenses will vary there.” – 10gen
80
MongoDB 2.2
Drop-in replacement for 1.8 and 2.0.x
Aggregation without Map Reduce
TTL Collections (alternative to Capped Collections)
Tag-aware Sharding
http://docs.mongodb.org/manual/release-notes/2.2/
81
MongoDB 2.4
Text Search
• Must be enabled, off by default
• Introduces considerable overhead for processing and storage
• Not recommended for PROD systems; it is a BETA feature.
Hashed Index and sharding
http://docs.mongodb.org/manual/release-notes/2.4/
82
New JavaScript Engine – V8
MongoDB 2.4 uses the Google V8 JavaScript Engine
• https://code.google.com/p/v8/
• Open source, written in C++
• High performance, with improved concurrency for multiple JavaScript operations in MongoDB at the same time.
83
Some Useful Commands
use <db> - connects to a DB
use admin; db.runCommand({top:1})
• Returns info about collection activity
db.currentOp() – returns info about operations currently running in mongo db
db.serverStatus()
db.hostInfo()
db.isMaster()
db.runCommand({"buildInfo":1})
it
db.runCommand({touch:"employees",data:true,index:true})
• { "ok" : 1 }
84
Helpful Links
Spring Data MongoDB - Reference Documentation: http://static.springsource.org/spring-data/data-mongodb/docs/1.0.2.RELEASE/reference/html/
http://nosql-database.org/
www.mongodb.org
http://www.mongodb.org/display/DOCS/Java+Language+Center
http://www.mongodb.org/display/DOCS/Books
http://openmymind.net/2011/3/28/The-Little-MongoDB-Book/
http://jimmyraywv.blogspot.com/2012/05/mongodb-and-spring-data.html
http://jimmyraywv.blogspot.com/2012/04/mongodb-jongo-and-morphia.html
https://www.10gen.com/presentations/webinar/online-conference-deep-dive-mongodb
http://docs.mongodb.org/manual/faq/developers/#faq-developers-query-for-nulls
85
Questions