Develop an App with MongoDB - Percona

Develop an App with MongoDB. Tim Vaillancourt, Software Engineer, Percona

Transcript of Develop an App with MongoDB - Percona

  • Speaker Name

    Develop an App with MongoDB

    Tim Vaillancourt, Software Engineer, Percona

    {
      name: "tim",
      lastname: "vaillancourt",
      employer: "percona",
      techs: [
        "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr",
        "mesos", "kafka", "couch*", "python", "golang"
      ]
    }

    `whoami`

  • Agenda

    ● Security
    ● Schema, Performance, etc
    ● Aggregation Framework
    ● Data Integrity
    ● Monitoring
    ● Troubleshooting
    ● Scaling
    ● Elastic Deployment
    ● Questions?

  • Security

  • Authorization

    ● Use Auth with 1-user-per App
      ○ Authorization is default in modern MongoDB
      ○ Most apps need the "readWrite" built-in role only

    ● Built-in Roles
      ○ Database User: Read or Write data from collections
        ■ "All Databases" or Single-database
      ○ Database Admin: Non-RW commands (create/drop/list/etc)
      ○ Backup and Restore
      ○ Cluster Admin: Add/Drop/List shards
      ○ Superuser/Root: All capabilities
      ○ User-defined roles are also possible
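As a sketch of the 1-user-per-app pattern (the database name, username, and password are placeholders, not from the talk), a dedicated application user holding only the "readWrite" role might be created like this in the mongo shell:

```
// Hypothetical example: one dedicated user per application, granting
// only the built-in "readWrite" role on that app's own database.
db.getSiblingDB("myapp").createUser({
  user: "myapp_user",                    // placeholder username
  pwd: "insertSecurePasswordHere",
  roles: [
    { db: "myapp", role: "readWrite" }   // least privilege: no admin rights
  ]
})
```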

  • Encryption

    ● Make sure operations teams are aware of sensitive data in the app!
    ● MongoDB SSL / TLS Connections
      ○ Supported since MongoDB 2.6
      ○ Minimum of 128-bit key length for security
      ○ Relaxed and strict (requireSSL) modes
      ○ System (default) or Custom Certificate Authorities are accepted
    ● Encryption-at-Rest
      ○ Possible with:
        ■ MongoDB Enterprise ($$$) binaries
        ■ Block device encryption (See Percona Blog)

  • Source IP Restrictions

    ● "authenticationRestrictions" added to db.createUser() in MongoDB 3.6
    ● Allows access restriction by client source IP(s) and/or IP range(s)
    ● Example:

    db.createUser({
      user: "admin",
      pwd: "insertSecurePasswordHere",
      roles: [
        { db: "admin", role: "root" }
      ],
      authenticationRestrictions: [
        { clientSource: [ "127.0.0.1", "10.10.19.0/24" ] }
      ]
    })

  • Schema, Performance, etc

  • Data Types

    ● Strings
      ○ Only use strings if required
      ○ Do not store numbers as strings!
      ○ Look for {field: "123456"} instead of {field: 123456}
        ■ "12345678" moved to an integer uses 25% less space
        ■ Range queries on proper integers are more efficient
      ○ Example JavaScript to convert a field in an entire collection:

    db.items.find().forEach(function(x) {
      var newItemId = parseInt(x.itemId);               // parse the string value
      db.items.update(
        { _id: x._id },
        { $set: { itemId: newItemId } }                 // write back as an integer
      );
    });

  • Data Types

    ● Strings
      ○ Do not store dates as strings!
        ■ The field "2017-08-17 10:00:04 CEST" stores in 52.5% less space as a real date!
      ○ Do not store booleans as strings!
        ■ "true" -> true = 47% less space wasted
    ● DBRefs
      ○ DBRefs provide pointers to another document
      ○ DBRefs can be cross-collection
    ● NumberDecimal (MongoDB 3.4+)
      ○ Higher precision for floating-point numbers (the Decimal128 type)
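As an illustrative sketch of the conversions above (the collection and field names are hypothetical), string dates and booleans can be rewritten as native types in the mongo shell:

```
// Hypothetical cleanup: convert string-typed dates and booleans to native types.
db.events.find({ created: { $type: "string" } }).forEach(function(doc) {
  db.events.update(
    { _id: doc._id },
    { $set: {
        created: new Date(doc.created),   // ISO-format string -> BSON Date
        active: (doc.active === "true")   // "true"/"false" string -> boolean
    } }
  );
});
```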

  • Indexes

    ● MongoDB supports BTree, text and geo indexes
      ○ BTree is the default behaviour
    ● By default, the collection is locked until indexing completes
    ● { background: true } Indexing
      ○ Runs indexing in the background, avoiding pauses
      ○ Hard to monitor and troubleshoot progress
      ○ Unpredictable performance impact
      ○ Our suggestion: roll out indexes one node at a time
        ■ Disable replication and change TCP port, restart.
        ■ Apply index.
        ■ Enable replication, restore TCP port.
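A minimal sketch of a background index build in the mongo shell (the collection and field are placeholders):

```
// Build an index in the background so reads/writes are not blocked
// (note the caveats above about monitoring and performance impact).
db.items.createIndex(
  { itemId: 1 },            // hypothetical field to index, ascending
  { background: true }
)
```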

  • Indexes

    ● Avoid drivers that auto-create indexes
      ○ Use real performance data to make indexing decisions; find out before Production!
    ● Too many indexes hurt
      ○ Write performance for an entire collection
      ○ Optimiser efficiency
      ○ Disk and RAM are wasted
    ● Indexes have a forward or backward direction
      ○ Try to cover .sort() with an index and match its direction!
    ● Indexes can be "hinted" and forced if necessary

  • Indexes

    ● The size of the indexed fields impacts the size of the index
      ○ A point so important, it has its own slide!

  • Indexes

    ● Compound Indexes
      ○ Several fields supported
      ○ Fields can be in forward or backward direction
        ■ Consider any .sort() query options and match sort direction!
      ○ Composite Keys are read Left -> Right!
        ■ Index can be partially-read
        ■ Left-most fields do not need to be duplicated!
        ■ All indexes below are duplicates (left-most prefixes) of the first index:
          ● {username: 1, status: 1, date: 1, count: -1}
          ● {username: 1, status: 1, date: 1 }
          ● {username: 1, status: 1 }
          ● {username: 1 }
    ● Use db.collection.getIndexes() to view current Indexes
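As a hypothetical illustration of the left-most prefix rule (the collection name is a placeholder), only the full compound index is needed; queries on its prefixes reuse it:

```
// With {username: 1, status: 1, date: 1, count: -1} in place, these
// queries can all use the same index via its left-most prefixes:
db.users.find({ username: "tim" })
db.users.find({ username: "tim", status: "active" })
db.users.find({ username: "tim", status: "active" }).sort({ date: 1 })

// Inspect the indexes that actually exist on the collection:
db.users.getIndexes()
```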

  • Query Efficiency

    ● Query Efficiency Ratios
      ○ Index: keysExamined / nreturned
      ○ Document: docsExamined / nreturned
    ● End goal: examine only as many index keys/docs as you return (a ratio of 1)!
      ○ Example: a query scanning 10 documents to return 1 has a ratio of 10, i.e. an efficiency of 1/10
      ○ Tip: when using covered indexes zero documents are fetched (docsExamined: 0)!
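The ratios above are simple divisions; a small helper function (purely illustrative, not part of MongoDB) makes the calculation concrete:

```javascript
// Illustrative helper: compute a query efficiency ratio from
// explain()/profiler counters. A ratio of 1 is the ideal.
function efficiencyRatio(examined, nreturned) {
  if (nreturned === 0) return Infinity; // nothing returned: all work wasted
  return examined / nreturned;
}

// e.g. a query that examined 10 documents to return 1:
efficiencyRatio(10, 1); // ratio of 10 (efficiency 1/10)
```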

  • Query Efficiency

    ● Sorting is relatively CPU intensive due to iteration
    ● Sharding
      ○ Sorting occurs on the Mongos process when there is no index on the field
      ○ A mongos often has fewer resources
    ● Match the direction of the sort to the index direction, ie: 1 or -1

    Index: { id: 1, cost: -1 }
    Query: db.items.find({ id: 1234 }).sort({ cost: -1 })

    Index: { id: 1, cost: 1 }
    Query: db.items.find({ id: 1234 }).sort({ cost: -1 })

  • Bulk Writes

    ● Bulk Write operations allow many writes in a single operation
      ○ Available since 3.2 as the shell operation db.collection.bulkWrite()
      ○ Operates on a single collection
      ○ Can improve batch insert performance
        ■ Helpful for ETL jobs, import/export jobs, etc
      ○ Ordered Mode
        ■ Documents are written in order
        ■ An error stops the Bulk operation
      ○ Unordered Mode
        ■ Documents are written unordered
        ■ An error DOES NOT STOP the Bulk Operation!
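A minimal bulkWrite() sketch in the mongo shell (the collection and documents are hypothetical):

```
// Several writes submitted as one ordered bulk operation;
// with { ordered: false } an error would not stop the remaining writes.
db.items.bulkWrite([
  { insertOne: { document: { itemId: 1234, cost: 10 } } },
  { updateOne: { filter: { itemId: 1235 }, update: { $set: { cost: 12 } } } },
  { deleteOne: { filter: { itemId: 1236 } } }
], { ordered: true })
```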

  • Antipatterns / Features to Avoid

    ● No list of fields specified in .find()
      ○ MongoDB returns entire documents unless fields are specified
      ○ Only return the fields required for an application operation!
      ○ Covered-index operations require only the index fields to be specified
      ○ Example:

    db.items.find({ id: 1234 }, { cost: 1, available: 1 })

    ● Using $where operators
      ○ This executes JavaScript with a global lock

  • Antipatterns / Features to Avoid

    ● Many $and or $or conditions
      ○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
      ○ Try to avoid this sort of model with:
        ■ Data locality
        ■ Background Summaries / Views
    ● .mapReduce()
      ○ Generally more complex code to read/maintain
      ○ Performs slower than the Aggregation Framework
      ○ Performs extraneous locking vs the Aggregation Framework
    ● Unordered Bulk Writes
      ○ Error handling can be unpredictable

  • Aggregation Framework

  • Aggregation Pipeline: .aggregate()

    ● Run as a pipeline of "stages" on a MongoDB collection
      ○ Each stage passes its result to the next
      ○ Aggregates the entire collection by default
        ■ Add a $match stage to reduce the aggregation data
    ● Runs inside the MongoDB Server code
      ○ Much more efficient than .mapReduce() operations
    ● Example stages:
      ○ $match - only aggregate documents that match this filter (same as .find())
        ■ Must be 1st stage to use indexes!
      ○ $group - group documents by certain conditions
        ■ Similar to "SELECT .... GROUP BY"
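A small pipeline sketch combining the two stages (the collection and field names are hypothetical):

```
// $match first (so indexes can be used), then $group - roughly
// "SELECT status, COUNT(*) ... WHERE employer = 'percona' GROUP BY status"
db.users.aggregate([
  { $match: { employer: "percona" } },
  { $group: { _id: "$status", total: { $sum: 1 } } }
])
```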

  • Aggregation Pipeline: .aggregate()

    ● Example stages:
      ○ $count - count the # of documents
      ○ $project - only output specific pieces of the data
      ○ $bucket and $bucketAuto - group documents based on a specified expression and bucket boundaries
        ■ Useful for Faceted Search
      ○ $geoNear - returns documents based on geo-proximity
      ○ $graphLookup - performs a recursive search on a collection
      ○ $sample - returns a random sample of documents of a specified size
      ○ $unwind - unwinds arrays into many separate documents
      ○ $facet - runs many aggregation pipelines within a single stage

  • Aggregation Pipeline: .aggregate()

    ● Just a few examples of operators that can be used in each stage:
      ○ $and / $or / $not
      ○ $add / $subtract / $multiply
      ○ $gt / $gte / $lt / $lte / $ne
      ○ $min / $max / $avg / $stdDevPop
      ○ $log / $log10
      ○ $sqrt
      ○ $floor / $ceil
      ○ $in (inefficient)
      ○ $dayOfWeek / $dayOfMonth / $dayOfYear
      ○ $concat / $split / ...

  • Aggregation Pipeline: .aggregate()

    ● More on the Aggregation Pipeline:
      ○ https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
      ○ https://docs.mongodb.com/manual/reference/operator/aggregation/
      ○ https://www.amazon.com/MongoDB-Aggregation-Framework-Principles-Examples-ebook/dp/B00DGKGWE4


  • Data Integrity

  • Storage and Journaling

    ● The Journal provides durability in the event of failure of the server
    ● Changes are written ahead to the journal for each write operation
    ● On crash recovery, the server:
      ○ Finds the last point of consistency to disk
      ○ Searches the journal file(s) for the record matching the checkpoint
      ○ Applies all changes in the journal since the last point of consistency

  • Write Concern

    ● MongoDB Replication is Asynchronous
      ○ Write Concerns can simulate synchronous operations
    ● Write Concerns
      ○ Per-session (or even per-query) tunable
      ○ Allow control of data integrity of a write to a Replica Set
      ○ Write Concern Modes
        ■ "w: <number>" - writes must acknowledge to the defined number of nodes
        ■ "majority" - writes must acknowledge on a majority of nodes
        ■ "<replica set tag>" - writes acknowledge to a member with the specified replica set tags
      ○ Journal flag
        ■ "j: <boolean>" - sets the requirement for the change to be written to the journal

  • Write Concern

    ● Write Concerns
      ○ Durability
        ■ By default write concerns are NOT durable
        ■ "j: true" - optionally, wait for node(s) to acknowledge journaling of the operation
        ■ In 3.4+ "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via replica set configuration!
          ● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
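A sketch of a per-operation write concern in the mongo shell (the collection and document are hypothetical):

```
// Wait for a majority of replica set members to acknowledge the write,
// and require the change to be journaled before acknowledging (j: true).
db.items.insert(
  { itemId: 1234, cost: 10 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
```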

  • Read Concern

    ● Like write concerns, the consistency of reads can be tuned per session or operation
    ● Levels
      ○ "local" - default; return the current node's most-recent version of the data
      ○ "majority" - most recent version of the data that has been ack'd on a majority of nodes. Not supported on MMAPv1.
      ○ "linearizable" (3.4+) - reads return data that reflects a "majority" read of all changes prior to the read
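A read concern sketch in the mongo shell (the collection and filter are hypothetical):

```
// Only return data acknowledged by a majority of replica set members,
// trading a little latency for read consistency.
db.items.find({ itemId: 1234 }).readConcern("majority")
```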

  • Replication

    ● Size of Oplog
      ○ Monitor this closely!
      ○ The length of time from start to end of the oplog affects the impact of adding new nodes
      ○ If a node is brought online with a backup within the window, it avoids a full sync
      ○ If a node is brought online with a backup older than the window, it will full sync!!!
    ● Lag
      ○ Because replication is asynchronous, lag is possible
      ○ Use of Read Concerns and/or Write Concerns can work around the logical impact of this!

  • Monitoring & Troubleshooting

  • Usual Suspects

    ● Locking
      ○ Collection-level locks
      ○ Document-level locks
      ○ Software mutex/semaphore
    ● Limits
      ○ Max connections
      ○ Operation rate limits
      ○ Resource limits
    ● Resources
      ○ Lack of IOPS, RAM, CPU, network, etc

  • db.currentOp()

    ● A function that dumps status info about running operations and various lock/execution details
    ● Only queries currently in progress are shown
    ● Provides an operation ID number, used for killing ops
    ● Includes:
      ○ Original Query
      ○ Parsed Query
      ○ Query Runtime
      ○ Locking details
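For instance (a sketch; the filter and opid value are illustrative):

```
// Show only long-running operations (running over 3 seconds)
db.currentOp({ "secs_running": { $gte: 3 } })

// Kill a runaway operation using the opid reported by currentOp()
db.killOp(12345)
```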

  • db.currentOp()

    ● Filter Documents
      ○ { "$ownOps": true } == only show operations for the current user
      ○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples


  • db.currentOp()

  • Operation Profiler

    ● Writes slow database operations to a new MongoDB collection for analysis
      ○ Capped Collection "system.profile" in each database, default 1MB
      ○ The collection is capped, ie: profile data doesn't last forever
      ○ The slow threshold can be defined in milliseconds
        ■ 50-100ms is a good starting point
    ● Enabled Server-wide or Per-Database
    ● This is used by Percona Monitoring and Management
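Enabling the profiler for one database in the mongo shell might look like this (using the suggested starting threshold above):

```
// Profile operations slower than 100ms in the current database.
// Level 1 = slow ops only; level 2 = all ops (very verbose).
db.setProfilingLevel(1, 100)

// Inspect the most recent slow operations captured:
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
```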

  • Operation Profiler

    ● Useful Profile Metrics
      ○ op/ns/query: type, namespace and query of a profile
      ○ keysExamined: # of index keys examined
      ○ docsExamined: # of docs examined to achieve result
      ○ writeConflicts: # of write conflicts encountered during the update
      ○ numYields: # of times the operation yielded for others
      ○ locks: detailed lock statistics

  • Troubleshooting: .explain()

    ● Shows the query explain plan for query cursors, before execution
    ● This will include:
      ○ Winning Plan
        ■ Query stages
          ● Query stages may include sharding info in clusters
        ■ Index chosen by the optimiser
      ○ Rejected Plans
        ■ Many rejected plans can be a sign of too many indexes
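A sketch of calling .explain() on a query (the collection and filter are hypothetical):

```
// "executionStats" also runs the query and reports keysExamined /
// docsExamined, which feed the efficiency ratios discussed earlier.
db.items.find({ itemId: 1234 }).explain("executionStats")
```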

  • Troubleshooting: .explain() and Profiler

  • Log File - Slow Query

    2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell"
      command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') },
      update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 }
      planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } }
      keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604
      locks: { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } },
      Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } },
      oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms

  • Percona PMM

    ● Open-source monitoring from Percona!

    ● Based on open-source technology
      ○ Prometheus
      ○ Grafana
      ○ Go Language

  • Percona PMM

    ● Simple deployment
    ● Examples in this demo are from PMM
    ● Correlation of OS and DB Metrics
    ● 800+ OS and Database metrics per ping

  • Percona PMM

  • Using PMM in your Dev Process

    1. Install PMM
       a. Simple Docker-based deployment for easy install
    2. Install PMM Clients/Agents on your MongoDB Host(s)
    3. Develop your app
    4. Visualise your database resource usage, queries, etc
    5. Repeat

  • Percona PMM QAN

    ● Allows DBAs and developers to:
      ○ Analyze queries over periods of time
      ○ Find performance problems
      ○ Access database performance data securely
    ● Data is collected by the agent from the MongoDB Profiler (required)
    ● Query Normalization
      ○ ie: "{ item: 123456 }" -> "{ item: ##### }"
      ○ Good for reduced data exposure
    ● CLI alternative: the pt-mongodb-query-digest tool

  • Percona PMM QAN

  • Other tools

    ● mlogfilter
      ○ A useful tool for processing mongod.log files
    ● pt-mongodb-summary
      ○ Great for a high-level view of a MongoDB environment
    ● pt-mongodb-query-digest
      ○ A command-line tool similar to PMM QAN (although much simpler)

  • Scaling

  • Read Preference

    ● Allows reads to be sent to specific nodes
    ● Read Preference modes
      ○ primary (default)
      ○ primaryPreferred
      ○ secondary
      ○ secondaryPreferred (recommended for Read Scaling!)
      ○ nearest
    ● Tags
      ○ Select nodes based on key/value pairs (one or more)
      ○ Often used for:
        ■ Datacenter awareness, eg: { "dc": "eu-east" }
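A read preference sketch (the hostname and collection are placeholders): it can be set per-query in the shell or for the whole connection via the URI:

```
// Per-query in the mongo shell: prefer a secondary for this read
db.items.find({ itemId: 1234 }).readPref("secondaryPreferred")

// Or for the whole connection, via the connection string:
// mongodb://db1.example.net:27017/?readPreference=secondaryPreferred
```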

  • What is Sharding?

    ● Sharding is a MongoDB deployment style that allows linear scaling of data in MongoDB
      ○ Extra system components
        ■ A cluster-metadata replica set and special routers are added
      ○ Many shards can be added and removed online
      ○ Internal balancer migrates data to create a "balanced" state
      ○ Relies on a "shard key" (a field in documents) to partition the data
        ■ The choice of shard key is a critical decision
        ■ Today shard keys cannot be changed post-deployment!
    ● If you expect your system to scale beyond a single replica set, see:
      https://www.percona.com/blog/2015/03/19/choosing-a-good-sharding-key-in-mongodb-and-mysql/

  • Elastic Deployment

  • MongoDB: An Elastic Database

    ● Scale Reads?
      ○ Add more nodes to your replica-set (or shard)
        ■ More replica set members increases read capacity when using secondary reads
        ■ Note: replica set members may have some replication lag
          ● Use a read concern if lag is a concern
      ○ Add a caching tier (Redis/Memcached/In-application)
    ● Scale Writes?
      ○ Add more shards to your (hopefully) sharded cluster
        ■ Increases write AND read capacity, as well as storage space
      ○ Use MongoDB Bulk Writes
      ○ Use a queue for batching

  • MongoDB DNS SRV records

    ● New in MongoDB 3.6
    ● DNS SRV-record Connect String support
      ○ Allows apps to use a consistent DNS name to get a full list of MongoDB hosts
      ○ Avoids the need for application configuration changes and reload logic
      ○ Without DNS SRV:

    mongodb://db1.example.net:27017,db2.example.net:25001,db3.example.net:25003/

      ○ With DNS SRV:

    mongodb+srv://db1.example.net/

  • Questions?