How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing
Couchbase 104
Justin Michaels
[email protected] | @justindmichaels
Views and Indexes Overview
Indexes are “views” into Data
• A shortcut derived from, and pointing into, a greater volume of values, data, information or knowledge
Traditional Index Examples
• Table of Contents
• Card Catalog
Indexes and Views
©2014 Couchbase, Inc. 3
In Couchbase, Map-Reduce is used to maintain indexes
Map functions are applied to JSON documents and they output or "emit" data that is organized in an Index form
Each emit() call produces a row in the index
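As a rough illustration of the emit-to-row relationship described above, the sketch below runs a map function over a few documents in plain JavaScript. The `buildIndex` helper and the sample documents are hypothetical; in Couchbase the map function runs server-side and `emit` is provided by the view engine, whereas here it is passed in as a third argument.

```javascript
// Hypothetical harness: buildIndex and the sample documents are illustrative only.
function buildIndex(docs, mapFn) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    // Each emit() call appends one row: key, value, and the source document ID.
    const emit = (key, value) => rows.push({ key, value, id });
    mapFn(doc, { id }, emit);
  }
  // Simplified ordering by key, then document ID (real views apply Unicode collation).
  return rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
}

const docs = {
  "u::2": { username: "bob", email: "bob@example.com" },
  "u::1": { username: "alice", email: "alice@example.com" },
};

const index = buildIndex(docs, (doc, meta, emit) => emit(doc.username, null));
console.log(index);
```

Two documents and one emit() per document give two index rows, stored in key order rather than insertion order.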
Couchbase Views - Map-Reduce Indexes
Map-Reduce is a technique designed for dealing with semi-structured data by parallel processing across a distributed system
Different from Hadoop Map/Reduce
• Map functions identify data within collections, process them, and output transformed values
• Reduce functions take the output of Map functions and perform numeric aggregate calculations on them
What is Map Reduce?
Map inputs:
• Document – Application data
• Metadata – Couchbase data
Map outputs:
• Document ID
• View Key: User configurable based on JSON fields
• View Value: Only needed when reducing, use ‘null’ otherwise
Produces Index:
• B-tree Structure
• Sorted Alphabetically
Map Functions
Built-in reduce functions (Optional)
• _count – provides a count of rows (per unique key when grouped)
• _sum – provides a sum total of values
• _stats – provides statistics (max, min, avg, etc.) of values
Operate on results emitted by map function
Results stored pre-computed for fast access
Custom reductions are possible
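To make the three built-in reducers concrete, here is a minimal sketch of what they compute over the values a map function emitted. The functions are local stand-ins written for illustration (hence no leading underscore), and the shape of the `stats` result (sum, count, min, max, sum of squares) is an assumption about what `_stats` reports; an average can be derived as sum divided by count.

```javascript
// Local stand-ins for the built-in reducers; they operate on plain arrays of values.
const count = (values) => values.length;
const sum = (values) => values.reduce((total, v) => total + v, 0);
const stats = (values) => ({
  sum: sum(values),
  count: count(values),
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: values.reduce((total, v) => total + v * v, 0),
});

const points = [1000, 1200, 900]; // hypothetical values emitted by a map function
console.log(count(points)); // 3
console.log(sum(points));   // 3100
console.log(stats(points));
```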
Reduce Functions
Architecture
Architecture - Couchbase View Engine
[Diagram: App Server and Couchbase Server Node. Components shown for Doc 1: Managed Cache, Disk Queue, Disk, Replication Queue (to other node), View engine.]
[Diagram: COUCHBASE SERVER CLUSTER with User Configured Replica Count = 1. SERVER 1, SERVER 2 and SERVER 3 each hold ACTIVE and REPLICA documents. APP SERVER 1 and APP SERVER 2 each use the COUCHBASE Client Library with a CLUSTER MAP. A Query is sent to the cluster.]
• Indexing is distributed across nodes
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
Architecture - Couchbase View Engine
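The idea that each node indexes its own data and a query combines the per-node results can be sketched as a merge of sorted partial results. The slides do not describe how Couchbase performs the merge internally, so the `mergeSorted` function and the per-node rows below are purely illustrative.

```javascript
// Hypothetical per-node index rows, each list already sorted by key.
const nodeResults = [
  [{ key: "alice", id: "u::1" }, { key: "dave", id: "u::4" }],
  [{ key: "bob", id: "u::2" }],
  [{ key: "carol", id: "u::3" }, { key: "erin", id: "u::5" }],
];

// Merge sorted partial results into one sorted result set.
function mergeSorted(partials) {
  const cursors = partials.map(() => 0); // next unread position per node
  const merged = [];
  for (;;) {
    let best = -1; // node whose next row has the smallest key
    partials.forEach((rows, n) => {
      if (cursors[n] >= rows.length) return; // this node is exhausted
      if (best === -1 || rows[cursors[n]].key < partials[best][cursors[best]].key) {
        best = n;
      }
    });
    if (best === -1) return merged; // every node is exhausted
    merged.push(partials[best][cursors[best]++]);
  }
}

console.log(mergeSorted(nodeResults).map((row) => row.key));
```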
Buckets have one or more DESIGN DOCUMENTS
• Distributed across cluster when created
DESIGN DOCUMENTS contain one or more VIEW definitions
• Design Documents are processed in parallel
• All the views in a single design document are processed sequentially
Architecture – Design Document
BUCKET A
• Design document 1: View 1, View 2, View 3
• Design document 2: View 4, View 5
• Design document 3: View 6, View 7
BUCKET B
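A design document is itself JSON. The sketch below assumes the usual layout of a `views` object that maps each view name to a `map` function (stored as source text) and an optional `reduce`. The view names and document fields are hypothetical.

```javascript
// Hypothetical design document holding two views; names and fields are illustrative.
const designDoc = {
  views: {
    by_email: {
      map: "function (doc, meta) { if (doc.email) { emit(doc.email, null); } }",
    },
    points_by_user: {
      map: "function (doc, meta) { emit(doc.username, doc.points); }",
      reduce: "_sum", // built-in reducer referenced by name
    },
  },
};

// Serialized form, as it would be stored or sent to the cluster.
const body = JSON.stringify(designDoc, null, 2);
console.log(body);
console.log(Object.keys(designDoc.views));
```

Because the views in one design document are processed together, which views share a design document affects how long each index update takes.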
Architecture – Couchbase Map Reduce
Individual document operations are atomic
Views are eventually consistent in relation to documents
Incremental Map-Reduce
• Spread load across nodes
• Each node indexes its data
Map: process, filter, map and emit a row
Reduce: aggregate mapped data
• Default: _count, _sum, _stats
Architecture - Index Building Details
Views are maintained directly from managed cache
• The entire view is recreated if the view definition has changed
• All the views within a design document are incrementally updated
Views are updated automatically according to:
• Update Interval (time period); default 5000 milliseconds
OR (as of 3.x)
• Update Documents (number of changes); default 5000 changes
Update Controlled by:
• Configured globally via REST, or for an individual design document
• Manual updates provide application control
stale = UPDATE_AFTER (default if nothing is specified)
• fast response
• can take two operations to read your own writes
stale = OK (most likely to be used)
• auto update only
• might not see your own writes
• least frequent updates -> least resource impact -> highest performance
stale = FALSE (only when TRULY required)
• use with persistTo on the set when a write must force a view update
• BUT be aware of the delay it adds to set and query operations
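The difference between the three stale settings is mostly a question of when the index is refreshed relative to answering the query. The toy in-memory model below shows only that ordering. The `TinyView` class and its methods are hypothetical, and the automatic background updater is deliberately left out.

```javascript
// Toy in-memory view; class and method names are hypothetical.
class TinyView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.docs = new Map(); // stored documents
    this.rows = [];        // materialized index
  }
  set(id, doc) {
    this.docs.set(id, doc); // a write does not touch the index
  }
  refresh() {
    this.rows = [];
    for (const [id, doc] of this.docs) {
      this.mapFn(doc, { id }, (key, value) => this.rows.push({ key, value, id }));
    }
  }
  query(stale = "update_after") {
    if (stale === "false") this.refresh();        // update first, then answer
    const result = this.rows.slice();             // answer from the current index
    if (stale === "update_after") this.refresh(); // answer first, then update
    return result;                                // "ok": answer only
  }
}

const view = new TinyView((doc, meta, emit) => emit(doc.email, null));
view.set("u::1", { email: "a@example.com" });
console.log(view.query("ok").length);           // 0
console.log(view.query("update_after").length); // 0
console.log(view.query("update_after").length); // 1
view.set("u::2", { email: "b@example.com" });
console.log(view.query("false").length);        // 2
```

The second `update_after` call is the "two operations to read your own writes" case: the first query triggers the update, and the next one sees it.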
Architecture - Index Building Details
In addition to data replicas, optionally create replica for indexes
• Build an index using the data in replica vBuckets
Enabled per bucket (Bucket Config) or per design document (REST API)
• Each node must maintain index for active and replica data
• Implies additional CPU and I/O overhead
Failover and Failures
• Without replica indexes, the complete view is rebuilt
• Replica indexes, if present, are enabled and queries remain consistent
Architecture - Index Building Details (Replicas)
Architecture - Disk Structure
Each design document creates its own set of index files
Index data is always read from disk
• File format allows for successful I/O caching by operating system
Separate disk devices for view versus data files
• Both are append-only
• Both are compacted in parallel
• Better use of IO and caching
• Possible to use SSDs for improved performance on one or other (or both)
Development vs Production Views
Development Views
• Can be edited
• Can be tested on a full or partial dataset
• Not automatically maintained
Production Views
• Always operate on full document set
• Cannot be modified
• Automatically updated
Development Views are ‘published’ to Production
Publishing simply creates the view definition; it is NOT a move to a new cluster
Execute Development View on Entire Cluster
[Diagram: a Development View is created and edited/refined against a sample index built from a subset of the bucket content; promoting it to production yields a Production View with a full index over the full data.]
Writing Views
Map() Function => Index
function (doc, meta) {            // doc: JSON document; meta: document metadata
  emit(doc.username, doc.email);  // creates a row: indexed key, output value(s)
}
Every Document passes through View Map() functions
View Anatomy
Single Element Keys (Text Key)
function (doc, meta) {
  emit(doc.email, doc.points);  // text key
}

meta.id  doc.email         doc.points
u::1     [email protected]   1000
u::35    [email protected]   1200
u::20    [email protected]   900
View Anatomy
Compound Keys (Array)
function (doc, meta) {
  emit(dateToArray(doc.timestamp), 1);  // array key
}
Array Based Index Keys get sorted as Strings, but can be grouped by array elements

meta.id  dateToArray(doc.timestamp)  value
u::20    [2012,10,9,18,45]           1
u::1     [2012,9,26,11,15]           1
u::35    [2012,8,13,2,12]            1
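Grouping by array elements can be sketched as counting rows per key prefix, which is roughly what a `_count` reduce with a `group_level` query parameter returns. The `dateToArray` below is a local stand-in for the view engine helper (UTC-based, and returning five elements to match the table above), and the ISO timestamps are assumed inputs chosen to reproduce those keys.

```javascript
// Local stand-in for the view engine's dateToArray helper (UTC-based here).
function dateToArray(timestamp) {
  const d = new Date(timestamp);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes()];
}

// Count rows per key prefix of the given length.
function countByLevel(keys, level) {
  const counts = new Map();
  for (const key of keys) {
    const prefix = JSON.stringify(key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return counts;
}

// Hypothetical timestamps that produce the keys shown in the table.
const timestamps = ["2012-10-09T18:45:00Z", "2012-09-26T11:15:00Z", "2012-08-13T02:12:00Z"];
const keys = timestamps.map(dateToArray);

console.log(keys[0]);                    // [ 2012, 10, 9, 18, 45 ]
console.log(countByLevel(keys, 1));      // one group, [2012], with count 3
console.log(countByLevel(keys, 2).size); // 3, one group per year and month
```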
View Anatomy
key = “” (exact match)
keys = [ ] (set of keys match)
startkey/endkey = “” (range queries on view key)
startkey_docid/endkey_docid = “” (range queries on meta.id)
stale (false, update_after, ok)
group/group_level (aggregate with grouping)
View Anatomy - Parameters
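These parameters are passed as query-string arguments when a view is queried over REST. The sketch below assumes the view endpoint on port 8092 and a `_design/<ddoc>/_view/<view>` path; the host, bucket, design document and view names are hypothetical. Key-style parameters are JSON-encoded, which is why string keys carry double quotes.

```javascript
// Parameters whose values are JSON-encoded in the query string.
const JSON_PARAMS = new Set(["key", "keys", "startkey", "endkey"]);

// Build a view query URL; target names are placeholders supplied by the caller.
function viewUrl({ host, bucket, ddoc, view }, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  return `http://${host}:8092/${bucket}/_design/${ddoc}/_view/${view}?${query}`;
}

const target = { host: "localhost", bucket: "default", ddoc: "users", view: "by_email" };

// Exact match, accepting a possibly stale index.
console.log(viewUrl(target, { key: "alice@example.com", stale: "ok" }));
// Range over compound keys, grouped on the first two array elements.
console.log(viewUrl(target, { startkey: [2012, 9], endkey: [2012, 10], group_level: 2 }));
```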
View Anatomy - Collation
Unicode Collation
• 1234567890 < aAbBcCdDeEfFgGhHiIjJkKlLmM...
• a < á < A < Á < b
Byte Order
• 1234567890 < A-Z < a-z
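The practical difference between the two orderings is easy to see in plain JavaScript. In this sketch `Intl.Collator` is only an approximation of the Unicode collation the view engine applies (that equivalence is an assumption), while the default `sort()` compares code units and so behaves like byte order for ASCII keys.

```javascript
const sampleKeys = ["b", "B", "a", "A", "1"];

// Byte (code unit) order: digits, then uppercase, then lowercase.
const byteOrder = [...sampleKeys].sort();

// Locale-aware order: letters are grouped regardless of case.
const collator = new Intl.Collator("en");
const unicodeOrder = [...sampleKeys].sort(collator.compare);

console.log(byteOrder); // [ '1', 'A', 'B', 'a', 'b' ]
console.log(unicodeOrder);
```

A range query such as startkey "a" to endkey "c" therefore covers different rows depending on which ordering the index uses.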
View Anatomy - Sample Document
[Figure: sample document; label: Document ID]
View Anatomy - Sample Index
[Figure: sample index; labels: Key, Value]
View Anatomy - Examples
View Anatomy - Querying
• Simple View Access
• Exact Match
• Range
• With Reduction
• With Grouping
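The query styles listed above can be illustrated against a small in-memory index. The rows and helper functions are hypothetical; the range helper treats the end key as inclusive and uses plain string comparison rather than the view engine's collation.

```javascript
// Hypothetical index rows, already sorted by key as a view would store them.
const rows = [
  { key: "alice", value: 1000, id: "u::1" },
  { key: "bob", value: 1200, id: "u::35" },
  { key: "carol", value: 900, id: "u::20" },
  { key: "dave", value: 700, id: "u::7" },
];

// Exact match: rows whose key equals the requested key.
const exact = (key) => rows.filter((r) => r.key === key);
// Range: rows with startkey <= key <= endkey.
const range = (startkey, endkey) => rows.filter((r) => r.key >= startkey && r.key <= endkey);
// Reduction: a sum over the values of the selected rows.
const total = (subset) => subset.reduce((acc, r) => acc + r.value, 0);

console.log(exact("bob"));
console.log(range("b", "d").map((r) => r.id)); // [ 'u::35', 'u::20' ]
console.log(total(range("b", "d")));           // 2100
```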
Best Practices
View size is determined by key and value contents
• Emit as little as possible … not full document
• Only use values when required by a reduce function
• Only emit either null or the secondary key (doc ID included with each row)
View distribution:
• More views per designdoc require more time to update all views in group
• Single views per designdoc may require more CPU
• Group views in designdocs by update frequency, rather than subject/topic
View Best Practices
Queries should have consistent response times
• Indexes are pre-materialized
• Expect to use “stale=ok”
File system cache availability for the index has a big impact on performance
• Indexes are disk based
• Reduce cluster quota to give more system cache
In-house performance results show that doubling system cache availability:
• query latency reduces by half
• throughput increases by 50%
View Best Practices
View Best Practices
Avoid computing too many things in a single View
Select (filter) data to avoid unnecessary entries in the View
• Use document types to make Views more selective
Project (map) only necessary data and emit it as value
• When possible emit a null value and perform additional Get to retrieve the whole document
Use the built in reduce functions if possible
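A map function that follows these practices might look like the sketch below. The `type` field and its "user" value are assumptions about the document model, and `emit` is passed in as a parameter so the function can run outside the view engine.

```javascript
// Illustrative map function: filter by document type, emit a key and a null value.
function selectiveMap(doc, meta, emit) {
  if (doc.type !== "user" || !doc.email) return; // select: skip unrelated documents
  emit(doc.email, null);                          // project: key only, no value
}

// Hypothetical bucket content with mixed document types.
const docs = {
  "u::1": { type: "user", email: "alice@example.com", points: 1000 },
  "o::9": { type: "order", total: 42 },
  "u::2": { type: "user", points: 10 }, // no email, so not indexed
};

const rows = [];
for (const [id, doc] of Object.entries(docs)) {
  selectiveMap(doc, { id }, (key, value) => rows.push({ key, value, id }));
}

// A follow-up get by document ID retrieves the full document when it is needed.
const fullDocs = rows.map((row) => docs[row.id]);
console.log(rows);
console.log(fullDocs);
```

Only one of the three documents produces a row, which keeps the index small, and the full document stays one key-value lookup away.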
Couchbase Query Language
Querying with N1QL (“Nickel”)
[Figure: Person]
JSON can model our complex world
N1QL can query that world
N1QL Developer Preview and Tutorial
http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html
http://query.pub.couchbase.com/tutorial/#1
Thank You!
Next Session:
Couchbase 105 | December 3, 2014 | 10am Pacific
Cross Data Center Replication (aka XDCR)
Justin Michaels
[email protected] | @justindmichaels