Deep Dive into Global Secondary Indexing Architecture in Couchbase Server 4.0: Couchbase Connect...
GLOBAL SECONDARY INDEX (GSI) DEEP DIVE
John Liang, Couchbase
©2015 Couchbase Inc.
Disclaimer
Couchbase Server 4.0 and ForestDB are still in development, and the final versions of the products may differ in detail from what is discussed in this session.
Agenda
Introduction
Partitioned Index
Index Build and Maintenance
Index Scan
Availability
Data Rebalance and Failover
Couchbase Server Cluster Architecture
[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node runs a Cluster Manager, a Data Service, an Index Service, and a Query Service, with a managed cache and storage holding data shards.]
Couchbase Cluster Service Deployment
[Diagram: the same six nodes with services deployed separately. Couchbase Server 1, 2, and 3 run the Data Service; Couchbase Server 4 runs the Query Service; Couchbase Server 5 and 6 run the Index Service. Every node runs a Cluster Manager with a managed cache and storage.]
Why GSI - Range Partitioning
Metadata locates the index node holding a specific index value
No scatter/gather across a large cluster
Fast scan response
High scan throughput
Index Service: (LastName) A-I
Index Service: (LastName) J-R
Index Service: (LastName) S-Z
Query Service with Index Metadata
select * from customer where LastName="Adams"
select * from customer where LastName="Smith"
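The metadata lookup on the range partitioning slide can be sketched roughly as follows. This is an illustrative model only, not the Index Service's code: the node names, the `RANGE_STARTS` table, and the case-insensitive comparison are all assumptions made for the example.

```python
import bisect

# Hypothetical index metadata: the inclusive lower bound of each LastName
# range, paired with the index service node assumed to hold that range.
RANGE_STARTS = ["A", "J", "S"]
RANGE_NODES = ["index-node-1", "index-node-2", "index-node-3"]


def locate_index_node(last_name):
    """Return the single node whose range contains last_name (no scatter/gather)."""
    key = last_name.upper()  # assumption: ranges are compared case-insensitively
    pos = bisect.bisect_right(RANGE_STARTS, key) - 1
    if pos < 0:
        raise KeyError("no range covers %r" % last_name)
    return RANGE_NODES[pos]


print(locate_index_node("Adams"))  # routed to the A-I node
print(locate_index_node("Smith"))  # routed to the S-Z node
```

Because the lookup resolves to exactly one node, the query for "Adams" and the query for "Smith" each contact a single index service.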
Why GSI - Workload Isolation
[Diagram: three Data Service nodes serve Get/Set ops (transactional workload, hash partitioned); two Index Service nodes serve the Query Service (query workload, range partitioned); DCP connects the Data Service to the Index Service.]
Why GSI - Independent Scaling
Add indexing capacity as the number of indexes increases
Scale-up and/or scale-out options
[Diagram: a Couchbase cluster, node 1 to node 9, with nodes grouped by data workload, index workload, and query workload.]
Why GSI - Read Availability
create index Email1 on Customer(Email) using gsi;
create index Email2 on Customer(Email) using gsi;
[Diagram: the Query Service, with a Metadata Cache recording Email1 == Email2, holds one Connection Pool per Index Service; one Index Service holds Index Email1 and the other holds Index Email2.]
Why GSI – Fast Storage
ForestDB
Throughput: up to 6X faster
Future proof: fitted for SSD
Efficiency: up to 5X more compact
Index Fields
JSON field types: String, Boolean, Numeric, Nil, Array, Sub-document
Sort Order
In ascending order: Nil, False, True, Number, String (UTF-8), Array, Sub-document
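The type ordering on the Sort Order slide can be illustrated with a small sort key. This is a sketch of the ordering between types only; how values of the same type compare (in particular arrays and sub-documents) is an assumption of this example, not something the slide specifies.

```python
def collation_key(value):
    """Map a JSON value to a tuple that sorts by type rank first, then by value."""
    if value is None:
        return (0, 0)
    if value is False:
        return (1, 0)
    if value is True:
        return (2, 0)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value.encode("utf-8"))  # compare strings as UTF-8 bytes
    if isinstance(value, list):
        # assumption: arrays compare element by element
        return (5, [collation_key(v) for v in value])
    if isinstance(value, dict):
        # assumption: sub-documents compare by sorted (field, value) pairs
        return (6, sorted((k, collation_key(v)) for k, v in value.items()))
    raise TypeError("not a JSON value: %r" % (value,))


values = ["b", 3, None, True, [1], False, {"a": 1}, "a", 1.5]
print(sorted(values, key=collation_key))
```

The booleans are tested before the numeric branch because Python treats `bool` as a subclass of `int`.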
Index DDL
create index Email on Customer(Email) using gsi;
Create index on ‘Email’ field for documents in ‘Customer’ bucket
Statement returns after index is built
Index DDL
create index Email on Customers(Email) with {"defer_build": true} using gsi;
build index on Customers(Email, Phone) using gsi;
Use the defer_build option to build the index in the background
Can build multiple indexes simultaneously
Index DDL
drop index Customers.Email using gsi;
Create index Composite on Customer(State,Zip) using gsi;
Partitioned Index
Partitioned Index: Index on a sub-set of JSON documents
Single Bucket
Index
Partition 1
Partition 2
Partition 3
create index IndexName on BucketName(FieldName) where FieldName="Value" using gsi;
Partitioned Index – Document Type
Partitioned by document type (a field in document)
Similar to index for a table
Order
Customer
Shipment
Bucket ‘default’
Index Email
create index Email on default(Email) where type="customer" using gsi;
{"type" : "order"}
{"type" : "customer"}
{"type" : "shipment"}
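To make the document-type partitioning concrete, the sketch below mimics what an index with a `where type="customer"` clause would contain. The function name, document IDs, and email addresses are hypothetical and exist only for this illustration.

```python
def build_partial_index(docs, field, where):
    """Return sorted (key, doc_id) entries for documents that satisfy `where`."""
    entries = []
    for doc_id, doc in docs.items():
        # Only documents matching the predicate and carrying the field are indexed.
        if where(doc) and field in doc:
            entries.append((doc[field], doc_id))
    return sorted(entries)


# Hypothetical contents of bucket 'default': three document types in one bucket.
docs = {
    "c1": {"type": "customer", "Email": "bob@example.com"},
    "o1": {"type": "order", "OrderId": 17},
    "c2": {"type": "customer", "Email": "amy@example.com"},
    "s1": {"type": "shipment", "ShipmentId": 4},
}

email_index = build_partial_index(docs, "Email", lambda d: d.get("type") == "customer")
print(email_index)
```

Order and shipment documents never enter the index, which is why it behaves like an index on a single table.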
Partitioned Index – Range
Email=‘A-I’
Bucket ‘default’
Index Email1
{“Email” : “[email protected]”}
{“Email” : “[email protected]”}
{“Email” : “[email protected]”}
create index Email1 on default(Email) where Email >= 'A' and Email < 'J' using gsi;
Email=‘J-R’
Email=‘S-Z’
create index Email2 on default(Email) where Email >= 'J' and Email < 'S' using gsi;
create index Email3 on default(Email) where Email >= 'S' using gsi;
Index Email2
Index Email3
Partitioned Index - Benefits
Faster index build time
Lower maintenance latency
Lower scan latency
Higher scan throughput
Single Bucket
Index
Partition 1
Partition 2
Partition 3
create index IndexName on BucketName(FieldName) where FieldName="Value" using gsi;
Index Deployment - Default
Default: on the least populated node
create index OrderId on default(OrderId) where type="order" using gsi;
create index ShipmentId on default(ShipmentId) where type="shipment" using gsi;
[Diagram: bucket 'default' holds Order, Customer, and Shipment documents; indexes OrderId and ShipmentId are placed on Index Service nodes 198.33.25.17 and 198.33.25.89.]
Index Deployment - Explicit
Explicit: specify the node to deploy on
create index CustomerId on default(CustomerId) where type="order" with {"nodes":"198.33.25.17"} using gsi;
[Diagram: bucket 'default' holds Order, Customer, and Shipment documents; indexes OrderId, ShipmentId, and CustomerId are placed on Index Service nodes 198.33.25.17 and 198.33.25.89.]
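The two deployment modes (default and explicit) can be sketched as one placement function. This is a simplified model: it assumes "least populated" means the node hosting the fewest indexes, and the tie-break by node address is an arbitrary choice for the example.

```python
def place_index(index_counts, node=None):
    """Choose a node for a new index and record the placement.

    index_counts maps node address -> number of indexes hosted (assumed metric).
    node=None models the default deployment; a value models the explicit one.
    """
    if not index_counts:
        raise ValueError("no index service nodes available")
    if node is None:
        # Default: least populated node, ties broken by address for determinism.
        node = min(sorted(index_counts), key=index_counts.__getitem__)
    elif node not in index_counts:
        raise KeyError(node)
    index_counts[node] += 1
    return node


counts = {"198.33.25.17": 0, "198.33.25.89": 0}
print(place_index(counts))                       # default placement
print(place_index(counts))                       # default placement, other node
print(place_index(counts, node="198.33.25.17"))  # explicit placement
```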
Index Build
create index LastName on default(LastName) where type="customer" with {"defer_build": true} using gsi;
create index Phone on default(Phone) where type="customer" with {"defer_build": true} using gsi;
create index Email on default(Email) where type="customer" with {"defer_build": true} using gsi;
build index on default(LastName, Phone, Email) using gsi;
[Diagram: Data Service (node 3) holds Order, Customer, and Shipment documents; Index Service (node 1) and Index Service (node 2) host the indexes, with LastName and Phone shown.]
Index Build – DCP
A dedicated DCP stream for each index service
Stream is shared among indexes within the same service
DCP streams are concurrent across index services
build index on default(LastName, Phone, Email) using gsi;
[Diagram: Data Service (node 3), holding Order, Customer, and Shipment documents, sends one index stream over DCP to the Index Port of Index Service (node 1) and another to the Index Port of Index Service (node 2); LastName and Phone are shown as indexes.]
Index Build – Projector
Projector resides in the data service, listening to DCP streams
Extracts secondary keys from documents based on index definition
Routes secondary keys to the appropriate index service based on index deployment
Reduces network traffic
Projector payload = only the keys relevant for an index service
Filters out irrelevant mutations
Payload size can be much smaller than document size
[Diagram: the Projector in Data Service (node 3) receives the document {"LastName" : "Adams", "Phone" : "323-180-9978", "Email" : "[email protected]"} over DCP. It sends {"LastName" : "Adams", "Phone" : "323-180-9978"} to the Index Port of the Index Service holding LastName and Phone, and {"Email" : "[email protected]"} to the Index Port of the other Index Service.]
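The projector's extract-and-route step might be modeled as below. The `INDEX_DEFS` table, the node names, and the sample email address are hypothetical (the address on the slide is not recoverable); the real projector works on DCP mutations rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical deployment: index name -> (indexed field, index service node).
INDEX_DEFS = {
    "LastName": ("LastName", "node1"),
    "Phone": ("Phone", "node1"),
    "Email": ("Email", "node2"),
}


def project(doc, index_defs=INDEX_DEFS):
    """Extract secondary keys from one document and group them by target node."""
    payloads = defaultdict(dict)
    for index_name, (field, node) in index_defs.items():
        if field in doc:  # fields no index cares about are never sent
            payloads[node][index_name] = doc[field]
    return dict(payloads)


doc = {
    "LastName": "Adams",
    "Phone": "323-180-9978",
    "Email": "adams@example.com",  # placeholder address for the example
    "Notes": "x" * 1000,           # large field that is not indexed
}
print(project(doc))
```

Each index service receives only its own keys, so the payload stays small even when the document is large, and a mutation that touches no indexed field produces no payload at all.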
Index Build – Parallel Indexer Pipeline
Update index ONLY IF key has changed
[Diagram: inside the Index Service, mutations pass from the Index Port through a Mutation Queue, an Extraction Worker, an Index Queue, and a Persistence Worker into ForestDB. The mutation {"LastName" : "Adams", "Phone" : "323-180-9978"} is split into {"LastName" : "Adams"} and {"Phone" : "323-180-9978"}.]
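The "update index ONLY IF key has changed" rule could look like the following sketch. The `IndexWriter` class and its in-memory structures are stand-ins invented for this example; they are not the indexer's actual data structures or the ForestDB API.

```python
class IndexWriter:
    """Toy single-field index that skips writes when the secondary key is unchanged."""

    def __init__(self):
        self.last_key = {}    # doc_id -> secondary key currently indexed
        self.entries = set()  # (secondary key, doc_id), standing in for storage
        self.writes = 0       # number of times the index was actually modified

    def apply(self, doc_id, doc, field):
        had_old = doc_id in self.last_key
        has_new = field in doc
        if had_old and has_new and self.last_key[doc_id] == doc[field]:
            return False  # key unchanged: nothing to persist
        if not had_old and not has_new:
            return False  # document was never indexed and still has no key
        if had_old:
            self.entries.discard((self.last_key.pop(doc_id), doc_id))
        if has_new:
            self.last_key[doc_id] = doc[field]
            self.entries.add((doc[field], doc_id))
        self.writes += 1
        return True


writer = IndexWriter()
writer.apply("c1", {"LastName": "Adams", "Phone": "323-180-9978"}, "LastName")
writer.apply("c1", {"LastName": "Adams", "Phone": "555-000-1111"}, "LastName")  # skipped
writer.apply("c1", {"LastName": "Baker"}, "LastName")
print(writer.writes, writer.entries)
```

A change to `Phone` alone does not touch the `LastName` index, which is the saving the slide points at.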
Index Maintenance
Updates the index with ongoing mutations
Parallel DCP streams
Concurrent index build and index maintenance
One shared maintenance stream for all active indexes in each service
[Diagram: the Projector in the Data Service (Order, Customer, Shipment) feeds the Index Service over two DCP streams, one to the Index Build Port and one to the Index Maintenance Port.]
Index Snapshot
Persisted snapshot
For rollback during recovery (checkpointing)
Tunable snapshot interval
In-memory snapshot
For scanning
Query stability and consistency
Tunable snapshot interval
Index Metadata
Metadata updates are pushed from the Index Service to all index clients (across all query service nodes)
The metadata manager handles index DDL requests from any index client
[Diagram: the Query Service's Index Client keeps a Metadata Cache (Email, Phone); two Index Service nodes, one holding Index Phone and one holding Index Email, each have a Metadata Manager and a Repository.]
Index Scan
select * from default where Email = "[email protected]";
• Use metadata to locate the index service for scanning
• A scan request is made of:
  • High value for scan
  • Low value for scan
  • Consistency option
[Diagram: two Query Service nodes, each with an Index Client, a Metadata Cache, and a Connection Pool, connect to the Scan Port of the Index Service, which holds Index Email and Index Phone, each with a snapshot at T1 and a snapshot at T2.]
Index Scan
select * from default where Email = "[email protected]";
Pick snapshot based on consistency option:
• Unbounded: pick the latest snapshot in the indexer
• Request_plus: observe a current or future snapshot that matches the query request
[Diagram: same layout as the previous Index Scan slide.]
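The snapshot choice for the two consistency options can be sketched as follows. To keep the example short it assumes each snapshot is identified by a single sequence number; a real timestamp would carry one sequence number per vbucket. The function name and the "return None means wait" convention are assumptions of the sketch.

```python
def pick_snapshot(snapshot_seqnos, consistency, request_seqno=None):
    """Pick a snapshot to scan, or return None if the scan has to wait.

    snapshot_seqnos: ascending sequence numbers at which snapshots were taken
    (a simplification of the per-vbucket timestamp).
    """
    if not snapshot_seqnos:
        return None
    latest = snapshot_seqnos[-1]
    if consistency == "unbounded":
        return latest  # whatever the indexer has most recently
    if consistency == "request_plus":
        if request_seqno is None:
            raise ValueError("request_plus needs the request's sequence number")
        # The snapshot must cover everything written before the request;
        # otherwise the caller waits for a future snapshot.
        return latest if latest >= request_seqno else None
    raise ValueError("unknown consistency option: %r" % consistency)


print(pick_snapshot([10, 20], "unbounded"))
print(pick_snapshot([10, 20], "request_plus", request_seqno=15))
print(pick_snapshot([10, 20], "request_plus", request_seqno=25))
```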
Index Scan
select * from default where Email = "[email protected]";
• Open an iterator on the chosen snapshot with the given range (min/max values)
• Stream results back to the Index Client
• Results are placed in the response channel
[Diagram: same layout as the previous Index Scan slides, with a Response Channel added.]
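A range iterator over a chosen snapshot might be sketched as a generator. The snapshot is modeled here as a sorted list of (key, doc_id) pairs, and both bounds are treated as inclusive; both choices are assumptions of the example, as are the sample addresses.

```python
import bisect


def scan(snapshot, low, high):
    """Yield (key, doc_id) entries with low <= key <= high from a sorted snapshot."""
    # (low,) sorts before every (low, doc_id), so this finds the first match.
    start = bisect.bisect_left(snapshot, (low,))
    for key, doc_id in snapshot[start:]:
        if key > high:
            break  # entries are sorted, so nothing later can match
        yield key, doc_id  # results are streamed one at a time


snapshot = sorted([
    ("amy@example.com", "c2"),
    ("bob@example.com", "c1"),
    ("cat@example.com", "c3"),
])

# An equality predicate becomes a range with identical low and high values.
for entry in scan(snapshot, "bob@example.com", "bob@example.com"):
    print(entry)
```

The generator stands in for the response channel: the consumer receives entries as they are produced instead of waiting for the full result.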
Availability and Load Balancing
Choose among equivalent indexes to serve a scan request
For load balancing and read availability
create index Email1 on default(Email) using gsi;
create index Email2 on default(Email) using gsi;
[Diagram: the Query Service's Index Client, with a Metadata Cache recording Email1 == Email2, holds one Connection Pool per Index Service. One Index Service exposes Index Email1 (snapshots at T1 and T2) on its Scan Port; the other exposes Index Email2 (snapshots at T3 and T4).]
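Choosing among equivalent indexes could be sketched as a picker that rotates through them and skips any that are unreachable. The round-robin policy and the `is_available` callback are assumptions made for this example; the slide does not say which selection policy the index client uses.

```python
class EquivalentIndexPicker:
    """Rotate scans across equivalent indexes, skipping unavailable ones."""

    def __init__(self, index_names):
        self._names = list(index_names)
        self._next = 0

    def pick(self, is_available):
        count = len(self._names)
        for offset in range(count):
            candidate = self._names[(self._next + offset) % count]
            if is_available(candidate):
                # Start the next search just after the index that was chosen.
                self._next = (self._next + offset + 1) % count
                return candidate
        raise RuntimeError("no equivalent index is available")


picker = EquivalentIndexPicker(["Email1", "Email2"])
print(picker.pick(lambda name: True))               # load balancing
print(picker.pick(lambda name: True))
print(picker.pick(lambda name: name != "Email2"))   # read availability
```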
Data Rebalance
[Diagram: Node 1 and Node 2 each hold Order and Customer data, with a DCP feed into a Projector. Node 1 owns VB 0-511 and Node 2 owns VB 512-1023; both Projectors send to the Index Port of the Index Service.]
Data Rebalance
Rebalance vbuckets 512-1023 from node 2 to node 1
Both projectors continue to send keys to the index service
[Diagram: same layout as the previous slide; Node 1 sends VB 0-511 and Node 2 sends VB 512-1023.]
Data Rebalance
Rebalance complete. Node 1 has vbuckets 0-1023
Node 1 projector continues to send mutations for VB 0-511
Node 2 projector sends control messages for end of stream
[Diagram: Node 1 (Order, Customer) now holds VB 0-1023, and its Projector sends VB 0-511 to the Index Port of the Index Service; Node 2 no longer holds data.]
Data Rebalance
Index Service receives end-of-stream for VB 512-1023
Index Service finds out the new VB master for VB 512-1023 from the cluster manager
Index Service requests the Node 1 projector to include keys for VB 512-1023
Cluster Manager cluster map: { Bucket : Customers, VB 0-1023 : Node 1 }
[Diagram: Node 1 (Order, Customer) holds VB 0-1023, and its Projector sends VB 0-1023 to the Index Port of the Index Service.]
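The end-of-stream handling during rebalance can be sketched as a lookup against the cluster map followed by a grouped request. The function and the dictionary shapes are hypothetical; they only mirror the three steps listed on the slide.

```python
def handle_end_of_stream(ended_vbuckets, cluster_map, streaming_from):
    """Re-home vbuckets whose stream ended and build the projector requests.

    cluster_map: vbucket -> current master node (as reported by the cluster manager)
    streaming_from: vbucket -> node the index service streams from (updated in place)
    Returns node -> sorted list of vbuckets to request from that node's projector.
    """
    requests = {}
    for vb in ended_vbuckets:
        new_master = cluster_map[vb]     # step 2: find the new VB master
        streaming_from[vb] = new_master  # remember where the stream now comes from
        requests.setdefault(new_master, []).append(vb)
    return {node: sorted(vbs) for node, vbs in requests.items()}


# State after the rebalance on the slides: node 1 owns all 1024 vbuckets.
cluster_map = {vb: "node1" for vb in range(1024)}
streaming_from = {vb: ("node1" if vb < 512 else "node2") for vb in range(1024)}

to_request = handle_end_of_stream(range(512, 1024), cluster_map, streaming_from)
print({node: (vbs[0], vbs[-1]) for node, vbs in to_request.items()})
```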
Data Failover
Node 1 is the replica for VB 512-1023
Node 2 fails
Index Service receives a connection error for Node 2
[Diagram: Node 1 (Order, Customer) holds VB 0-511 (master) and VB 512-1023 (replica), and its Projector sends VB 0-511 to the Index Port of the Index Service; Node 2 is down.]
Data Failover
Index Service finds out the new VB master for VB 512-1023 from the cluster manager
Index Service requests the Node 1 projector to send the failover log
Cluster Manager cluster map: { Bucket : Customers, VB 0-1023 : Node 1 }
[Diagram: Node 1 (Order, Customer) holds VB 0-1023, and its Projector returns the failover log to the Index Service.]
Data Failover
From the failover log, determine a valid timestamp for rollback
Valid timestamp = last valid sequence numbers of the 1024 vbuckets
Pick the most recent persisted snapshot that matches the rollback timestamp
Roll back to that snapshot
Re-stream keys for vbuckets, starting from the rollback snapshot
[Diagram: Node 1 (Order, Customer) holds VB 0-1023 and its Projector sends VB 0-1023 to the Index Port of the Index Service; the snapshot timelines show T1, T2, and T3.]
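Picking the rollback snapshot can be sketched as below. The sketch interprets "matches the rollback timestamp" as "no vbucket in the snapshot is ahead of its last valid sequence number", which is an assumption, and it uses two vbuckets instead of 1024 to stay readable.

```python
def pick_rollback_snapshot(snapshots, valid_seqnos):
    """Return the newest persisted snapshot that is not ahead of the valid timestamp.

    snapshots: list of (name, {vbucket: seqno}) ordered oldest to newest
    valid_seqnos: {vbucket: last valid seqno}, derived from the failover log
    Returns None when no snapshot qualifies (the index would be rebuilt from zero).
    """
    for name, seqnos in reversed(snapshots):
        if all(seq <= valid_seqnos[vb] for vb, seq in seqnos.items()):
            return name
    return None


snapshots = [
    ("T1", {0: 5, 1: 5}),
    ("T2", {0: 9, 1: 8}),
    ("T3", {0: 12, 1: 12}),
]
valid = {0: 10, 1: 9}  # hypothetical result of reading the failover log

print(pick_rollback_snapshot(snapshots, valid))
```

After rolling back, the index service would re-stream each vbucket starting from the sequence numbers recorded in the chosen snapshot.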
Network Partition
Network partition is handled similarly to data failover, except:
Snapshot rollback may not be required
Re-stream mutations starting from when the partition happened
[Diagram: Node 1 (VB 0-511) and Node 2 (VB 512-1023) each hold Order and Customer data, with a DCP feed into a Projector; both Projectors send to the Index Port of the Index Service.]
Get Started Today Couchbase Server 4.0 & N1QL
Couchbase.com/beta
Thank you.