HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store

Emin Gün Sirer
Cornell University / HyperDex, Inc.

VNGRS/YTU
6 January, 2014

Emin Gün Sirer Cornell University / HyperDex, Inc. http://hyperdex.org/ 1 / 51

From RDBMS to NoSQL

- RDBMS have difficulty with scalability and performance
- ... also, they have been expensive to own and operate
- NoSQL systems emerged to fill the gap

Problems Typical of NoSQL

Lack of ...

- Search
- Consistency
- Fault-Tolerance

Specifics vary between systems

Typical NoSQL Architecture

Consistent hashing maps each key to a server
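
The mapping described above can be sketched in a few lines. This is a minimal illustration of consistent hashing in general, not the code of any particular NoSQL system; the server names, the use of MD5, the ring size, and the number of virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on a 2**32 ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


class ConsistentHashRing:
    """Each server owns the arcs of the ring that end at its points."""

    def __init__(self, servers, vnodes=64):
        # Place several virtual points per server to even out the load.
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def server_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the next server point,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
owner = ring.server_for("jsmith")
```

A lookup by key touches exactly one server. Nothing in this mapping depends on the object's other attributes, which is why a search on a non-key attribute has to ask every server.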

The Search Problem

Searching for objects without the key involves many servers

The Consistency Problem

Clients may read inconsistent data and writes may be lost

The Fault-Tolerance Problem

Many systems’ default settings consider a write complete after writing to just one node

HyperDex: An Overview

- Hyperspace hashing
- Value-dependent chaining
- ACID Transactions

⇓

- High-Performance: High throughput with low variance
- Consistent: Every GET returns the latest PUT
- Available: Tolerates a threshold of failures
- Partition-Tolerant: Operates in the presence of partitions
- Scalable: Adding resources increases performance
- Rich API: Support for complex data structures and search

- Introduction
- Design and Implementation
  - Hyperspace Hashing
  - Value-Dependent Chaining
  - Linear Transactions
- Evaluation
- Perspective
- Additional
- Conclusion

[Figure: a three-dimensional hyperspace with axes First Name, Last Name, and Phone Number, marked at H(“Neil”), H(“Armstrong”), and H(“607-555-1024”); the objects shown are Neil Armstrong, Lance Armstrong, and Neil Diamond.]

- Attributes map to dimensions in a multidimensional hyperspace
- Attribute values are hashed independently; any hash function may be used
- Objects reside at the coordinate specified by the hashes
- Different objects reside at different coordinates
- The hyperspace is divided into regions where each object resides in exactly one region
- Each server is responsible for a region of the hyperspace
- Each search intersects a subset of regions of the hyperspace
- All people named Neil are mapped to the yellow plane
- All people named Armstrong are mapped to the gray plane
- A more restrictive search for Neil Armstrong contacts fewer servers
- Range searches are natively supported
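
The placement rule above can be sketched as follows. This is a toy model under stated assumptions, not HyperDex's implementation: the grid resolution `CELLS_PER_DIM`, the use of SHA-1, and the convention that every grid cell is its own region are all made up for illustration. It handles equality searches only; range searches would need an order-preserving hash, which this sketch does not attempt.

```python
import hashlib
from itertools import product

ATTRIBUTES = ("first", "last", "phone")  # one dimension per attribute
CELLS_PER_DIM = 4                        # hypothetical grid resolution


def h(value) -> int:
    """Hash one attribute value to a cell index along its own axis."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % CELLS_PER_DIM


def coordinate(obj: dict) -> tuple:
    """An object lives at the point given by hashing each attribute."""
    return tuple(h(obj[name]) for name in ATTRIBUTES)


def regions_for_search(predicate: dict) -> list:
    """Regions a search must contact: fixed on the specified attributes,
    unconstrained on the others."""
    axes = [
        [h(predicate[name])] if name in predicate else range(CELLS_PER_DIM)
        for name in ATTRIBUTES
    ]
    return list(product(*axes))


neil = {"first": "Neil", "last": "Armstrong", "phone": "607-555-1024"}
plane = regions_for_search({"first": "Neil"})
line = regions_for_search({"first": "Neil", "last": "Armstrong"})
```

With these numbers, a search on first name alone covers a plane of 16 of the 64 regions, and adding the last name narrows it to a line of 4, which mirrors the observation that a more restrictive search contacts fewer servers.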

Space Partitioning

- In a naive implementation, the hyperspace would grow exponentially in the number of dimensions
- Space partitioning prevents exponential growth in the number of searchable attributes

[Figure: the attribute list k, a1, a2, ..., aD shown twice; in the second copy the key k forms the key subspace and the attributes are grouped into subspace 0, subspace 1, ..., subspace S.]

- A search is performed in the most restrictive subspace
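
A quick back-of-the-envelope calculation shows why partitioning helps. The sketch below assumes every dimension is cut into the same number of intervals and that attributes are grouped into equally sized subspaces; both are simplifications chosen for the arithmetic, and the function names are hypothetical.

```python
def regions_single_space(dims: int, cuts: int) -> int:
    """Regions in one hyperspace with `dims` dimensions, `cuts` intervals each."""
    return cuts ** dims


def regions_partitioned(dims: int, cuts: int, dims_per_subspace: int) -> int:
    """Total regions when the attributes are split into smaller subspaces."""
    full, rest = divmod(dims, dims_per_subspace)
    total = full * cuts ** dims_per_subspace
    if rest:
        # A final, smaller subspace holds the leftover attributes.
        total += cuts ** rest
    return total


naive = regions_single_space(9, 2)          # 2**9 regions
partitioned = regions_partitioned(9, 2, 3)  # three subspaces of 2**3 regions
```

Under these assumptions, nine searchable attributes need 512 regions in a single hyperspace but only 24 when split into three 3-dimensional subspaces: the total grows linearly with the number of subspaces rather than exponentially with the number of attributes.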

Hyperspace Hashing Implications

- Searches are efficient
- Hyperspace hashing is a mapping, not an index
  - No per-object updates to a shared data structure
  - No overhead for building and maintaining B-trees
  - Functionality gained solely through careful placement

Replication

- As an object changes, so too must the set of servers holding it

Value-Dependent Chaining

[Figure: a key subspace (k), subspace 1 (attributes A and B), and subspace 2 (attributes C and D). The numbers 1, 2, and 3 label the nodes involved in three successive operations:
  1. put(k,A=1,B=1,C=1,D=1)
  2. put(k,A=0,B=0,C=1,D=1)
  3. put(k,A=0,B=1,C=1,D=1)
A second version of the figure shows put(k,A=0,B=0,C=1,D=1) with replicated nodes in each region.]

- A put includes one node from each subspace
- The value-dependent chain includes the servers which hold the old and new versions of the object
- Each put removes all state from the previous put
- Subsequent operations involve solely the most recent nodes
- Servers are replicated in each region for fault tolerance
- The value-dependent chain includes all replicas
- Failed nodes are removed from the chain
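
The chain construction described above can be sketched for the three puts in the figure. This is an illustrative model only: the toy hash (value modulo 2), the tuple used to name a node, and the choice to visit the old node before the new one within a subspace are assumptions of the sketch, and replicas are left out.

```python
from typing import Optional

SUBSPACES = {"subspace 1": ("A", "B"), "subspace 2": ("C", "D")}
CELLS_PER_DIM = 2  # hypothetical grid resolution


def h(value: int) -> int:
    """Toy hash: the attribute values on the slide are small integers."""
    return value % CELLS_PER_DIM


def node(subspace: str, obj: dict) -> tuple:
    """Name the node responsible for `obj` in one subspace."""
    return (subspace,) + tuple(h(obj[name]) for name in SUBSPACES[subspace])


def chain(key: str, old: Optional[dict], new: dict) -> list:
    """Nodes an update passes through: the key's node, then for each subspace
    the node holding the old version (if it differs) and the new version."""
    nodes = [("key subspace", key)]
    for subspace in SUBSPACES:
        new_node = node(subspace, new)
        if old is not None and node(subspace, old) != new_node:
            nodes.append(node(subspace, old))
        nodes.append(new_node)
    return nodes


v1 = {"A": 1, "B": 1, "C": 1, "D": 1}
v2 = {"A": 0, "B": 0, "C": 1, "D": 1}
v3 = {"A": 0, "B": 1, "C": 1, "D": 1}

first_put = chain("k", None, v1)
second_put = chain("k", v1, v2)
third_put = chain("k", v2, v3)
```

In this model the second put visits both the old and the new node in subspace 1, because A and B changed, but only one node in subspace 2, where C and D kept their values. The third put no longer involves the node that held the first version.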

Value-Dependent Chaining Implications

No extra mechanism is necessary to provide

- Atomicity
- Ordering
- Replication
- Relocation

Multikey Transactions

- Hyperspace hashing enables HyperDex to locate data quickly
- Value-dependent chaining enables HyperDex to replicate data
- And this is sufficient for many applications
- But some apps require atomic, consistent updates to multiple items

Options are:

- Spray and pray
- Use a heavyweight algorithm (e.g. Paxos) for ordering
- HyperDex Warp

Warp Properties

Warp is a novel, optimistic, concurrent, distributed algorithm for ensuring isolated updates to a data store.

- Atomic – operations on multiple keys are indivisible
- Consistent – application invariants are preserved
- Isolated – one-copy serializable
- Durable – all transactions are propagated to f+1 replicas

Existing Systems Exhibit Spurious Coordination

Spurious coordination is the unnecessary delay or reordering of a transaction in order to enforce an order (w.r.t. other transactions) when the transaction could have been applied without delay/reordering without violating guarantees.

Examples:

- Centralized sequencer
- Paxos RSM
- Batching

Linear Transactions

- Arrange servers into a chain
- Track dependencies while traversing the chain
- Prevent cycles using the dependencies
- Protocol exhibits no spurious coordination
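
The slide does not spell out the protocol, so the following is not Warp itself. It is only a generic sketch of the idea behind "prevent cycles using the dependencies": record which transaction must precede which, and refuse any new ordering that would close a cycle. The class and method names are hypothetical.

```python
from collections import defaultdict


class DependencyGraph:
    """Tracks 'must come before' edges between transactions."""

    def __init__(self):
        self._after = defaultdict(set)  # txn -> txns ordered after it

    def _reaches(self, start, goal) -> bool:
        # Depth-first search along the recorded ordering edges.
        stack, seen = [start], set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self._after[current])
        return False

    def order(self, first, second) -> bool:
        """Record 'first before second'; refuse if that would form a cycle."""
        if self._reaches(second, first):
            return False
        self._after[first].add(second)
        return True


graph = DependencyGraph()
ok_1 = graph.order("t1", "t2")
ok_2 = graph.order("t2", "t3")
rejected = graph.order("t3", "t1")  # would contradict t1 -> t2 -> t3
```

Transactions that share no dependency never appear in each other's reachable set, so in this model they are never delayed or reordered on each other's account.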

Implementation

- Bindings for C, C++, Python, Go, Java, Ruby, Node.JS
- Open sourced under a BSD-like license
- Active user community with many contributors

Inserting/Retrieving an Object

    >>> c.put('phonebook', 'jsmith',
    ...       {'first': 'John',
    ...        'last': 'Smith',
    ...        'phone': 6075551024})
    True
    >>> c.get('phonebook', 'jsmith')
    {'first': 'John', 'last': 'Smith',
     'phone': 6075551024}

Performing A Search

    >>> [x for x in c.search('phonebook',
    ...                      {'first': 'John'})]
    [{'first': 'John',
      'last': 'Smith',
      'phone': 6075551024,
      'username': 'jsmith'}]

    >>> [x for x in c.search('phonebook',
    ...                      {'phone': (6070000000, 6080000000)})]
    [{'first': 'John',
      'last': 'Smith',
      'phone': 6075551024,
      'username': 'jsmith'}]

Atomic Operations

    >>> c.condput('phonebook', 'jsmith',
    ...           {'phone': 6075551024},
    ...           {'phone': 6075552048})
    True
    >>> c.condput('phonebook', 'jsmith',
    ...           {'phone': 6075551024},
    ...           {'phone': 6075552048})
    False

Atomic Operations

    >>> c.atomic_add('phonebook', 'jsmith',
    ...              {'phone': 1})
    True
    >>> c.get('phonebook', 'jsmith')
    {'first': 'John',
     'last': 'Smith',
     'phone': 6075552049}

Asynchronous Operations

    >>> d = c.async_put('phonebook', 'js',
    ...                 {'first': 'John',
    ...                  'last': 'Smith',
    ...                  'phone': 6075551024})
    >>> d
    <hyperclient.DeferredInsert object at ...>
    >>> do_work()
    >>> d.wait()
    True

A Space with Datastructures

    >>> c.add_space('''
    ... space socialnetwork
    ... key username
    ... attributes
    ...     first, last,
    ...     list(string) pending,
    ...     set(string) hobbies,
    ...     map(string,string) unread_messages,
    ...     map(string,int) upvotes
    ... subspace first, last
    ... tolerate 2 failures''')
    True

Manipulating Lists

    >>> c.list_rpush('socialnetwork', 'js',
    ...              {'pending': 'bjones1'})
    True
    >>> c.get('socialnetwork', 'js')['pending']
    ['bjones1']

Transactions

    >>> t = c.begin_transaction()
    >>> amt1 = t.get('account', 'egs')['balance']
    >>> amt2 = t.get('account', 'rob')['balance']
    >>> amt1 -= 1000
    >>> amt2 += 1000
    >>> t.put('account', 'egs', {'balance': amt1})
    >>> t.put('account', 'rob', {'balance': amt2})
    >>> t.commit()

Experimental Setup

- Use the Yahoo! Cloud Serving Benchmark
- Each operation writes to two replicas of the data

YCSB Throughput

[Figure: throughput (thousand op/s) for MongoDB and HyperDex; series: Bulk Loading, Workload A, Workload B, Workload C, Workload F.]

95% get / 5% put Latency

[Figure: YCSB Workload B; CDF (%) versus latency (ms, ticks at 1, 10, 100, 1000); series: MongoDB (R), MongoDB (U), HyperDex (R), HyperDex (U).]

100% put Latency

[Figure: YCSB Load Dataset; CDF (%) versus latency (ms, ticks at 1, 10, 100, 1000); series: MongoDB, HyperDex.]

Chain Length vs. Put Latency

[Figure: latency (ms) versus chain length (nodes).]

Scalability

[Figure: throughput (million ops/s) versus number of nodes.]

TPC-C Benchmark

- TPC-C for key-value stores
- Each row is a separate key/value pair
- Key-operations only
- Transactions: New-Order, Order-Status, Payment, Stock-Level
- No time wait between transactions
- Each transaction touches multiple objects:

Profile        R         W   RMW      % Mix
New Order      12        3   11 (1)   45
Payment        0         1   3 (2)    45
Order Status   12        0   0        5
Stock Level    201 (1)   0   0        5

TPC-C Throughput

[Figure: throughput (thousand ops/s) for HyperDex, 2PCDex, and Warp.]

TPC-C New Order Transactions

[Figure: CDF (%) versus latency (ms, ticks at 1, 10, 100, 1000); series: HyperDex, 2PCDex, Warp.]

TPC-C Order Status Transactions

[Figure: CDF (%) versus latency (ms, ticks at 1, 10, 100, 1000); series: HyperDex, 2PCDex, Warp.]

Performance Summary

- Outperforms other systems by 2–4× for get/put
- While offering stronger consistency and fault tolerance
- Outperforms 2PC/Sinfonia by 3.3× for transactions
- Within 96% of non-transactional loads
- Latency for chain-operations is predictable
- Scales as resources are added

The CAP Theorem

- Consistency, Availability, Partition Tolerance: Pick at most two
- Problem 1. CAP does not type-check.
  - Consistency: property of a system
  - Availability: property of a system
  - Partitions: property of the failure model

The CAP Theorem

- CAP Proof Sketch
  - Imagine two hosts, with a single network link.
  - If partitioned and forced to provide answers (achieve A), the hosts cannot be consistent (fail at C)
  - If partitioned and forced to remain consistent (achieve C), the hosts cannot respond (fail at A)
- Problem 2. CAP definitions are not what you want.
  - Availability is defined as "every host must respond to every request it receives."
  - System designers care about A at the system boundary
  - CAP defines A per-server

The CAP Theorem

- Problem 3. The failure model is all too powerful
  - If failures are unconstrained, anything can happen
  - Scant little is possible when everything can fail
  - The Not-A Theorem: Availability, however you define it, is not achievable.
  - Proof Sketch: Imagine a distributed system. If all the servers are down due to host or network failures, it cannot be available.
  - The Not-A Theorem is a dumb tautology, bereft of insight or implications

The CAP Theorem

- Problem 4. CAP is correct, but has no consequences.
  - It is simply a contradiction of definitions.
  - Implies nothing about system design.
  - If a system gives up on C or A (or P!) because of CAP, it’s doing something wrong.

The CAP Theorem

- What CAP really says:
  - If you cannot limit the number of faults
  - and requests can be directed to any server
  - and you insist on serving every request
  - then you cannot possibly be consistent
- Most NoSQL systems are proud to preemptively give up desirable properties like consistency in the name of CAP, even in the case of no failures
- HyperDex allows for f failures without sacrificing consistency or availability

The CAP Theorem

- What of CAP?
  - There is a 3-way tradeoff involving C, A, and P!
  - Consistency, Availability and Performance
  - One may sacrifice C or A to gain P. This tradeoff was behind the success of broken-by-design systems such as MongoDB.
- HyperDex achieves C and A, and the P you actually want, in the presence of the original P.

Backups

- Everyone else implements inconsistent backups
- Inconsistent backups are only useful if you stop the world and snapshot
- HyperDex takes a consistent snapshot
- Approximately 200-300ms for a snapshot
- Copying takes place in the background

Security and Macaroons

- Typically, all items in a database are under the same ACL
- Macaroons were invented at Google to protect user data from front-end breaches
- Like cookies, but much more powerful
- Every item can be protected individually

Deployment

- HyperDex automatically configures itself.
- Supported on Linux and OS X.
- Works well within Docker.
- Install and deploy an auto-scaling cluster in 30 seconds.

Conclusion

- HyperDex is a next generation NoSQL system
- NoSQL advantages can be combined with consistency and fault tolerance guarantees
- http://hyperdex.org/

YCSB Benchmark Workloads

Name   Workload            Key Choice   Application
A      50% R, 50% U        Zipf         Session Store
B      95% R, 5% U         Zipf         Photo Tagging
C      100% R              Zipf         Profile Cache
D      95% R, 5% I         Temporal     Status Updates
E      95% S, 5% I         Zipf         Threads
F      50% R, 50% R-M-U    Zipf         User Database

R = Read, U = Update, I = Insert, S = Scan/Search
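
To make the table concrete, the sketch below generates an operation stream with the listed read/update mixes and a Zipf-like key choice. It is not YCSB's generator: the skew value of 0.99, the key naming scheme, the key count, and the plain rank-based weighting are assumptions for illustration, and only workloads A, B, C, and F are modelled.

```python
import random
from itertools import accumulate

# Operation mixes taken from the table above.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "F": {"read": 0.50, "read-modify-update": 0.50},
}


def zipf_weights(n_keys: int, skew: float = 0.99) -> list:
    """Weight of the key at each popularity rank: proportional to 1 / rank**skew."""
    return [1.0 / (rank ** skew) for rank in range(1, n_keys + 1)]


def generate(workload: str, n_ops: int, n_keys: int = 1000, seed: int = 0) -> list:
    """Return a list of (operation, key) pairs for the named workload."""
    rng = random.Random(seed)
    mix = WORKLOADS[workload]
    cumulative = list(accumulate(zipf_weights(n_keys)))
    ops = rng.choices(list(mix), weights=list(mix.values()), k=n_ops)
    keys = rng.choices(range(n_keys), cum_weights=cumulative, k=n_ops)
    return [(op, f"user{key}") for op, key in zip(ops, keys)]


stream_b = generate("B", 10000)
read_fraction = sum(op == "read" for op, _ in stream_b) / len(stream_b)
```

Because a few keys receive most of the traffic under a Zipf distribution, such workloads stress how a store handles hot objects rather than uniform load.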

Hash Functions and Load Balancing

- Out of the box, HyperDex supports hashing strings and integers
- What about non-uniform inputs?
  - Select a better hash function
  - Use forwarding pointers
  - Create multiple dimensions in the hyperspace for a single attribute
- The default hash functions work well for workloads that we’ve seen in practice

Experimental Setup

Lab Cluster

- 14 Machines
- Intel Xeon 2.5 GHz E5420 × 2
- 16 GB RAM
- 500 GB SATA HDD
- Debian 6.0
- Linux 2.6.32

VICCI Cluster

- 70 Machines
- Intel Xeon 2.66 GHz X5650 × 2
- 48 GB RAM
- 1 TB SATA HDD × 3
- Virtualized Fedora 12
- Linux 2.6.32

Emin Gün Sirer Cornell University / HyperDex, Inc. http://hyperdex.org/ 4 / 17

Page 80: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Experimental Setup

- Use the Yahoo! Cloud Serving Benchmark
- Each system makes two replicas of the data
- MongoDB: Writes to the client’s outgoing socket buffer
- Cassandra: Writes to one storage node’s filesystem
- HyperDex: Writes to both replicas in three subspaces


Page 81: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

YCSB Throughput

[Figure: bar chart of throughput (thousand op/s, axis 0 to 40) for Cassandra, MongoDB, and HyperDex on YCSB Workloads A, B, C, D, F, and E]


Page 82: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

95% get / 5% put Latency

[Figure: YCSB Workload B; CDF (%) of latency (ms, axis 1 to 1000); series: Cassandra (R), Cassandra (U), MongoDB (R), MongoDB (U), HyperDex (R), HyperDex (U)]


Page 83: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

100% put Latency

[Figure: YCSB Load Dataset; CDF (%) of latency (ms, axis 1 to 1000); series: Cassandra, MongoDB, HyperDex]


Page 84: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

search Latency

[Figure: YCSB Workload E; CDF (%) of latency (ms, axis 1 to 1000); series: Cassandra, MongoDB, HyperDex]


Page 85: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Cluster Size

- Netflix: App-specific clusters of 6-48 Cassandra instances
- Google BigTable:
  - 66% of clusters < 20 tablet servers
  - 84% of clusters < 100 tablet servers
  - 96% of clusters < 500 tablet servers
- Justin Sheehy, Basho Inc.:
  - Typical cluster is 6-12 Riak nodes
  - Largest clusters < 100 Riak nodes


Page 86: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Related Work

- Multi-dimensional database systems on a single host
  - Grid File, KD-Tree, Multi-dimensional BST, Quad-Tree, R-Tree, Universal B-Tree
- Distributed database systems maintain distributed indices
  - Distributed B-Tree, P-Tree, Sinfonia
- Peer-to-peer systems are only eventually consistent
  - Arpeggio, CAN, Chord, Consistent Hashing, Mercury, MURK, Pastry, SkipIndex, SWAM-V, Tapestry
- Space-filling curves suffer from the curse of dimensionality
  - MAAN, SCRAP, Squid, ZNet
- NoSQL systems/key-value stores give up search, consistency or fault-tolerance
  - CouchDB, MongoDB, Neo4j, PNUTS, Redis, TXCache, BigTable, Cassandra, COPS, Distributed Data Structures, Dynamo, Fawn KV, HBase, HyperTable, LazyBase, Masstree, Memcached, RAMCloud, Riak, SILT, Spanner, Spinnaker, TSSL, Voldemort


Page 87: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Linear Transactions

[Figure: servers responsible for the key ranges A-C, D-F, G-I, J-L, M-O, P-R, S-U, V-X, and Y-Z, with transactions T1, T2, and T3]

Servers are arranged into a chain; dependencies are tracked at each point where two chains overlap
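
A minimal sketch of the idea, under stated assumptions: each server owns one of the key ranges from the figure, a transaction's chain is taken to be the servers holding its keys visited in sorted key order, and two chains overlap at the servers they share. The function names and the sort-by-key ordering rule are assumptions made for illustration; the slide itself only says that servers form a chain.

```python
# Key ranges from the figure; each range stands for one server.
RANGES = ["A-C", "D-F", "G-I", "J-L", "M-O", "P-R", "S-U", "V-X", "Y-Z"]


def server_for(key):
    """Return the range (server) responsible for a key's first letter."""
    first = key[0].upper()
    for key_range in RANGES:
        low, high = key_range.split("-")
        if low <= first <= high:
            return key_range
    raise ValueError(f"no server owns key {key!r}")


def chain_for(keys):
    """Build a transaction's chain: servers in sorted key order, no repeats."""
    chain = []
    for key in sorted(keys, key=str.upper):
        server = server_for(key)
        if server not in chain:
            chain.append(server)
    return chain


def overlap(chain_a, chain_b):
    """Servers where two chains meet, i.e. where dependencies are tracked."""
    return [server for server in chain_a if server in chain_b]


t1 = chain_for(["kiwi", "apple", "dog"])   # ['A-C', 'D-F', 'J-L']
t2 = chain_for(["dog", "zebra"])           # ['D-F', 'Y-Z']
shared = overlap(t1, t2)                   # ['D-F']
```

Because every chain visits servers in the same global order, two transactions that share several servers meet them in the same sequence.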


Page 89: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Dependency Management Prevents Cycles

[Figure: servers s0 through s8 with transactions T1, T2, and T3; servers s0, s1, and s2 are shown with the dependency lists deps = [T2,T3] and deps = [T1,T2]]


Page 90: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Dependency Management Prevents Cycles

[Figure: servers s0 through s8 with transactions T1, T2, and T3; servers s0, s1, and s2 are shown with the dependency lists deps = [T2,T3] and deps = [T1,T2]]

s0 abides by T3 → T1 because of the dependencies within T1

Page 91: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Dependency Management Prevents Cycles

[Figure: servers s0 through s8 with transactions T1, T2, and T3; servers s0, s1, and s2 are shown with the dependency lists deps = [T2,T3] and deps = [T1,T2]]

s0 abides by T1 → T3 because of the dependencies within T3

Page 92: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

Dependency Management Prevents Cycles

[Figure: servers s0 through s8 with transactions T1, T2, and T3; servers s0, s1, and s2 are shown with the dependency lists deps = [T2,T3] and deps = [T1,T2]]

s0 is free to commit T1 and T3 in natural arrival order
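
The three cases on these frames can be sketched as a single ordering rule at one server. This is an illustration only: `commit_order` is a hypothetical helper, and it assumes that a transaction's `deps` list names the transactions that must commit before it.

```python
def commit_order(first, second, deps):
    """Decide the commit order of two transactions at one server.

    `first` and `second` are given in arrival order. `deps` maps a
    transaction to the transactions assumed to precede it.
    """
    first_needs_second = second in deps.get(first, [])
    second_needs_first = first in deps.get(second, [])
    if first_needs_second and second_needs_first:
        # Each transaction claims to follow the other: a cycle.
        raise ValueError("dependency cycle between "
                         f"{first} and {second}")
    if first_needs_second:
        return [second, first]
    # Either `second` depends on `first`, or there is no dependency
    # and arrival order is used.
    return [first, second]


# T1 carries deps = [T2, T3], so T3 commits before T1.
case_a = commit_order("T1", "T3", {"T1": ["T2", "T3"]})
# T3 carries deps = [T1, T2], so T1 commits before T3.
case_b = commit_order("T3", "T1", {"T3": ["T1", "T2"]})
# No dependency between T1 and T3: arrival order stands.
case_c = commit_order("T1", "T3", {})
```

Rejecting the case where both transactions list each other is what keeps the combined order acyclic in this sketch.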

Page 93: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

TPC-C Payment Transactions

[Figure: CDF (%) of latency (ms, axis 1 to 1000); legend text as extracted: "HyperDex2PCDexWarp"]


Page 94: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

TPC-C Stock Level Transactions

[Figure: CDF (%) of latency (ms, axis 1 to 1000); legend text as extracted: "HyperDex2PCDexWarp"]


Page 95: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

TPC-C Retries

[Figure: CDF (%) of the number of retries (axis 0 to 5); legend text as extracted: "2PCDexWarp"]


Page 96: HyperDex: A Consistent, Fault-tolerant, Searchable, Transactional NoSQL Store - VNGRS Tech Talks

The CAP Theorem

- Fallacy. The CAP Theorem is not equivalent to the FLP Result.
- The FLP Result is a deep result that provides insights into the nature and complexity of consensus algorithms.
- CAP is a simple tautology, like the Not-A Theorem.
- CAP and FLP do not reduce to one another.
