Progressive NOSQL: Cassandra

Post on 24-Jan-2015

1.216 views 2 download

description

Tom Wilkie's talk at Progressive NOSQL conference in London on 11/05/12.

Transcript of Progressive NOSQL: Cassandra

A tunably consistent, highly-available, Distributed Database

Tom Wilkie @tom_wilkieFounder & VP Engineering, Acunu

1

• Overview

• Distribution

• Storage

• Datamodel

• Usecases

2

• Overview

• Distribution

• Storage

• Datamodel

• Usecases

3

• A distributed database for Big Data

• Scale out on commodity servers

• Best of bread performance

• Multi-master architecture, no SPOF

• Powerful multi data centre support

4

4

5

5

BigTable, 2006 Dynamo, 2007

Open sourced, 2008

Incubator, 2009TLP, 2010v1.0 2011

6

BigTable: ...

• Simple but powerful datamodel

• Write-optimised storage system

• Consistent, available but not partition tolerant

• Master-slave distribution system, SPOF

http://goo.gl/7T1Ej

7

7

Dynamo: ...

• Sophisticated distribution system with tradable consistency and availability

• Over-simple datamodel

http://goo.gl/Q80b4

8

8

• Overview

• Distribution

• Storage

• Datamodel

• Usecases

9

Distribution: Consistent Hashing

10

r1, c1 → v1r2, c2 → v2r3, c3 → v3

10

Distribution: Scaling

11

11

Distribution: Scaling

12

12

Distribution: Scaling

• .

13

13

Distribution: Scaling

14

14

Distribution: Scaling

15

15

Distribution: Replication

16

r1, c1 → v1

16

Distribution: Replication

17

17

Distribution: ConsistencyTuneable, per-operation consistency

Timestamped values, N > R + W

18

RW

18

Distribution: Read Repair

19

19

Distribution: Read Repair

20

20

Distribution: Read Repair

21

21

Distribution: Read Repair

22

22

• Overview

• Distribution

• Storage

• Datamodel

• Usecases

23

Writing to Cassandra

Row Key Column Column Column Column

24

24

Memtable

In the JVM

On disk Commit log

Writing to Cassandra

25

Row Colu Colu Colu Colu

25

Writing to Cassandra

Full Memtable

26

Commit log

In the JVM

On disk

26

New Memtable

SSTable

27

Writing to CassandraIn the JVM

On disk Commit log

27

Writing to Cassandra

28

SSTableOn disk Commit log SSTable

SSTableSSTableSSTableSSTable

28

Writing to Cassandra

29

SSTable

On disk Commit log

29

Reading from Cassandra

30

MemtableIn the JVM

On disk SSTableCommit log

SSTableindex

Bloom filter

Row cacheOff-heap (no GC)

Key cache

1

2

3 4 5

6

31

31

• Overview

• Distribution

• Storage

• Datamodel

• Usecases

32

SQL Cassandra

Keyspace

Column Family

Database

Tablerow/ col_1 col_1

row/key col_1 col_1row/key col_1 col_2

33

33

col1 col2 col3 col4 col5 col6 col7row1 x x xrow2 x x x x xrow3 x x x x xrow4 x x x xrow5 x x x xrow6 xrow7 x x x

34

34

alice: { m2: { Sender: bob, Subject: ‘paper!’, ... }}

bob: { m1: { Sender: alice, Subject: ‘rock?’, ... }}

charlie: { m1: { Sender: alice, Subject: ‘rock?’, ... }, m2: { Sender: bob, Subject: ‘paper!’, ... }}

35

35

• Overview

• Distribution

• Storage

• Datamodelling

• Usecases

36

37Confidential 6

Location ServicesWeb, SCM, Retail

Fraud Detection

Cloud Monitoring

Oil/Gas Sensors

Social MediaSocial Gaming Ad Marketplaces

Smart Metering

Perfect for high velocity data

Wednesday, 25 April 1237

Not Covered...

• Distribution: Hinted Handoff, Anti-entropy repair, Counter distribution

• Storage: Counter storage, different compaction strategies, partitioning etc

• Datamodel: de-normalisation, TTLs, secondary indexes, CQL, super-columns, schema optional

• Operations: backup, nodetool, performance tuning

• Integration: Hadoop, Client Libraries etc

38

38

• Distributed, scalable database

• Opensource, widely used

• Tunably consistent

• Highly-available

• Partition tolerant

• Write-optimised

• Schema-optional

39

Data Platform

40

Commodity Hardware

Apache Cassandra

Acunu Storage Engine

Control Center

Data driven applications Web UI

Configured and tuned OS

Acunu Analytics

Data Platform

41

Control Center

“I've had the EC2 instance running for a little while and I have to say, I'm impressed. You guys have done well with

this product.”- Lloyd, JustDevelopIt

42

Control Center

“The new UI has been critical in helping us work out what is wrong in our code”

- Matt, TellyBug

43

Castle: Built for Big Data

• Storage engine optimized for large slow disks, many cores, Big Data workloads

• Enterprise density on commodity hardware

• Lightning disk rebuilds:10x faster than RAID

Acunu Kernel

Userspace

Doubling Arrays

arrays range

querieskey

insert

insertqueues

Bloom filters

x

user

spac

ein

terfa

ceke

rnel

spac

ein

terfa

cedo

ublin

g a

rray

map

ping

laye

rm

odlis

t btre

em

appi

ng la

yer

bloc

k m

appi

ng &

cach

eing

laye

r

"Extent" layerextent

allocatorfreespacemanager

btreerange

queries

key get

key insert

Version tree

Streaming interfacekey

insertkey get

bufferedvalue get

bufferedvalue insert

range queries

Cache

flusher

extent blockcache

prefetcher

In-kernel workloads

shared buffersasync, sharedmemory ring

Shared memory interfacekeys

values

Arrays

value arrays

btree

key get

arraysmanagement

merges

• Opensource (GPLv2, MIT for user libraries)

• http://bitbucket.org/acunu

• Loadable Kernel Module, targeting CentOS’s 2.6.18

• http://www.acunu.com/blogs/andy-twigg/why-

Castle

http://goo.gl/gzihe

44

44

45

0

1

2

3

4

5

RAID10, 8 Disks RAID5, 8 Disks RDA, 8 Disks RDA, 15 Disks

Re

bu

ild

Tim

e (

Ho

urs

)

Rebuild time

46

46

Analytics

• Simple, real-time, incremental analytics

• Push processing into ingest phase

events

counterupdates

Acunu Analytics

Click streamSensor data

etc

47

tom@acunu.com@tom_wilkie

www.acunu.com

Questions?

48

Introduction

49

Live & historicalaggregates...

49

50

Realtime trends...

50

51

Drill downsand roll ups

51

52

Solution Con

Scalability$$$

Not realtimeInefficient Recomputation

Spartan query semantics => complex, DIY solutions

52