Fit For Purpose: The New Database Revolution Findings Webcast

One Size Doesn’t Fit All: The database revolution, April 25, 2012. Mark R. Madsen (http://ThirdNature.net), Robin Bloor (http://Bloorgroup.com)

Transcript of Fit For Purpose: The New Database Revolution Findings Webcast

Page 1: Fit For Purpose: The New Database Revolution Findings Webcast

One Size Doesn’t Fit All: The database revolution

April 25, 2012

Mark R. Madsen, http://ThirdNature.net

Robin Bloor, http://Bloorgroup.com

Wednesday, April 25, 12

Page 3: Fit For Purpose: The New Database Revolution Findings Webcast

Analysts Host

Bloor Madsen


Page 4: Fit For Purpose: The New Database Revolution Findings Webcast

Introduction

Significant and revolutionary changes are taking place in database technology

In order to investigate and analyze these changes and where they may lead, The Bloor Group has teamed up with Third Nature to launch an Open Research project.

This is the final webinar in a series of webinars and research activities that have comprised part of the project

All published research will be made available through our web site: Databaserevolution.com


Page 5: Fit For Purpose: The New Database Revolution Findings Webcast

Sponsors of This Research


Page 6: Fit For Purpose: The New Database Revolution Findings Webcast

General Webinar Structure

Market Changes, Database Changes (Some Of The Findings)

Let’s Talk About Performance

How to Select A Database


Page 7: Fit For Purpose: The New Database Revolution Findings Webcast

Market Changes, Database Changes


Page 8: Fit For Purpose: The New Database Revolution Findings Webcast

Database Performance Bottlenecks

CPU saturation

Memory saturation

Disk I/O channel saturation

Locking

Network saturation

Parallelism – inefficient load balancing


Page 9: Fit For Purpose: The New Database Revolution Findings Webcast

Multiple Database Roles

[Diagram. Labels: BI and App clients; OLAP Cubes; Data Warehouse; Staging Area; Operational Data Store; Data Marts; Personal Data Stores; Content DBMS; File or DBMS; DBMS. Groupings: Transactional Systems, BI and Analytics Systems; Unstructured Data, Structured Data.]

Now there are more...

Page 10: Fit For Purpose: The New Database Revolution Findings Webcast

The Origin of Big Data

+ Embedded Systems Data

+ Social Network Data

+ Web Data

+ Supply Chain & Cust. Data

+ Personal Data

+ Unstructured Data

Corporate Databases

Page 11: Fit For Purpose: The New Database Revolution Findings Webcast


Page 12: Fit For Purpose: The New Database Revolution Findings Webcast

Big Data = Scale Out

Data is compressed and partitioned on disk by column and by range

The query is decomposed into a sub-query for each node

The columnar database scales up and out by adding more servers

[Diagram. Labels: Query, Sub Query 1, Sub Query 2; Server 1, Server 2; CPU, Cache, Common Memory; Database Table; Data.]
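As a rough illustration of decomposing a query into one sub-query per node, the sketch below computes an average by having each node return a partial (sum, count) and merging the partials. The node names and the salary values are illustrative assumptions, and threads stand in for separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: one salary column, range-partitioned across two nodes.
NODES = {
    "server_1": [150_000, 120_000],
    "server_2": [160_000, 36_000],
}

def sub_query(values):
    """Sub-query run on one node: return a partial (sum, count) aggregate."""
    return sum(values), len(values)

def average(nodes):
    """Decompose AVG into per-node sub-queries, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(sub_query, nodes.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Averaging the per-node averages would only be correct when every node holds the same number of rows, which is why the partials carry both a sum and a count.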

Page 13: Fit For Purpose: The New Database Revolution Findings Webcast

Let’s Stop Using the Term NoSQL

As the graph indicates, it’s just not helpful. In fact it’s downright confusing.

[Chart. Axis labels: Data Volume; Single Table, Star Schema, Snowflake, TNF Schema, Nested Data, Graph Data, Complex Data, OLAP. Region labels: OldSQL, NewSQL, NoSQL.]

Page 14: Fit For Purpose: The New Database Revolution Findings Webcast


Page 15: Fit For Purpose: The New Database Revolution Findings Webcast

NoSQL Directions

Some NDBMS do not attempt to provide all ACID properties (Atomicity, Consistency, Isolation, Durability).

Some NDBMS deploy a distributed scale-out architecture with data redundancy.

XML DBMS using XQuery are NDBMS.

Some document stores are NDBMS (OrientDB, Terrastore, etc.)

Object databases are NDBMS (Gemstone, Objectivity, ObjectStore, etc.)

Key value stores = schema-less stores (Cassandra, MongoDB, Berkeley DB, etc.)

Graph DBMS (DEX, OrientDB, etc.) are NDBMS

Large data pools (BigTable, HBase, Mnesia, etc.) are NDBMS

Page 16: Fit For Purpose: The New Database Revolution Findings Webcast

The Joys of SQL?

SQL: very good for set manipulation. Works for OLTP and many query environments.

Not good for nested data structures (documents, web pages, etc.)

Not good for ordered data sets

Not good for data graphs (networks of values)


Page 17: Fit For Purpose: The New Database Revolution Findings Webcast


Page 18: Fit For Purpose: The New Database Revolution Findings Webcast

The “Impedance Mismatch”

The RDBMS stores data organized according to table structures

The OO programmer manipulates data organized according to complex object structures, which may have specific methods associated with them.

The data does not simply map to the structure it has within the database

Consequently a mapping activity is necessary to get and put data

Basically: hierarchies, types, result sets, crappy APIs, language bindings, tools
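The mapping activity can be sketched with a small example. Everything here is hypothetical (the Order class, the two tables, the put/get helpers); it only shows one nested object being flattened into rows and rebuilt from result sets, using Python's built-in sqlite3 module.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Order:
    """Object-side structure: an order owns a nested list of line items."""
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # list of (sku, qty) tuples

def put(conn, order):
    """Map one object onto rows in two tables."""
    conn.execute("INSERT INTO orders VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order.order_id, sku, qty) for sku, qty in order.items],
    )

def get(conn, order_id):
    """Rebuild the object from two flat result sets."""
    row = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY sku",
        (order_id,),
    ).fetchall()
    return Order(row[0], row[1], items)

def demo():
    """Round-trip one object through the tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
    original = Order(1, "Marge Inovera", [("A-1", 2), ("B-7", 1)])
    put(conn, original)
    return original, get(conn, 1)
```

Even for this small object, the get and put code is longer than the class it serves, which is the cost the slide is pointing at.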


Page 19: Fit For Purpose: The New Database Revolution Findings Webcast


Page 20: Fit For Purpose: The New Database Revolution Findings Webcast

The SQL Barrier

SQL has:

DDL (for data definition)

DML (for Select, Project and Join)

But it has no MML (Math) or TML (Time)

Usually result sets are brought to the client for further analytical manipulation, but this creates problems

Alternatively doing all analytical manipulation in the database creates problems


Page 21: Fit For Purpose: The New Database Revolution Findings Webcast


Page 22: Fit For Purpose: The New Database Revolution Findings Webcast

Hadoop/MapReduce

Hadoop is a parallel processing environment

Map/Reduce is a parallel processing framework

HBase turns Hadoop into a database of a kind

Hive adds an SQL capability

Pig adds analytics

[Diagram: a Scheduler coordinates mapping processes (Node 1 to Node i, each on HDFS) and reducing processes (Node i+1, Node j, Node k); every node has backup/recovery. Stages: Map, Partition, Combine, Reduce.]
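The four stages named on the slide can be sketched in a single process with a word count. This is a toy illustration, not Hadoop's API: the function names are assumptions, each input line stands in for one mapper's split, and a simple byte-sum hash stands in for the partitioner.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def partition(pairs, n_reducers):
    """Partition: route each key to one reducer with a deterministic hash."""
    buckets = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[sum(key.encode()) % n_reducers].append((key, value))
    return buckets

def combine(pairs):
    """Combine: pre-aggregate a mapper's output locally before it is shipped."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_phase(pairs):
    """Reduce: sum all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def word_count(lines, n_reducers=2):
    """Run map, partition, combine and reduce over the input lines."""
    reducer_inputs = [[] for _ in range(n_reducers)]
    for line in lines:  # each line stands in for one mapper's input split
        buckets = partition(map_phase(line), n_reducers)
        for i, bucket in enumerate(buckets):
            reducer_inputs[i].extend(combine(bucket))
    result = {}
    for pairs in reducer_inputs:  # each reducer owns a disjoint set of keys
        result.update(reduce_phase(pairs))
    return result
```

Because the partitioner sends every occurrence of a key to the same reducer, the reducers never need to talk to each other, which is what lets the reduce step run in parallel.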

Page 23: Fit For Purpose: The New Database Revolution Findings Webcast


Page 24: Fit For Purpose: The New Database Revolution Findings Webcast

Market Forces

A new set of products appears

They include some fundamental innovations

A few are sufficiently popular to last

Fashion and marketing drive greater adoption

Product defects begin to be addressed

They eventually challenge the dominant products


Page 25: Fit For Purpose: The New Database Revolution Findings Webcast

Let’s Talk About Performance


Page 26: Fit For Purpose: The New Database Revolution Findings Webcast

Performance and Scalability

Page 27: Fit For Purpose: The New Database Revolution Findings Webcast

Scalability and performance are not the same thing

Page 28: Fit For Purpose: The New Database Revolution Findings Webcast

Performance measures

Throughput: the number of tasks completed in a given time period

A measure of how much work is or can be done by a system in a set amount of time, e.g. TPM or data loaded per hour.

It’s easy to increase throughput without improving response time.

Page 29: Fit For Purpose: The New Database Revolution Findings Webcast

Performance measures

Response time: the speed of a single task

Response time is usually the measure of an individual's experience using a system.

Response time = time interval / throughput
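A small worked example of the two measures, under stated assumptions: throughput is counted as tasks finished in the interval (as defined on the throughput slide), every task takes the same time, and the interval and task duration are made-up numbers. The formula recovers the per-task time when tasks run one at a time. Adding parallel workers raises throughput while each task still takes just as long.

```python
def tasks_completed(interval_s, task_s, workers=1):
    """Throughput: tasks finished in the interval by identical workers."""
    return workers * int(interval_s // task_s)

def response_time_serial(interval_s, throughput):
    """Slide formula; matches per-task time when tasks run one at a time."""
    return interval_s / throughput

INTERVAL_S = 60.0  # observation window in seconds (illustrative)
TASK_S = 0.5       # time one task takes in seconds (illustrative)

one_worker = tasks_completed(INTERVAL_S, TASK_S, workers=1)
four_workers = tasks_completed(INTERVAL_S, TASK_S, workers=4)
```

With four workers the count of completed tasks quadruples, but a user submitting a single task still waits TASK_S seconds for it.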

Page 30: Fit For Purpose: The New Database Revolution Findings Webcast

Scalability vs throughput vs response time

Scalability = consistent performance for a task over an increase in a scale factor

Page 31: Fit For Purpose: The New Database Revolution Findings Webcast

Three possible scale factors

Number of users

Computations

Amount of data

Page 32: Fit For Purpose: The New Database Revolution Findings Webcast

Scale: Data Volume

The different ways people count make establishing rules of thumb for sizing hard.

How do you measure it?
▪ Row counts
▪ Transaction counts
▪ Data size
▪ Raw data vs loaded data
▪ Schema objects

People still have trouble scaling for databases as large as a single PC hard drive.

Page 33: Fit For Purpose: The New Database Revolution Findings Webcast

Scale: Concurrency (active and passive)

Page 34: Fit For Purpose: The New Database Revolution Findings Webcast

Scalability relationships

As concurrency increases, response time (usually) degrades.

This can be addressed somewhat via workload management tools.

When a system hits a bottleneck, response time and throughput will often get worse, not just level off.

Page 35: Fit For Purpose: The New Database Revolution Findings Webcast

“Linear Scalability”

This is the part of the chart most vendors show.

If you’re lucky they leave the bottom axis on so you know where their system flatlines.

Page 36: Fit For Purpose: The New Database Revolution Findings Webcast

Scale: Computational Complexity

Page 37: Fit For Purpose: The New Database Revolution Findings Webcast

A key point worth remembering:

Performance over size <> performance over complexity

Analytics performance is about the intersection of both.

Database performance for BI is mostly related to size and query complexity.

Page 38: Fit For Purpose: The New Database Revolution Findings Webcast

SOME TECHNOLOGY STUFF

Page 39: Fit For Purpose: The New Database Revolution Findings Webcast

Large Memories and Large Databases

Not as fast as you expect because of how databases were designed (optimized for small memories and disk access). For example: sequential scans and cache serialization

512GB DB buffer cache

1B rows, 100/block = 640GB table unread

LRU overwrites older blocks
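The effect can be reproduced with a scaled-down simulation. This is a minimal sketch, assuming a plain LRU policy and using 512 cache blocks against a 640-block table to mirror the slide's 512GB to 640GB ratio. Real buffer managers often special-case large scans, so it models the textbook behaviour rather than any particular product.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU buffer cache keyed by block number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.blocks[block_no] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used

def scan_hit_rate(cache_blocks, table_blocks, scans):
    """Hit rate after repeatedly scanning the table from first block to last."""
    cache = LRUCache(cache_blocks)
    for _ in range(scans):
        for block_no in range(table_blocks):
            cache.read(block_no)
    return cache.hits / (cache.hits + cache.misses)
```

When the table is even slightly larger than the cache, each block is evicted before the next scan reaches it again, so repeated scans get no benefit from the cache at all. A cache at least as large as the table serves every scan after the first from memory.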

Page 40: Fit For Purpose: The New Database Revolution Findings Webcast

In-Memory Databases Today

1. Maybe not as fast as you think. Depends entirely on the database (e.g. VectorWise)

2. Applied mainly to shared-everything systems

3. Very large memories are more applicable to shared-nothing than shared-memory systems

4. Still an expensive way to get performance

Box-limited: e.g. 2 TB max

Limited by node scaling: e.g. 16 nodes, 512GB per node = 8TB

Page 41: Fit For Purpose: The New Database Revolution Findings Webcast

Hardware changes enable new software models

The extra CPU allows us to do things in software that we avoided in the past because of scarce resources.

Compression techniques and columnar database architectures that once consumed too much CPU are now possible.

Page 42: Fit For Purpose: The New Database Revolution Findings Webcast

Improving Query Performance: Columnar Databases

ID | Name | Salary | Position
1 | Marge Inovera | $150,000 | Statistician
2 | Anita Bath | $120,000 | Sewer inspector
3 | Ivan Awfulitch | $160,000 | Dermatologist
4 | Nadia Geddit | $36,000 | DBA

In a row-store model these rows would be stored in sequential order as shown here, packed into a block.

In a column store they would be divided into columns and stored in different blocks.

Column blocks:
ID: 1, 2, 3, 4
Name: Marge Inovera, Anita Bath, Ivan Awfulitch, Nadia Geddit
Salary: $150,000, $120,000, $166,000, $36,000
Position: Statistician, Sewer inspector, Dermatologist, DBA
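The two layouts can be sketched with plain Python lists. This is an illustration only: the function names are assumptions, a "block" is modelled as a list, and the salaries follow the slide's table (the column block on the slide shows $166,000 for one salary where the table shows $160,000; the table value is used here).

```python
EMP = [
    (1, "Marge Inovera", 150_000, "Statistician"),
    (2, "Anita Bath", 120_000, "Sewer inspector"),
    (3, "Ivan Awfulitch", 160_000, "Dermatologist"),
    (4, "Nadia Geddit", 36_000, "DBA"),
]
COLUMNS = ("id", "name", "salary", "position")

def row_store(rows):
    """One block holding whole rows back to back."""
    return [value for row in rows for value in row]

def column_store(rows, columns):
    """One block per column, each holding that column's values in row order."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}
```

In the column layout a value's position within its block is what ties it back to its row, so every block must keep the same row order.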

Page 43: Fit For Purpose: The New Database Revolution Findings Webcast

Inserting data into a columnar database

Each column is stored in its own set of blocks, written to disk separately.

Extra work for writes over rowstore, update complexity, delete complexity.

[Diagram: the employee table stored as four column blocks: ID, Name, Salary, Position.]

Page 44: Fit For Purpose: The New Database Revolution Findings Webcast

Reading from a columnar database

SELECT * FROM emp WHERE ID = 1

4 reads, extract & stitch

[Diagram: the employee table stored as four column blocks: ID, Name, Salary, Position.]

Page 45: Fit For Purpose: The New Database Revolution Findings Webcast

Column elimination and I/O

SELECT AVG(salary) FROM emp

1 read

[Diagram: the employee table stored as four column blocks: ID, Name, Salary, Position.]
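The read counts for the two queries can be checked with a toy reader that counts block accesses. This is a sketch under assumptions: each column fits in one block, the BlockReader class and helper names are hypothetical, and the salary values follow the table on the earlier slide.

```python
COLUMN_BLOCKS = {
    "id": [1, 2, 3, 4],
    "name": ["Marge Inovera", "Anita Bath", "Ivan Awfulitch", "Nadia Geddit"],
    "salary": [150_000, 120_000, 160_000, 36_000],
    "position": ["Statistician", "Sewer inspector", "Dermatologist", "DBA"],
}

class BlockReader:
    """Hands out column blocks and counts how many were read."""

    def __init__(self, blocks):
        self._blocks = blocks
        self.reads = 0

    def read(self, column):
        self.reads += 1
        return self._blocks[column]

    def columns(self):
        return list(self._blocks)

def select_star_where_id(reader, wanted_id):
    """SELECT * FROM emp WHERE ID = ?: read every column block, stitch one row."""
    position = reader.read("id").index(wanted_id)
    row = {"id": wanted_id}
    for column in reader.columns():
        if column != "id":
            row[column] = reader.read(column)[position]
    return row

def avg_salary(reader):
    """SELECT AVG(salary) FROM emp: only the salary block is read."""
    salaries = reader.read("salary")
    return sum(salaries) / len(salaries)
```

The single-row lookup touches all four blocks, while the aggregate touches one. A row store would show the opposite pattern on a wide table.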

Page 46: Fit For Purpose: The New Database Revolution Findings Webcast

How do we scale performance for queries?

[Diagram: a query running on a CPU, with three options.]

Make CPU faster: faster CPUs mean quicker response time, increased throughput.

Parallelize query execution: parallel query execution improves response time but it consumes more resources, reducing concurrency and possibly throughput.

Add CPUs: more CPUs mean more throughput.

Page 47: Fit For Purpose: The New Database Revolution Findings Webcast

Early query performance scaling: table partitioning

Table partitioning distributes rows across table partitions by range, hash or round robin when you insert or load the data.

[Diagram: a partitioning function (fn) routes rows to the Q1, Q2, Q3 and Q4 Sales Table partitions.]
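The three distribution schemes can be sketched as small partitioning functions. The names are hypothetical, the range scheme assumes quarterly partitions keyed by month as in the sales-table diagram, and CRC32 is used only as a convenient stable hash.

```python
import itertools
import zlib

def range_partition(sale_month):
    """Range: months 1-3 go to partition 0 (Q1), 4-6 to 1 (Q2), and so on."""
    return (sale_month - 1) // 3

def hash_partition(key, n_partitions=4):
    """Hash: a stable hash of the key, modulo the partition count."""
    return zlib.crc32(str(key).encode()) % n_partitions

def round_robin_partitioner(n_partitions=4):
    """Round robin: hand out partitions in rotation, ignoring row content."""
    counter = itertools.count()
    return lambda _row=None: next(counter) % n_partitions
```

Range partitioning lets a query on one quarter skip the other three partitions. Hash and round robin give up that pruning in exchange for a more even spread of rows.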

Page 48: Fit For Purpose: The New Database Revolution Findings Webcast

Scale-up vs. Scale-out Parallelism

Uniprocessor environments required chip upgrades.

SMP servers can grow to a point, then it’s a forklift upgrade to a bigger box.

MPP servers grow by adding more nodes.

Copyright Third Nature, Inc.

(a) Scaling up with a larger server (b) Scaling out with many small servers

Page 49: Fit For Purpose: The New Database Revolution Findings Webcast

Sharding, aka Partitioning at the Node Level

Sharding is basically horizontal partitioning applied across multiple database servers.

Each node holds a (hopefully) self-consistent portion of the database.

Good as long as queried data lives on a single node.

One large database = several smaller databases

Query redirect

Page 50: Fit For Purpose: The New Database Revolution Findings Webcast

Sharding, Databases and Queries

What happens when you need to scan a full table or join tables across nodes? Multiple queries and stitching at the application level.

Sharding works well for fixed access paths, uniform query plans, and data sets that can be isolated. Mainly this describes an OLTP-style workload.
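The difference between a single-shard lookup and a cross-shard scan can be sketched with a toy store in which each shard is a dictionary. The class and method names are assumptions for illustration, and CRC32 stands in for whatever shard-key hashing a real system would use.

```python
import zlib

class ShardedStore:
    """Toy sharded key-value store: each shard is a plain dict."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        index = zlib.crc32(str(key).encode()) % len(self.shards)
        return self.shards[index]

    def put(self, key, row):
        self._shard_for(key)[key] = row

    def get(self, key):
        """Fixed access path: the query is redirected to exactly one shard."""
        return self._shard_for(key).get(key)

    def scan_total(self, column):
        """Full scan: one query per shard, stitched together by the application."""
        partials = [
            sum(row[column] for row in shard.values())
            for shard in self.shards
        ]
        return sum(partials)
```

The lookup by key touches one node regardless of how many shards exist. The scan has to visit all of them, and the merging logic lives in application code rather than in the database.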

Page 51: Fit For Purpose: The New Database Revolution Findings Webcast

Cloud Hardware Architecture

It’s a scale-out model. Uniform virtual node building blocks.

This is the future of software deployments, albeit with increasing node sizes, so paying attention to early adopters today will pay off.

This implies that an MPP database architecture will be needed for scale.

Page 52: Fit For Purpose: The New Database Revolution Findings Webcast

MPP Database Architecture

Worker nodes

Leader node(s) used by some

High speed interconnect

Some use separate loader nodes

Some databases are symmetric (all nodes are the same). Some allow mixed worker node sizes. Some are leaderless.

Some problems with leaders, loaders, e.g. less automated management of the environment, treating bottlenecks

Page 53: Fit For Purpose: The New Database Revolution Findings Webcast

Key to MPP: data distribution

Table data is evenly spread across all nodes.

The good: scalability to petabyte range, much faster filtering and selection on scans.

The bad: data skew (values, not rowcounts), aggregate function bottlenecks, concurrency challenges, complex multi-table joins with unlike distributions.

Single logical view of a table

Page 54: Fit For Purpose: The New Database Revolution Findings Webcast

MPP challenges mostly hinge on data distribution

Imagine fact & dim tables spread across all nodes.

You need to get dim data to each node to join with fact rows stored there.

Cross-node joins result in data shipping. This is where inter-node latency, data skew, node skew can bog down query performance.

The real test of an MPP database is not how fast it can scan data. That’s easy. Test joins in a PoC.

[Diagram: Node 1 and Node 2 each hold a slice of the fact table (Fact tb) and a slice of the dim table (Dim tb).]
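A small sketch of why joins are the hard part. It compares shipping the whole dim table to every node with distributing both tables on the join key so each node joins only its own slices. The tables, column layout and function names are hypothetical, and integer keys with modulo hashing keep the placement easy to follow.

```python
FACT = [(1, 10, 5.0), (2, 11, 7.5), (3, 10, 2.5), (4, 12, 1.0)]  # (sale_id, product_id, amount)
DIM = [(10, "widget"), (11, "gadget"), (12, "gizmo")]             # (product_id, name)

def hash_distribute(rows, key_index, n_nodes):
    """Spread rows across nodes by hashing one integer column."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key_index] % n_nodes].append(row)
    return nodes

def broadcast_join(fact_nodes, dim_rows):
    """Ship the whole dim table to every node, then join locally.

    Returns the joined rows and the number of dim rows shipped.
    """
    shipped = len(dim_rows) * len(fact_nodes)
    dim = {d[0]: d[1] for d in dim_rows}
    joined = []
    for node in fact_nodes:
        joined.extend((f[0], dim[f[1]], f[2]) for f in node)
    return joined, shipped

def colocated_join(fact_nodes, dim_nodes):
    """Both tables distributed on the join key: each node joins its own slices."""
    joined = []
    for facts, dims in zip(fact_nodes, dim_nodes):
        dim = {d[0]: d[1] for d in dims}
        joined.extend((f[0], dim[f[1]], f[2]) for f in facts)
    return joined, 0
```

Distributing FACT on product_id across two nodes avoids shipping entirely, but three of the four fact rows land on node 0. That is a small instance of the data skew the slide warns about.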

Page 55: Fit For Purpose: The New Database Revolution Findings Webcast

MATCHING PROBLEMS TO TECHNOLOGIES

Page 56: Fit For Purpose: The New Database Revolution Findings Webcast

Solving the Problem Depends on the Diagnosis

Page 57: Fit For Purpose: The New Database Revolution Findings Webcast

Three General Workloads

Online Transaction Processing
▪ Read, write, update
▪ User concurrency is the common performance limiter
▪ Low data, compute complexity

Business Intelligence / Data warehousing
▪ Assumed to be read-only, but really read heavy, write heavy, usually separated in time
▪ Data size is the common performance limiter
▪ High data complexity, low compute complexity

Analytics
▪ Read, write
▪ Data size and complexity of algorithm are the limiters
▪ Moderate data, high compute complexity

Page 58: Fit For Purpose: The New Database Revolution Findings Webcast

Three General Workloads

But…

BI is not read only

OLTP is not write-only

Analytics is not purely computation

Page 59: Fit For Purpose: The New Database Revolution Findings Webcast

Types of workloads

Write-biased:
▪ OLTP
▪ OLTP, batch
▪ OLTP, lite
▪ Object persistence
▪ Data ingest, batch
▪ Data ingest, real-time

Read-biased:
▪ Query
▪ Query, simple retrieval
▪ Query, complex
▪ Query-hierarchical / object / network
▪ Analytic

Mixed

Inline analytic execution, operational BI

Page 60: Fit For Purpose: The New Database Revolution Findings Webcast

What you need depends on workload & need

Optimizing for:
▪ Response time?
▪ Throughput?
▪ Both?

Concerned about rapid growth in data?

Unpredictable spikes in use?

Bulk loads or incremental inserts and/or updates?

Page 61: Fit For Purpose: The New Database Revolution Findings Webcast

Important workload parameters to know

• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long data latency
• Predictable vs. unpredictable data access patterns
• Simple vs. complex data types

Page 67: Fit For Purpose: The New Database Revolution Findings Webcast

You must understand your workload mix - throughput and response time requirements aren’t enough.
▪ 100 simple queries accessing month-to-date data
▪ 90 simple queries accessing month-to-date data and 10 complex queries using two years of history
▪ Hazard calculation for the entire customer master
▪ Performance problems are rarely due to a single factor.

Page 68: Fit For Purpose: The New Database Revolution Findings Webcast

Two useful concepts to characterize queries

Selectivity – The restrictiveness of a query when accessing data. A highly selective query filters out most rows. Low-selectivity queries read most of the rows.

High: SELECT SUM(salary) FROM emp WHERE ID = 1

Low: SELECT SUM(salary) FROM emp

Page 69: Fit For Purpose: The New Database Revolution Findings Webcast

Two useful concepts to characterize queries

Retrieval – The restrictiveness of a query when returning data. High retrieval brings back most of the rows. Low retrieval brings back relatively few rows.

High: SELECT name, salary FROM emp

Low: SELECT SUM(salary) FROM emp
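The two measures can be observed by running the slides' queries against a small table. This sketch uses Python's built-in sqlite3 module; the make_emp and profile helpers are hypothetical, the four-row emp table reuses the values from the columnar slides, and the SQL is assembled from trusted literals for illustration only.

```python
import sqlite3

def make_emp():
    """Build a four-row emp table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?)",
        [(1, "Marge Inovera", 150_000), (2, "Anita Bath", 120_000),
         (3, "Ivan Awfulitch", 160_000), (4, "Nadia Geddit", 36_000)],
    )
    return conn

def profile(conn, select_list, where="1 = 1"):
    """Count rows in the table, rows passing the filter, and rows returned."""
    table_rows = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]
    rows_selected = conn.execute(
        f"SELECT COUNT(*) FROM emp WHERE {where}").fetchone()[0]
    rows_returned = len(conn.execute(
        f"SELECT {select_list} FROM emp WHERE {where}").fetchall())
    return {"table_rows": table_rows,
            "rows_selected": rows_selected,
            "rows_returned": rows_returned}
```

A query is highly selective when rows_selected is small relative to table_rows, and high retrieval when rows_returned is close to table_rows. The unfiltered SUM query reads every row yet returns one, so it is low on both measures.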

Page 70: Fit For Purpose: The New Database Revolution Findings Webcast

Selectivity and number of columns queried

Row store or column store, indexed or not?

Chart from “The Mimicking Octopus: Towards a one-size-fits-all Database Architecture”, Alekh Jindal

Page 71: Fit For Purpose: The New Database Revolution Findings Webcast

Characteristics of query workloads

Workload | Selectivity | Retrieval | Repetition | Complexity
Reporting / BI | Moderate | Low | Moderate | Moderate
Dashboards / scorecards | Moderate | Low | High | Low
Ad-hoc query and analysis | Low to high | Moderate to low | Low | Low to moderate
Analytics (batch) | Low | High | Low to high | Low*
Analytics (inline) | High | Low | High | Low*
Operational / embedded BI | High | Low | High | Low

* Low for retrieving the data, high if doing analytics in SQL

Page 72: Fit For Purpose: The New Database Revolution Findings Webcast

Characteristics of read-write workloads

Workload | Selectivity | Retrieval | Repetition | Complexity
Online OLTP | High | Low | High | Low
Batch OLTP | Moderate to low | Moderate to high | High | Moderate to high
Object persistence | High | Low | High | Low
Bulk ingest | Low (write) | n/a | High | Low
Realtime ingest | High (write) | n/a | High | Low

With ingest workloads we’re dealing with write-only, so selectivity and retrieval don’t apply in the same way, instead it’s write volume.

Page 73: Fit For Purpose: The New Database Revolution Findings Webcast

Workload parameters and DB types at data scale

[Matrix chart; the cell ratings did not survive extraction. Workload parameters: write-biased, read-biased, updateable data, eventual consistency ok?, unpredictable query path, compute intensive. DB types: standard RDBMS, parallel RDBMS, NoSQL (kv, dht, obj), Hadoop*, streaming database.]

You see the problem: it’s an intersection of multiple parameters, and this chart only includes the first tier of parameters. Plus, workload factors can completely invert these general rules of thumb.

Page 74: Fit For Purpose: The New Database Revolution Findings Webcast

Workload parameters and DB types at data scale

[Matrix chart; the cell ratings did not survive extraction. Workload parameters: complex queries, selective queries, low latency queries, high concurrency, high ingest rate. DB types: standard RDBMS, parallel RDBMS, NoSQL (kv, dht, obj), Hadoop, streaming database.]

You have to look at the combination of workload factors: data scale, concurrency, latency & response time, then chart the parameters.

Page 75: Fit For Purpose: The New Database Revolution Findings Webcast

Problem: Architecture Can Define Options

Page 76: Fit For Purpose: The New Database Revolution Findings Webcast

A general rule for the read-write axes

As workloads increase in both intensity and complexity, we move into a realm of specialized databases adapted to specific workloads.

[Chart. Axis labels: Write intensity, Read intensity. Region labels: OldSQL, NewSQL, NoSQL.]

Page 77: Fit For Purpose: The New Database Revolution Findings Webcast

In general…

Relational row store databases for conventionally tooled low to mid-scale OLTP

Relational databases for ACID requirements

Parallel databases (row or column) for unpredictable or variable query workloads

Specialized databases for complex data query workloads

NoSQL (KVS, DHT) for high scale OLTP

NoSQL (KVS, DHT) for low latency read-mostly data access

Parallel databases (row or column) for analytic workloads over tabular data

NoSQL / Hadoop for batch analytic workloads over large data volumes

Page 78: Fit For Purpose: The New Database Revolution Findings Webcast

How To Select A Database


Page 79: Fit For Purpose: The New Database Revolution Findings Webcast


Page 80: Fit For Purpose: The New Database Revolution Findings Webcast

How To Select A Database - (1)

1. What are the data management requirements and policies (if any) in respect of:

- Data security (including regulatory requirements)?

- Data cleansing?

- Data governance?

- Deployment of solutions in the cloud?

- If a deployment environment is mandated, what are its technical characteristics and limitations?

Best of breed, no standards for anything, “polyglot persistence” = silos on steroids, data integration challenges, shifting data movement architectures

2. What kind of data will be stored and used?

- Is it structured or unstructured?

- Is it likely to be one big table or many tables?

Page 81: Fit For Purpose: The New Database Revolution Findings Webcast

How To Select A Database - (2)

3. What are the data volumes expected to be?

- What is the expected daily ingest rate?

- What will the data retention/archiving policy be?

- How big do we expect the database to grow to? (estimate a range).

4. What are the applications that will use the database?

- Estimate by user numbers and transaction numbers

- Roughly classify transactions as OLTP, short query, long query, long query with analytics.

- What are the expectations in respect of growth of usage (per user) and growth of user population?

5. What are the expected service levels?

- Classify according to availability service levels

- Classify according to response time service levels

- Classify on throughput where appropriate
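For the data volume question, a range estimate can be as simple as daily ingest times retention, scaled by a raw-to-loaded factor. The sketch below is one hypothetical way to do that arithmetic; the function name, the factor, and all numbers in the usage note are assumptions, not figures from the research.

```python
def size_range_gb(daily_ingest_gb, retention_days, load_factor=(1.0, 1.0)):
    """Estimate a (low, high) steady-state database size in GB.

    daily_ingest_gb: (low, high) raw data arriving per day.
    retention_days: how long data is kept before archiving or deletion.
    load_factor: (low, high) ratio of loaded size to raw size, covering
        compression on one side and indexes or replicas on the other.
    """
    low = daily_ingest_gb[0] * retention_days * load_factor[0]
    high = daily_ingest_gb[1] * retention_days * load_factor[1]
    return low, high
```

For example, size_range_gb((20, 50), 365, (0.5, 2.0)) gives a range of 3,650 to 36,500 GB. A tenfold spread like this is common once raw versus loaded size is taken into account, which is why the checklist asks for a range rather than a single number.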

Page 82: Fit For Purpose: The New Database Revolution Findings Webcast

How To Select A Database - (3)

6. What is the budget for this project and what does that cover?

7. What is the outline project plan?

- Timescales

- Delivery of benefits

- When are costs incurred?

8. Who will make up the project team?

- Internal staff

- External consultants

- Vendor consultants

9. What is the policy in respect of external support, possibly including vendor consultancy for the early stages of the project?

Page 83: Fit For Purpose: The New Database Revolution Findings Webcast

How To Select A Database - (4)

10. What are the business benefits?

- Which ones can be quantified financially?

- Which ones can only be guessed at (financially)?

- Are there opportunity costs?

Page 84: Fit For Purpose: The New Database Revolution Findings Webcast

A random selection of databases

Sybase IQ, ASE; Teradata, Aster Data; Oracle, RAC; Microsoft SQLServer, PDW; IBM DB2s, Netezza; Paraccel; Kognitio; EMC/Greenplum; Oracle Exadata; SAP HANA; Infobright; MySQL; MarkLogic; Tokyo Cabinet

EnterpriseDB; LucidDB; Vectorwise; MonetDB; Exasol; Illuminate; Vertica; InfiniDB; 1010 Data; SAND; Endeca; Xtreme Data; IMS; Hive

Algebraix; Intersystems Caché; Streambase; SQLStream; Coral8; Ingres; Postgres; Cassandra; CouchDB; Mongo; Hbase; Redis; RainStor; Scalaris

And a few hundred more…

Page 85: Fit For Purpose: The New Database Revolution Findings Webcast

Product selection options

The Subtraction Model
▪ Start with a full set, remove what’s bad, evaluate the remainder
▪ Conventional analyst model
▪ Works best with a stable market

The Addition Model
▪ Start with an empty set, add what’s good, evaluate the results
▪ The designer model
▪ Works best in an emerging or changing market

Page 86: Fit For Purpose: The New Database Revolution Findings Webcast

Product Selection

Preliminary investigation

Short-list (usually arrived at by elimination)

Be sure to set the goals and control the process.

Evaluation by technical analysis and modeling

Evaluation by proof of concept.

Do not be afraid to change your mind

Negotiation


Page 87: Fit For Purpose: The New Database Revolution Findings Webcast

Conclusion

Wherein all is revealed, or ignorance exposed


Page 88: Fit For Purpose: The New Database Revolution Findings Webcast


Page 89: Fit For Purpose: The New Database Revolution Findings Webcast

Thank You For Your Attention