How to Select an Analytic DBMS

How to Select an Analytic DBMS Overview, checklists, and tips by Curt A. Monash, Ph.D. President, Monash Research Editor, DBMS2 contact @monash.com http://www.monash.com http://www.DBMS2.com


Transcript of How to Select an Analytic DBMS

Page 1: How to Select an Analytic DBMS

How to Select an Analytic DBMS

Overview, checklists, and tips

by Curt A. Monash, Ph.D.

President, Monash Research
Editor, DBMS2

contact @monash.com
http://www.monash.com
http://www.DBMS2.com

Page 2: How to Select an Analytic DBMS

Curt Monash

Analyst since 1981, own firm since 1987
  Covered DBMS since the pre-relational days
  Also analytics, search, etc.

Publicly available research
  Blogs, including DBMS2 (www.DBMS2.com -- the source for most of this talk)
  Feed at www.monash.com/blogs.html
  White papers and more at www.monash.com

User and vendor consulting

Page 3: How to Select an Analytic DBMS

Our agenda

Why are there such things as specialized analytic DBMS?

What are the major analytic DBMS product alternatives?

What are the most relevant differentiations among analytic DBMS users?

What’s the best process for selecting an analytic DBMS?

Page 4: How to Select an Analytic DBMS

Why are there specialized analytic DBMS?

General-purpose database managers are optimized for updating short rows …

… not for analytic query performance

10-100X price/performance differences are not uncommon

At issue is the interplay between storage, processors, and RAM

Page 5: How to Select an Analytic DBMS

Moore’s Law, Kryder’s Law, and a huge exception

Growth factors:
  Transistors/chip: >100,000 since 1971
  Disk density: >100,000,000 since 1956
  Disk speed: 12.5 since 1956

The disk speed barrier dominates everything!
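The growth factors above can be turned into compound annual growth rates with a few lines of arithmetic. This is a minimal sketch, not the author's calculation: the end year is an assumption (the slide does not state it), and the function name `cagr` is illustrative.

```python
def cagr(growth_factor: float, years: int) -> float:
    """Compound annual growth rate implied by a total growth factor."""
    return growth_factor ** (1.0 / years) - 1.0


# Assumption: the slide does not say which year the growth is measured to.
END_YEAR = 2009

rates = {
    "transistors/chip since 1971": cagr(100_000, END_YEAR - 1971),
    "disk density since 1956": cagr(100_000_000, END_YEAR - 1956),
    "disk speed since 1956": cagr(12.5, END_YEAR - 1956),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} per year")
```

Under that assumed end year, the first two rates come out in the 35-42% range while disk speed grows at roughly 5% per year, which is the gap the slide is pointing at.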

[Chart: compound annual growth rate, on a 0% to 45% axis, for transistors/chip since 1971, disk density since 1956, and disk speed since 1956]

Page 6: How to Select an Analytic DBMS

Software strategies to optimize analytic I/O

Minimize data returned
  Classic query optimization
Minimize index accesses
  Page size
Precalculate results
  Materialized views
  OLAP cubes
Return data sequentially
Store data in columns
Stash data in RAM
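To see why storing data in columns can cut I/O, compare how many bytes a full scan has to read in each layout. This is a back-of-the-envelope sketch only: the table shape (100 columns of 8 bytes, one billion rows) and the function names are hypothetical, and it ignores compression, indexes, and caching.

```python
def row_store_bytes(num_rows: int, column_widths: dict) -> int:
    """A row store reads every column of every row it scans."""
    return num_rows * sum(column_widths.values())


def column_store_bytes(num_rows: int, column_widths: dict, columns_needed: list) -> int:
    """A column store reads only the columns the query references."""
    return num_rows * sum(column_widths[name] for name in columns_needed)


# Hypothetical table: 100 columns, 8 bytes each, one billion rows.
column_widths = {f"col_{i}": 8 for i in range(100)}
num_rows = 1_000_000_000
columns_needed = ["col_0", "col_1", "col_2"]  # query touches 3 of 100 columns

row_bytes = row_store_bytes(num_rows, column_widths)
col_bytes = column_store_bytes(num_rows, column_widths, columns_needed)

print(f"row store scan:    {row_bytes / 1e9:.0f} GB")
print(f"column store scan: {col_bytes / 1e9:.0f} GB")
```

With these made-up numbers the row scan reads 800 GB and the column scan 24 GB. The real ratio depends entirely on how wide the table is and how many columns each query touches.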

Page 7: How to Select an Analytic DBMS

Hardware strategies to optimize analytic I/O

Lots of RAM
Parallel disk access!!!
Lots of networking

Tuned MPP (Massively Parallel Processing) is ideal.

“Recommended configurations” are a mixed bag.

Page 8: How to Select an Analytic DBMS

Specialty hardware strategies

Custom or unusual chips (rare)
Custom or unusual interconnects
Fixed configurations of common parts

Appliances or recommended configurations

And there’s also SaaS.

Page 9: How to Select an Analytic DBMS

18 contenders (and there are more)

Aster Data
Dataupia
Exasol
Greenplum
HP Neoview
IBM DB2 BCUs
Infobright/MySQL
Kickfire/MySQL
Kognitio
Microsoft Madison
Netezza
Oracle Exadata
Oracle w/o Exadata
ParAccel
SQL Server w/o Madison
Sybase IQ
Teradata
Vertica

Page 10: How to Select an Analytic DBMS

General areas of feature differentiation

Most influenced by architecture
  Query performance
  Update/load performance
  Alternate datatypes

Most influenced by product maturity
  Compatibilities
  Advanced analytics
  Manageability and availability
  Encryption and security

Page 11: How to Select an Analytic DBMS

Major analytic DBMS product groupings

Architecture is a good first categorization

Traditional OLTP
Row-based MPP
Columnar
(Not covered tonight) MOLAP/array-based

Page 12: How to Select an Analytic DBMS

Traditional OLTP examples

Oracle (especially pre-Exadata)
IBM DB2 (especially mainframe)
Microsoft SQL Server (pre-Madison)

Page 13: How to Select an Analytic DBMS

Analytic optimizations for OLTP DBMS

Performance
  Two major kinds of precalculation
    Star indexes
    Materialized views
  Other specialized indexes
  Query optimization tools

Other OLAP extensions
  SQL 2003
  Other embedded analytics
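Precalculation via a materialized view can be illustrated with a small stand-in. This sketch uses Python's built-in `sqlite3` module, which has no materialized views, so a summary table built with `CREATE TABLE ... AS SELECT` plays that role here. The table and column names are hypothetical, and refreshing the summary when the base data changes (the hard part in practice) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("west", 250), ("east", 50), ("west", 25)],
)

# Stand-in for a materialized view: compute the aggregate once and store it.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

# The same answer, computed at query time versus read from the stored result.
on_the_fly = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
precalculated = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

print(on_the_fly)
print(precalculated)
```

Both queries return the same rows. The second one reads a handful of stored totals instead of scanning and aggregating the detail table, which is the I/O saving precalculation buys.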

Page 14: How to Select an Analytic DBMS

Drawbacks

Complexity and people cost
Hardware cost
Software cost
Absolute performance

Page 15: How to Select an Analytic DBMS

Legitimate use scenarios

When TCO isn’t an issue
  Undemanding performance (and therefore administration too)
When specialized features matter
  OLTP-like
  Integrated MOLAP
  Edge-case analytics
Rigid enterprise standards
Small enterprise/true single-instance

Page 16: How to Select an Analytic DBMS

Row-based MPP examples

Teradata
DB2 (open systems version)
Netezza
Oracle Exadata (sort of)
DATAllegro/Microsoft Madison
Greenplum
Aster Data
Kognitio
HP Neoview

Page 17: How to Select an Analytic DBMS

Typical design choices in row-based MPP

“Random” (hashed or round-robin) data distribution among nodes

Large block sizes
  Suitable for scans rather than random accesses
Limited indexing alternatives
  Or little optimization for using the full boat
Carefully balanced hardware
High-end networking
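The two "random" distribution schemes named above can be sketched as follows. This is an illustration only, not any vendor's implementation: the CRC32 hash, the node count, and the key format are all assumptions chosen for the example.

```python
import zlib
from collections import Counter


def hash_node(key: str, num_nodes: int) -> int:
    """Hashed distribution: the same key always lands on the same node."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def round_robin_node(row_number: int, num_nodes: int) -> int:
    """Round-robin distribution: rows are dealt out to nodes in turn."""
    return row_number % num_nodes


NUM_NODES = 4  # hypothetical cluster size
keys = [f"customer-{i}" for i in range(10_000)]  # hypothetical distribution keys

hashed_counts = Counter(hash_node(key, NUM_NODES) for key in keys)
round_robin_counts = Counter(
    round_robin_node(i, NUM_NODES) for i in range(len(keys))
)

print("hashed:     ", sorted(hashed_counts.items()))
print("round-robin:", sorted(round_robin_counts.items()))
```

Round-robin gives a perfectly even spread but no way to locate a row by key. Hashing keeps rows with the same key on the same node, which helps joins on that key, at the cost of possible skew when key values are unevenly distributed.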

Page 18: How to Select an Analytic DBMS

Tradeoffs among row MPP alternatives

Enterprise standards
Vendor size
Hardware lock-in
Total system price
Features

Page 19: How to Select an Analytic DBMS

Columnar DBMS examples

Sybase IQ
Vertica
InfoBright
SAND
ParAccel
Kickfire
Exasol
MonetDB
SAP BI Accelerator (sort of)

Page 20: How to Select an Analytic DBMS

Columnar pros and cons

Bulk retrieval is faster
Pinpoint I/O is slower
Compression is easier
Memory-centric processing is easier
MPP is not as crucial

Being columnar reduces I/O
So does (better) compression
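One reason compression is easier in a columnar layout is that a single column holds values of one type, often with long runs of repeats. Run-length encoding is a simple scheme that exploits this. The slides do not name a specific compression method, so treat this as one illustrative example, with a made-up column of state codes.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical sorted column of state codes.
state_column = ["CA"] * 5 + ["NY"] * 3 + ["TX"] * 4

encoded = run_length_encode(state_column)
print(encoded)  # [('CA', 5), ('NY', 3), ('TX', 4)]
```

Twelve stored values shrink to three pairs. In a row store the same state codes would be interleaved with every other field of each row, so runs like these never sit next to each other on disk.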

Page 21: How to Select an Analytic DBMS

Segmentation made (too) simple

One database to rule them all
One analytic database to rule them all
Frontline analytic database
Very, very big analytic database
Big analytic database handled very cost-effectively

Page 22: How to Select an Analytic DBMS

Basics of systematic segmentation

Use cases
Metrics
Platform preferences

There isn’t just one checklist.

Page 23: How to Select an Analytic DBMS

Use cases – a first cut

Light reporting
Diverse EDW
Big Data
Operational analytics

Page 24: How to Select an Analytic DBMS

Metrics – a first cut

Total raw/user data
  Below 1-2 TB, references abound
  10 TB is another major breakpoint
Total concurrent users
  5, 15, 50, or 500?
Data freshness
  Hours
  Minutes
  Seconds

Page 25: How to Select an Analytic DBMS

Basic platform issues

Enterprise standards
Appliance-friendliness
Need for MPP?
Cloud/SaaS

Page 26: How to Select an Analytic DBMS

The selection process in a nutshell

Figure out what you’re trying to buy
Make a shortlist
Do free POCs*
Evaluate and decide

*The only part that’s even slightly specific to the analytic DBMS category

Page 27: How to Select an Analytic DBMS

Figure out what you’re trying to buy

Inventory your use cases
  Current
  Known future
  Wish-list/dream-list future
Set constraints
  People and platforms
  Money
Establish target SLAs
  Must-haves
  Nice-to-haves

Page 28: How to Select an Analytic DBMS

Use-case checklist -- generalities

Database growth
  As time goes by …
  More detail
  New data sources
Users (human)
Users/usage (automated)
Freshness (data and query results)

Page 29: How to Select an Analytic DBMS

Use-case checklist – traditional BI

Reports
  Today
  Future
Dashboards and alerts
  Today
  Future
  Latency
Ad-hoc
  Users
  Now that we have great response time …

Page 30: How to Select an Analytic DBMS

Use-case checklist – predictive analytics

How much do you think it would improve results to
  Run more models?
  Model on more data?
  Add more variables?
  Increase model complexity?
Which of those can the DBMS help with anyway?
What about scoring?
  Real-time
  Other latency issues

Page 31: How to Select an Analytic DBMS

SLA realism

What kind of turnaround truly matters?
  Customer or customer-facing users
  Executive users
  Analyst users
How bad is downtime?
  Customer or customer-facing users
  Executive users
  Analyst users

Page 32: How to Select an Analytic DBMS

Short list constraints

Cash cost
  But purchases are heavily negotiated
Deployment effort
  Appliances can be good
Platform politics
  You might as well consider incumbent(s)
  Appliances can be frowned on

Page 33: How to Select an Analytic DBMS

Filling out the shortlist

Who matches your requirements in theory?

What kinds of evidence do you require?
  References?
    How many?
    How relevant?
  A careful POC?
  Analyst recommendations?
  General “buzz”?

Page 34: How to Select an Analytic DBMS

A checklist for shortlists

What’s your tolerance for specialized hardware?
What’s your tolerance for set-up effort?
What’s your tolerance for ongoing administration?
What are your insert and update requirements?
At what volumes will you run fairly simple queries?
What are your complex queries like?
For which third-party tools do you need support?

and, most important,

Are you madly in love with your current DBMS?

Page 35: How to Select an Analytic DBMS

Proof-of-Concept basics

The better you match your use cases, the more reliable the POC is

Most of the effort is in the set-up
  You might as well do POCs for several vendors – at (almost) the same time!
Where is the POC being held?

Page 36: How to Select an Analytic DBMS

The three big POC challenges

Getting data
  Real?
    Politics
    Privacy
  Synthetic?
  Hybrid?
Picking queries
  And more?
Realistic simulation(s)
  Workload
  Platform
  Talent
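Once data and queries are chosen, the same query set can be timed against each candidate in a repeatable way. The harness below is a minimal sketch under stated assumptions: `sqlite3` stands in for whatever DBMS is under test, the table, data, and query names are made up, and a real POC would also need concurrent users and realistic data volumes, which this does not simulate.

```python
import sqlite3
import statistics
import time


def time_queries(conn, queries, repetitions=5):
    """Run each named query several times; return the median seconds per query."""
    results = {}
    for name, sql in queries.items():
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch everything so the work is done
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


# Stand-in database with synthetic data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 100, i) for i in range(10_000)],
)

poc_queries = {
    "simple_lookup": "SELECT COUNT(*) FROM events WHERE user_id = 7",
    "aggregate": "SELECT user_id, SUM(amount) FROM events GROUP BY user_id",
}

medians = time_queries(conn, poc_queries)
for name, seconds in medians.items():
    print(f"{name}: {seconds * 1000:.2f} ms (median of 5 runs)")
```

Keeping the query set and the harness in your own hands, rather than the vendor's, makes it easier to run the identical workload against every candidate.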

Page 37: How to Select an Analytic DBMS

POC tips

Don’t underestimate requirements
Don’t overestimate requirements
Get SOME data ASAP
Don’t leave the vendor in control
Test what you’ll actually be buying
Use the baseball bat

Page 38: How to Select an Analytic DBMS

Evaluate and decide

It all comes down to
  Cost
  Speed
  Risk

and in some cases
  Time to value
  Upside

Page 39: How to Select an Analytic DBMS

Further information

Curt A. Monash, Ph.D.
President, Monash Research
Editor, DBMS2

contact @monash.com
http://www.monash.com
http://www.DBMS2.com