Amazon Redshift Performance Metrics vs Competitors


Self-Hosting

According to Amazon’s calculation, “it generally costs between $19,000 and $25,000 per terabyte per year, at list prices, to build and run a good-sized data warehouse on your own. Amazon Redshift, all-in, will cost you less than $1,000 per terabyte per year.”
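To make the per-terabyte figures concrete, the small sketch below scales them to a warehouse of a given size. The per-TB numbers are Amazon's quoted list-price figures; the 10 TB warehouse size is a hypothetical example. Because Redshift's figure is given only as "less than $1,000", the code treats it as an upper bound.

```python
# Amazon's quoted list-price figures, in USD per terabyte per year
SELF_HOSTED_LOW_USD = 19_000
SELF_HOSTED_HIGH_USD = 25_000
REDSHIFT_MAX_USD = 1_000  # quoted as "less than $1,000", so treated as an upper bound


def annual_costs(terabytes: float) -> dict:
    """Scale the per-TB figures to a warehouse of the given size."""
    return {
        "self_hosted_low": SELF_HOSTED_LOW_USD * terabytes,
        "self_hosted_high": SELF_HOSTED_HIGH_USD * terabytes,
        "redshift_max": REDSHIFT_MAX_USD * terabytes,
    }


def cost_ratio_lower_bound() -> float:
    """Self-hosting costs more than this multiple of Redshift, per TB."""
    return SELF_HOSTED_LOW_USD / REDSHIFT_MAX_USD


if __name__ == "__main__":
    size_tb = 10  # hypothetical warehouse size
    costs = annual_costs(size_tb)
    print(f"Self-hosted: ${costs['self_hosted_low']:,} to ${costs['self_hosted_high']:,} per year")
    print(f"Redshift: under ${costs['redshift_max']:,} per year")
    print(f"Self-hosting costs more than {cost_ratio_lower_bound():.0f}x as much")
```

For 10 TB this gives a self-hosted range of $190,000 to $250,000 per year against a Redshift ceiling of $10,000, so self-hosting costs more than 19 times as much.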

Redshift vs other vendor offerings

Feature | Redshift | Teradata | HP Vertica | EMC Greenplum | Oracle Database
Columnar data storage | Available | Available | Available | Available | Available
Advanced compression | Available | Available | Available | Available | Available
Supports ‘sort key’ for better dynamic sorts | Supported | Not supported | Not supported | Not supported | Not supported
Can run on ‘virtualized platforms’ | Yes (1) | Information not available | Not supported (2) | Information not available | Information not available
Index support | Not available | Supported | Not supported | No information available | Supported

(1) Since Amazon Redshift is built on PostgreSQL, it has the inherent capability to run on commodity machines running virtual platforms.
(2) Vertica 6.1 does support Hardware Virtual Machines, but this is nowhere close to Redshift’s offering of data as a service.
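The sort key and the missing index support are related: instead of indexes, Redshift records the minimum and maximum value of each storage block (AWS calls these zone maps) and skips blocks that cannot match a filter, which only pays off when data is stored in sort-key order. The sketch below is a simplified model of that idea, not Redshift's implementation; the block size in rows, the column values and the filter range are illustrative assumptions.

```python
import random

ROWS_PER_BLOCK = 1_000  # simplification: real Redshift blocks are 1 MB, not a fixed row count


def build_zone_map(column, rows_per_block=ROWS_PER_BLOCK):
    """Record the (min, max) value of each block of a stored column."""
    return [
        (min(column[i:i + rows_per_block]), max(column[i:i + rows_per_block]))
        for i in range(0, len(column), rows_per_block)
    ]


def blocks_to_scan(zone_map, low, high):
    """Count blocks that might hold values in [low, high]; the rest can be skipped."""
    return sum(1 for block_min, block_max in zone_map if block_max >= low and block_min <= high)


if __name__ == "__main__":
    sorted_column = list(range(100_000))  # column stored in sort-key order
    unsorted_column = sorted_column[:]
    random.seed(42)
    random.shuffle(unsorted_column)       # same values, no sort order

    # Range filter matching 1% of the rows
    low, high = 50_000, 50_999
    for name, column in (("sorted", sorted_column), ("unsorted", unsorted_column)):
        zone_map = build_zone_map(column)
        print(f"{name}: scan {blocks_to_scan(zone_map, low, high)} of {len(zone_map)} blocks")
```

With the sorted column only 1 of the 100 blocks has to be read; with the shuffled column nearly every block's min/max range overlaps the filter, so almost nothing can be skipped.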

Redshift vs the Hadoop Open Source Platform

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of Big Data.

Feature | Redshift | Hadoop
Nodes possible | 100 | Unlimited
Max node size | 16 TB | Unlimited
Performance | Performs better at terabyte-level data (which is usually sufficient for most businesses) | Performs better at petabyte-level data (only relevant for large businesses, which will want to maintain their own warehouse anyway)
Ease of migration | Uses PostgreSQL as the underlying database and SQL for queries, so it is already familiar to most developers | System administrators will need to learn the Hadoop architecture and tools, which are quite different, and developers will need to learn to code in Pig or MapReduce
Data formats accepted | Limited; presently no support for XML, data arrays, etc. | All data types supported
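The node-count and node-size rows together bound how much data a single Redshift cluster can hold. The quick calculation below uses the two limits exactly as given in the table and treats the result as an upper bound on raw storage; usable capacity will be lower.

```python
MAX_NODES = 100        # Redshift node limit, as given in the table
MAX_NODE_SIZE_TB = 16  # largest Redshift node size, as given in the table


def max_cluster_capacity_tb(nodes: int = MAX_NODES, node_size_tb: float = MAX_NODE_SIZE_TB) -> float:
    """Upper bound on raw storage for one Redshift cluster, in terabytes."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    return nodes * node_size_tb


if __name__ == "__main__":
    capacity = max_cluster_capacity_tb()
    print(f"{capacity:,.0f} TB = {capacity / 1000:.1f} PB")  # decimal units: 1 PB = 1,000 TB
```

This prints `1,600 TB = 1.6 PB`, which puts Redshift's ceiling in the low-petabyte range and is consistent with the table placing its strength at terabyte scale.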

[Chart: Total cost of running Hadoop vs Redshift on a per-query basis]


We can therefore conclude that Redshift is better suited to most businesses. The exception is very large organizations (for example, a single data warehouse for the entire Tata Group), where Hadoop may be the better choice, albeit at a higher cost than Redshift.

[Chart: Query performance compared with other technologies]

Additional information that may be useful for other parts of the project


The distinction between the previously available Amazon Relational Database Service (RDS) and Redshift is that Redshift is designed exclusively for warehousing and analytics (as opposed to transactional database workloads) and scales to big-data volumes. As quoted in InformationWeek’s coverage of the launch: “RDS is based on Microsoft SQL Server, Oracle and MySQL, and those aren't systems that are designed to do petabyte-scale data warehousing.”
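Much of the warehousing-versus-transactional distinction comes down to storage layout. Transactional databases such as MySQL typically store each record's fields together, which suits reading or updating one order at a time, while a columnar warehouse stores each column together, so an aggregate over one column reads only that column. The sketch below illustrates the difference with a hypothetical three-row table; it counts values touched rather than modelling real disk I/O.

```python
# Hypothetical three-row orders table
ORDERS = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 80.5},
    {"order_id": 3, "customer": "A", "amount": 42.0},
]


def to_columns(rows):
    """Pivot a row-oriented table into one list per column (columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, field):
    """Row layout: every field of every record is read to reach one column."""
    values_read = sum(len(row) for row in rows)
    return sum(row[field] for row in rows), values_read


def sum_column_store(columns, field):
    """Column layout: only the requested column is read."""
    column = columns[field]
    return sum(column), len(column)


if __name__ == "__main__":
    print(sum_row_store(ORDERS, "amount"))                 # (242.5, 9)
    print(sum_column_store(to_columns(ORDERS), "amount"))  # (242.5, 3)
```

Both layouts return the same total, but the columnar one touches a third of the values here; with wide tables and billions of rows that gap is what makes column stores suited to analytics.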

http://www.informationweek.com/software/information-management/amazon-debuts-low-cost-big-data-warehousing/d/d-id/1107568?

http://dwh-bi-etl-reviews.quora.com/Amazon-Redshift-%E2%80%93-Differentiators-and-Limitations

http://www.vertica.com/2010/11/23/life-beyond-indices-the-query-benefits-of-storing-sorted-data/

http://aws.amazon.com/documentation/redshift/

http://snowplowanalytics.com/blog/2013/09/27/how-much-does-snowplow-cost-to-run/