PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0...

26
© 2ndQuadrant 2011 Title:log Creator Creatio PostgreSQL and NoSQL

Transcript of PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0...

Page 1: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

PostgreSQLand NoSQL

Page 2: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Simon Riggs

• 30 years development experience– Binary toggle, Pilot, Assembler, Basic, Fortran,

Prolog, Pascal, COBOL, C, perl, bash, DCL, JCL, REXX, PL/SQL, blah blah blah

• 25 years professional database experience– Developer on pre-SQL relational analysis tool

• 20 years as a Consulting Database Architect– Architect of first British Airways Data Warehouse

• Database Architect at Top UK Bank/Mortgager

• Certifiable in 4 separate RDBMS

Page 3: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Old **** !

Page 4: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Historical Perspective

• No data storage

• Logical Key stores

• Codasyl

• Relational

• SQL

• Object Relational

• Nested Relational

• Parallel DBMS

• ROLAP

• Column stores

• Streaming data

• EAI, ETL, ELT

• XML & Documents

• GIS systems

Page 5: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Acquisition and Assimilation

• TCP/IP

• Filesystems

• Tools

• Drivers

• SQL

• Gateways

• Database languages

• GIS

Page 6: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Natural and Inevitable

• Databases become Data Operating Systems

• Teradata now includes– Hybrid row/column store

• Oracle now includes– Replication technology from Golden Gate

– InMemory technology from TimesTen

– MOLAP technology from Express

– MVCC technology from PostgreSQL

Page 7: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

Page 8: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

Page 9: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

OracleBerkeleyDB

Express OLAP

Parallel Query

Page 10: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

SQLServer

Integrated OLAP

Parallel Query

Page 11: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

PostgreSQL includes

• Relational and SQL

• Object relational

• XML technology

• Extensible document datatypes (hstore)

• What isn't included? Need to look elsewhere to see what is available

Page 12: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Early Non SQL Datastores

• Berkeley-DB 1986-'06

• C-ISAM 1980s

• FAME – time series database

• BLAST - Fast biodata – specialist formats

• Express 1970-1995+

• Ab Initio 2000+

• SAS/S-Plus 1966+

Page 13: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Later Non SQL Datastores

• Key access: Mongo, Cassandra, BigTable, SDS

• Specialist data formats

• MOLAP – SQLServer

• Parallel file scans – Map/Reduce - Hadoop

Page 14: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Other database products

• VoltDB

• SciDB

• MayBMS

• CopperEye

• MarkLogic XML server

• JackRabbit

• Distributed hash table– CHORD etc

Page 15: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

PostgreSQL

Teradata, Greenplum

Page 16: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

Analytic

PostgreSQL

Teradata, GreenplumHadoop etc

Key-ValueMongo, SDS, Riak,

OpenLDAP

Page 17: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Positioning

SQLNoSQL

OLTP

AnalyticHadoop etc

Key-ValueMongo, SDS, Riak,

OpenLDAP PostgreSQL

Greenplum, Teradata

Page 18: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Differentiating factors

• Performance

• User API

• Scalability

• Data Model

Page 19: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Performance Architecture

• SQL Parsing overhead

• Planning overhead

• Locking overhead 16%

• Execution overhead 14%

• Transaction Log overhead 11%

• In Memory database 34%

• “OLTP Through the Looking Glass, and What We Found There” Harizopolous et al, SIGMOD 2008

Page 20: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

User API

• SQL API– API that delivers SQL language calls

– Execute works, but Prepare/Execute is preferred

– Many functions, many options

– Too difficult for most developers

– Hence many SQL API wrappers andObject Relational Mappers

• Memcache– Simple API with very few calls

Page 21: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Scalability

• If all access to data is by single records only,then we can spread data and data requests across multiple nodes

• The true benefit of a simple API is it keeps requests simple, so the workload scales

• This only works for equality conditions (=)

• As soon as you do an ordered index scan it becomes an all-node operation and that isnot scalable

Page 22: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Data Model

• 3NF– Static Types

• NoSQL – 2NF – Just one table

– Dynamic Types

– Static Types

THING

Page 23: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Scalability Restrictions

• You must access all your data by the Key only

– Like having a table with only one index

• Any other indexing needs to be built by you

– Can't just do a CREATE INDEX

• If you perform all node operations they are non-scalable and so get more expensive as you add nodes to the cluster

– Quickly use up all the resources on the cluster

• One table, one index...

• Just like putting your application on an LDAP server

Page 24: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Scalability Restrictions

• If you are willing to accept those restrictions

• Why not use PL/Proxy?

• Same benefits, same restrictions as NoSQL

• But underneath, its PostgreSQL, so you have all the benefits of durability and functionality

Page 25: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2

Conclusions

• Many NoSQL concepts already in PostgreSQL

• We have much to learn from other data store servers

• Further advances arriving soon

• Scaling databases causes some hard trade-offs

• NoSQL approaches do fit some applications

– PL/Proxy fits for all of those applications also

• General Purpose database applications don't need and can't use NoSQL

• PostgreSQL will continue to be an assimilation point for the best open source database tech

Page 26: PostgreSQL and NoSQL · © 2ndQuadrant 2011 Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2 Scalability • If all access to data is

© 2ndQuadrant 2011

Title:logo_noText.eps Creator:Adobe Illustrator(R) 13.0 CreationDate:11/17/10 LanguageLevel:2