PostgreSQL as an Alternative to MSSQL

Posted on 16-Apr-2017



Alexei Krasner, Nov 2015

PostgreSQL as MSSQL Alternative

What is PostgreSQL
▪ Powerful, open source object-relational database system.
▪ 15 years of active development and a strong reputation.
▪ Runs on all major operating systems (Linux, Unix, Mac OS, Windows…).
▪ Enterprise-class database.
▪ Large and responsive community.
▪ Winner of the 2015 Database Trends and Applications Readers' Choice:
  – The most advanced open source database.
  – Best relational database.

Let's Start With Standards
▪ Fully ACID compliant.
▪ Includes most SQL:2008 data types, along with storage of binary objects.
▪ Conforms to most of the ANSI SQL:2008 standard:
  – Full support for subqueries (including subselects).
  – Read Committed and Serializable transaction isolation levels.
  – Full support for primary keys, foreign keys, joins, views, triggers, stored procedures, constraints (check, unique and not null) and cascading.
  – Fully relational system catalog with multiple schemas per database.
▪ Native programming interfaces: Java, .NET, C/C++, Perl, Python, ODBC.

Continue With a Little Splurging
▪ Multi-Version Concurrency Control (MVCC).
▪ Asynchronous replication, load balancing and online/hot backups with point-in-time recovery.
▪ Write-Ahead Logging (WAL) for fault tolerance.
▪ Performance:
  – Sophisticated query planner/optimizer.
  – Composite (multicolumn), unique, partial and functional (expression) indexes.
▪ Supports:
  – International character sets, multi-byte encodings, Unicode and locale awareness.
  – Built-in types: geometric (full geospatial support via PostGIS), XML, JSON/JSONB, ranges and arrays!
  – NoSQL: a high-performance key-value store (hstore) and full text search.
▪ Highly customizable and extensible.

Before We Dive – Generalized Search Tree (GiST)
▪ Advanced indexing framework supporting different sorting and searching algorithms:
  – B-tree, B+-tree, R-tree, partial sum trees, ranked B+-trees, etc.
  – An API for creating custom data types and extensible query methods for searching them.
▪ Decide WHAT to persist, HOW to persist it and a way to SEARCH for it.
▪ Goes beyond the general-purpose search offered by standard B-trees and R-trees.
▪ Foundation for many public projects, e.g. OpenFTS and PostGIS.
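The core GiST idea can be shown with a toy model: every internal entry holds a predicate (here, a bounding box) that is true for everything stored beneath it, and a search descends only into children whose predicate is "consistent" with the query. The Python sketch below is a hypothetical in-memory illustration of that principle with two-dimensional boxes (an R-tree-like use of GiST). The helper names `consistent` and `union` echo GiST support methods of the same names, but this is not PostgreSQL's C API; node splitting and balancing are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def union(boxes: List[Box]) -> Box:
    """Smallest box covering all given boxes: the key stored for a subtree."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def consistent(key: Box, query: Box) -> bool:
    """True if something under `key` could overlap `query`."""
    return (key[0] <= query[2] and query[0] <= key[2]
            and key[1] <= query[3] and query[1] <= key[3])

@dataclass
class Node:
    # Each entry is (key, child Node) in internal nodes or (key, payload) in leaves.
    entries: List[Tuple[Box, object]] = field(default_factory=list)
    leaf: bool = True

def build_leaf(items: List[Tuple[Box, object]]) -> Node:
    return Node(list(items), leaf=True)

def build_parent(children: List[Node]) -> Node:
    return Node([(union([k for k, _ in c.entries]), c) for c in children], leaf=False)

def search(node: Node, query: Box, visited: Optional[List[Node]] = None) -> List[object]:
    if visited is not None:
        visited.append(node)  # record every node we actually touch
    results: List[object] = []
    for key, target in node.entries:
        if not consistent(key, query):
            continue  # the whole subtree is pruned
        if node.leaf:
            results.append(target)
        else:
            results.extend(search(target, query, visited))
    return results

west = build_leaf([((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
east = build_leaf([((10, 10, 11, 11), "c"), ((12, 12, 13, 13), "d")])
root = build_parent([west, east])
seen: List[Node] = []
print(search(root, (0.5, 0.5, 2.5, 2.5), seen))  # ['a', 'b']
print(len(seen))  # 2: the root and the west leaf; the east subtree is never visited
```

Swapping the box type and the two functions for, say, text lexemes and a "does this set of words match" test is what lets the same tree shape serve very different data, which is the extensibility the slide describes.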

Features Deep Dive

▪ MVCC
▪ Partitioning
▪ Useful Data Types
  – Date and Time
  – Interval
  – Array
  – Ranges
  – JSON
  – HSTORE
  – XML
▪ PostGIS – Geographic
▪ Full Text Search
▪ Server Side Programming
▪ Backup and Restore
▪ High Availability, Load Balancing and Replication
  – Sharding
▪ Big Data Readiness

Multi-Version Concurrency Control (MVCC)
▪ Reads never block writes, and writes never block reads.
▪ Each transaction sees a snapshot of the data (a version):
  – Protects against viewing inconsistent data (transaction isolation).
▪ Avoids explicit locking, minimizing lock contention.
▪ Table/row-level locking is still available, although proper use of MVCC gives better performance.
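A toy model makes the snapshot idea concrete. PostgreSQL keeps several versions of a row, each tagged with the transaction that created it (`xmin`) and the one that deleted or replaced it (`xmax`). A reader only looks at the versions its snapshot considers visible, so a concurrent writer simply adds a new version instead of waiting. The Python sketch below is a deliberately simplified, hypothetical model: real snapshots are built from transaction ID bounds plus an in-progress list, and a transaction also sees its own uncommitted changes, which this toy ignores.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

@dataclass
class RowVersion:
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that deleted/replaced it, if any

@dataclass(frozen=True)
class Snapshot:
    committed: FrozenSet[int]   # transactions whose work this snapshot may see

def visible(v: RowVersion, snap: Snapshot) -> bool:
    created = v.xmin in snap.committed
    deleted = v.xmax is not None and v.xmax in snap.committed
    return created and not deleted

class Table:
    """Toy MVCC table: updates append new versions instead of overwriting."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []
        self.committed: Set[int] = set()

    def snapshot(self) -> Snapshot:
        return Snapshot(frozenset(self.committed))

    def insert(self, xid: int, value: str) -> None:
        self.versions.append(RowVersion(value, xid))

    def update(self, xid: int, old: str, new: str) -> None:
        # Mark the live old version as superseded, then add the new one.
        # Readers are not blocked: they keep seeing what their snapshot allows.
        for v in self.versions:
            if v.value == old and v.xmax is None:
                v.xmax = xid
        self.versions.append(RowVersion(new, xid))

    def commit(self, xid: int) -> None:
        self.committed.add(xid)

    def read(self, snap: Snapshot) -> List[str]:
        return [v.value for v in self.versions if visible(v, snap)]

t = Table()
t.insert(1, "v1")
t.commit(1)
reader = t.snapshot()        # a long-running reader starts here
t.update(2, "v1", "v2")      # a concurrent writer proceeds without waiting
print(t.read(reader))        # ['v1']
t.commit(2)
print(t.read(reader))        # ['v1'] (the reader's snapshot stays consistent)
print(t.read(t.snapshot()))  # ['v2']
```

The old version left behind by the update is exactly the kind of dead row that VACUUM later removes.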

Partitioning – Table Inheritance
▪ Basic table partitioning is supported via the table inheritance concept.
  – Includes the well-known partitioning benefits:
    ▪ Better performance for heavy queries that touch a single partition.
    ▪ A sequential scan of one partition can replace index usage over the whole table.
    ▪ Bulk loads and deletes are done by adding or removing partitions.
    ▪ Rarely used data can be migrated to cheaper/slower storage.
  – Range partitioning:
    ▪ The table is partitioned into “ranges” defined by a key column or set of columns (e.g. dates).
  – List partitioning:
    ▪ The table is partitioned by explicit lists of discrete key values.
  – About a hundred partitions is a reasonable limit; thousands of partitions will severely harm performance.
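With inheritance-based partitioning, rows are typically routed to the right child table by a trigger on the parent table, and a query skips children whose CHECK constraints rule out the requested range (constraint exclusion). The Python sketch below models only that routing and pruning logic; the table names, bounds and country codes are hypothetical, and nothing here talks to a database.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Set

# Hypothetical monthly children of a parent "measurements" table.
# Child i covers [BOUNDS[i], BOUNDS[i + 1]), as its CHECK constraint would.
BOUNDS = [date(2015, 1, 1), date(2015, 2, 1), date(2015, 3, 1), date(2015, 4, 1)]
RANGE_CHILDREN = ["measurements_2015_01", "measurements_2015_02", "measurements_2015_03"]

# Hypothetical list partitions keyed by discrete country codes.
LIST_CHILDREN: Dict[str, Set[str]] = {"orders_eu": {"DE", "FR", "IT"}, "orders_us": {"US"}}

def route_by_range(d: date) -> str:
    """Pick the child table for a new row, as an INSERT-routing trigger would."""
    i = bisect_right(BOUNDS, d) - 1
    if i < 0 or i >= len(RANGE_CHILDREN):
        raise ValueError(f"no partition for {d}")
    return RANGE_CHILDREN[i]

def route_by_list(country: str) -> str:
    for child, values in LIST_CHILDREN.items():
        if country in values:
            return child
    raise ValueError(f"no partition for {country!r}")

def prune(lo: date, hi: date) -> List[str]:
    """Children that may hold rows in [lo, hi); all others are skipped entirely."""
    return [child for child, start, end in zip(RANGE_CHILDREN, BOUNDS, BOUNDS[1:])
            if start < hi and lo < end]

print(route_by_range(date(2015, 2, 14)))            # measurements_2015_02
print(route_by_list("FR"))                          # orders_eu
print(prune(date(2015, 2, 10), date(2015, 3, 5)))   # ['measurements_2015_02', 'measurements_2015_03']
```

The pruning step is a linear check over every child, which hints at why planning slows down once there are thousands of partitions.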

Useful Data Types
▪ Date and Time – date, time, timestamp and timestamp with time zone.
  – Convertible to and from Unix time.
  – Supports the INTERVAL type.
  – Very convenient casting and conversion to text.
  – Efficient searching and sorting (including time zone/offset handling).
▪ INTERVAL – represents a period of time.
  – Negative values are possible (e.g. a year ago).
  – Intuitive arithmetic and persistence of time durations.
  – Easy casting and conversion to related types.
  – Efficient searching and sorting on intervals.
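PostgreSQL stores an interval as separate months, days and time fields rather than a single number of seconds, which is why adding '1 month' to January 31 lands on the last day of February instead of a fixed count of days. The Python sketch below is a hypothetical model of that arithmetic, assuming months are applied first (clamping the day to the target month's length), then days, then the time part; it is meant to illustrate the behaviour, not to reproduce PostgreSQL's code.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Interval:
    """Toy interval kept as separate months, days and seconds fields."""
    months: int = 0
    days: int = 0
    seconds: float = 0.0

    def __neg__(self) -> "Interval":
        # Negative intervals ("a year ago") are just negated fields.
        return Interval(-self.months, -self.days, -self.seconds)

def add_interval(ts: datetime, iv: Interval) -> datetime:
    # 1) Months: shift year/month, clamping the day to the target month's length.
    total = ts.year * 12 + (ts.month - 1) + iv.months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    ts = ts.replace(year=year, month=month, day=day)
    # 2) Whole days, then 3) the time part.
    return ts + timedelta(days=iv.days, seconds=iv.seconds)

start = datetime(2015, 1, 31, 12, 0, tzinfo=timezone.utc)
print(add_interval(start, Interval(months=1)))              # 2015-02-28 12:00:00+00:00
print(add_interval(start, -Interval(months=12)))            # 2014-01-31 12:00:00+00:00
print(add_interval(start, Interval(days=1, seconds=3600)))  # 2015-02-01 13:00:00+00:00
print(int(start.timestamp()))                               # 1422705600 (Unix time)
```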

Useful Data Types Cont.
▪ Array – supported as a first-class data type (an actual column in a table).
  – Can contain any data type, including multidimensional arrays.
  – Arrays can be passed as function parameters.
  – Usages: function results, aggregations, getting/setting arrays of data to/from the application.
▪ Range – supported as a first-class data type.
  – Stores a range of timestamps, integers or numerics as a single value.
  – Dedicated indexes can support queries on ranges.
  – Custom range types can be defined.
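Range values come with their own operators, for example `@>` (contains) and `&&` (overlaps), and discrete range types such as `int4range` are stored in the canonical half-open form `[lower, upper)`. The Python sketch below is a hypothetical toy model of those semantics for integer ranges only (no unbounded ends, no other types); it is not PostgreSQL's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntRange:
    """Toy discrete integer range in canonical half-open form [lower, upper)."""
    lower: int
    upper: int

    @classmethod
    def closed(cls, lo: int, hi: int) -> "IntRange":
        # '[lo,hi]' is canonicalized to '[lo,hi+1)'.
        return cls(lo, hi + 1)

    def is_empty(self) -> bool:
        return self.lower >= self.upper

    def contains(self, x: int) -> bool:  # modeled on range @> element
        return self.lower <= x < self.upper

    def overlaps(self, other: "IntRange") -> bool:  # modeled on range && range
        if self.is_empty() or other.is_empty():
            return False
        return self.lower < other.upper and other.lower < self.upper

a = IntRange.closed(1, 3)        # stored as [1,4)
b = IntRange(4, 6)
print(a == IntRange(1, 4))       # True
print(a.overlaps(b))             # False: the ranges are adjacent, not overlapping
print(a.contains(3), a.contains(4))  # True False
```

The half-open convention is what makes adjacent ranges (for example consecutive bookings) not count as overlapping, which is a common source of off-by-one bugs when ranges are modeled with two plain columns.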

Useful Data Types Cont.
▪ JSON – full support, along with a large dedicated set of utility functions.
  – The well-known JSON/JSONB benefits: a standard for data transfer and integration.
  – Transformation from/to types and tables.
  – Retrieval and construction of JSON data.
  – Parsing, casting and conversion.
▪ HSTORE – fast key-value store as a data type.
  – NoSQL capabilities: the flexibility of a schema-less data store.
  – Still ACID compliant.
  – Data can be interchanged between JSON and HSTORE.
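A typical document-store query asks whether a JSONB value contains a given fragment, which PostgreSQL expresses with the `@>` operator (and can accelerate with a GIN index). The Python sketch below is a toy model of the main containment rules: objects match key by key, every element of a sub-array must be contained in some element of the document array (order and duplicates do not matter), and scalars must be equal. It is illustrative only and deliberately ignores the documented special case where an array may contain a bare primitive.

```python
import json

def _scalar_equal(a, b) -> bool:
    # Keep JSON booleans distinct from numbers (in Python, True == 1).
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b
    return a == b

def contains(doc, sub) -> bool:
    """Toy model of jsonb containment: does `doc` contain `sub`?"""
    if isinstance(sub, dict):
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in sub.items())
    if isinstance(sub, list):
        return isinstance(doc, list) and all(
            any(contains(d, s) for d in doc) for s in sub)
    if isinstance(doc, (dict, list)):
        return False
    return _scalar_equal(doc, sub)

doc = json.loads('{"name": "pg", "tags": ["db", "sql"], "meta": {"version": 9.4, "oss": true}}')
print(contains(doc, {"tags": ["sql"]}))           # True
print(contains(doc, {"meta": {"oss": True}}))     # True
print(contains(doc, {"meta": {"version": 9.5}}))  # False
print(contains(doc, {"meta": {"oss": 1}}))        # False: a boolean is not a number
```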

Useful Data Types Cont.
▪ XML – supported as a first-class data type.
  – Well-formedness checks and type-safe operations.
  – Querying using XPath.
  – Producing XML content, predicates, processing, mapping tables to XML, etc.

PostGIS
▪ Fully featured, reliable geospatial database project based on GiST (following ISO/OGC standards).
▪ SQL types and functions to manage vector geometries (spatial data).
▪ Capabilities:
  – Support for three-dimensional data.
  – Support for geospatial formats (KML, GeoJSON).
  – Processing and analytics functions for vector and raster data.
  – Map rasterization and geo queries.
  – Geo searches and reverse geo searches.
▪ A hugely popular and respected extension module, often compared to ArcGIS.
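A geo search such as "everything within 5 km of this point" is the kind of question PostGIS answers with functions like `ST_DWithin` on geography values, backed by GiST indexes. The Python toy below only shows what is being computed: great-circle distances on a sphere using the haversine formula (PostGIS geography defaults to a more precise spheroid calculation) over a hypothetical list of places, with a brute-force scan instead of an index.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within(places: List[Tuple[str, float, float]], lat: float, lon: float,
           radius_m: float) -> List[str]:
    """Names of places within radius_m of (lat, lon); a full scan, no index."""
    return [name for name, plat, plon in places
            if haversine_m(lat, lon, plat, plon) <= radius_m]

# Hypothetical points on the equator: 0.01 degrees of longitude is roughly 1.1 km.
PLACES = [("origin", 0.0, 0.0), ("near", 0.0, 0.01), ("far", 0.0, 1.0)]
print(within(PLACES, 0.0, 0.0, 5000))  # ['origin', 'near']
```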

Full Text Search
▪ Online indexing of data and relevance ranking for database searches.
▪ Good enough for most needs:
  – Stemming.
  – Ranking.
  – Multilingual support.
  – Fuzzy searches (misspellings) and accent-insensitive searches.
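The pipeline behind full text search is: normalize each document into lexemes (roughly what `to_tsvector` does), normalize the query the same way (`to_tsquery`), match, then rank (`ts_rank`). The Python sketch below is a hypothetical toy of that pipeline. Its crude suffix-stripping stemmer is only a stand-in for PostgreSQL's language dictionaries, and its score is a simple term-frequency ratio, not `ts_rank`'s formula.

```python
import re
from collections import Counter
from typing import Dict, List

STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer

def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def to_vector(text: str) -> Counter:
    """Normalize text into lexemes with occurrence counts, dropping stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOPWORDS)

def search(docs: Dict[str, str], query: str) -> List[str]:
    """Ids of documents containing every query lexeme, best score first."""
    terms = list(to_vector(query))
    scored = []
    for doc_id, text in docs.items():
        vec = to_vector(text)
        if all(t in vec for t in terms):
            # Toy relevance: share of the document made up of query lexemes.
            score = sum(vec[t] for t in terms) / (1 + sum(vec.values()))
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: -s[0])]

docs = {
    "d1": "Indexing large tables",
    "d2": "The index is used when searching indexed tables and index scans",
    "d3": "Backups and restores",
}
print(search(docs, "indexes tables"))  # ['d1', 'd2']
```

Stemming is what lets the query word "indexes" match "Indexing" and "indexed" in the documents.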

Server Side Programming
▪ Highly extensible: functions, data types, procedural languages, operators, aggregates, etc.
  – Functions and stored procedures embedded using procedural languages: PL/pgSQL, PL/Tcl, PL/Perl, PL/Python.
▪ Triggers on tables, views and foreign tables.
▪ Event triggers – database-wide triggers.
▪ Rule system – query rewriting based on given rules.
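In PostgreSQL, a row-level BEFORE trigger can modify the new row before it is stored or return NULL to skip the row silently; when several triggers fire for the same event, they run in alphabetical order of trigger name, each receiving the previous one's output. The Python toy below models only those semantics. `ToyTable` and the two trigger functions are hypothetical illustrations, not PL/pgSQL, and no database is involved.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Trigger = Callable[[Row], Optional[Row]]

class ToyTable:
    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.before_insert: Dict[str, Trigger] = {}

    def create_trigger(self, name: str, fn: Trigger) -> None:
        self.before_insert[name] = fn

    def insert(self, row: Row) -> bool:
        new: Optional[Row] = dict(row)
        for name in sorted(self.before_insert):   # fired in trigger-name order
            new = self.before_insert[name](new)
            if new is None:                       # like "RETURN NULL": skip the row
                return False
        self.rows.append(new)
        return True

def normalize_email(new: Row) -> Row:
    new["email"] = str(new["email"]).strip().lower()
    return new

def reject_blank(new: Row) -> Optional[Row]:
    return None if not new["email"] else new

users = ToyTable()
users.create_trigger("a_normalize", normalize_email)
users.create_trigger("b_reject_blank", reject_blank)
users.insert({"email": "  Alice@Example.COM "})
users.insert({"email": "   "})  # normalized to "" and then skipped
print(users.rows)  # [{'email': 'alice@example.com'}]
```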

Backup and Restore
▪ Extremely flexible dump utility, making migration, replication and backups more reliable, controllable and configurable.
  – Compressed format or plain SQL (human readable).
  – A single table or the whole database cluster.
▪ Approaches:
  – SQL dump – a file of generated SQL commands. On restore, the backed-up commands are replayed.
  – File system level backup – a direct copy of the PostgreSQL data files. Restore means copying the data files back into place.
  – Continuous archiving – backing up the Write-Ahead Log (WAL) files. On restore, the logged changes are replayed.
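Continuous archiving is what enables point-in-time recovery: restore a base backup, then replay archived WAL records in log order until a chosen target (in PostgreSQL versions of that era, for example via `recovery_target_time` in `recovery.conf`). The Python toy below models that replay loop with hypothetical key/value records; real WAL records describe low-level page changes, not application values.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class WalRecord:
    lsn: int    # log position; records are replayed in this order
    ts: int     # commit time of the change (toy: plain seconds)
    key: str
    value: str

def restore(base_backup: Dict[str, str], archived_wal: List[WalRecord],
            base_lsn: int, target_ts: int) -> Dict[str, str]:
    """Start from the base backup and redo archived changes up to target_ts."""
    state = dict(base_backup)
    for rec in sorted(archived_wal, key=lambda r: r.lsn):
        if rec.lsn <= base_lsn:
            continue  # already reflected in the base backup
        if rec.ts > target_ts:
            break     # recovery target reached: stop replaying
        state[rec.key] = rec.value
    return state

base = {"balance": "100"}  # base backup taken at LSN 10
wal = [
    WalRecord(10, 900, "balance", "100"),
    WalRecord(11, 1000, "balance", "150"),
    WalRecord(12, 2000, "balance", "0"),  # the accidental change we want to undo
]
print(restore(base, wal, base_lsn=10, target_ts=1500))  # {'balance': '150'}
```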

High Availability, Load Balancing and Replication
Feature matrix for seven approaches: (1) Shared Disk Failover, (2) File System Replication, (3) Transaction Log Shipping, (4) Trigger-Based Master-Standby Replication, (5) Statement-Based Replication Middleware, (6) Asynchronous Multimaster Replication, (7) Synchronous Multimaster Replication. For the yes/no features, only the approaches that provide the feature are listed.
▪ Most common implementation: (1) NAS, (2) DRBD, (3) Streaming Replication, (4) Slony, (5) pgpool-II, (6) Bucardo.
▪ Communication method: (1) shared disk, (2) disk blocks, (3) WAL, (4) table rows, (5) SQL, (6) table rows, (7) table rows and row locks.
▪ No special hardware required: (2), (3), (4), (5), (6), (7).
▪ Allows multiple master servers: (5), (6), (7).
▪ No master server overhead: (1), (3), (5).
▪ No waiting for multiple servers: (1), (3) with sync off, (4), (6).
▪ Master failure will never lose data: (1), (2), (3) with sync on, (5), (7).
▪ Standby accepts read-only queries: (3) with hot standby, (4), (5), (6), (7).
▪ Per-table granularity: (4), (6), (7).
▪ No conflict resolution necessary: (1), (2), (3), (4), (7).

Sharding and Replication
▪ Pure sharding:
  – pg_shard – popular sharding extension for PostgreSQL.
    ▪ Runs on Linux!
  – BDR/UDR project – Bi-Directional Replication, which adds multi-master replication to PostgreSQL.
    ▪ Runs on Linux! Windows support is not expected in the near future.
    ▪ Forked from the main PostgreSQL source.
  – Postgres-XL – general-purpose, fully ACID, open source scale-out database solution.
    ▪ Runs on Linux!
    ▪ Forked from the main PostgreSQL source.
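Sharding tools need a rule that maps each row's distribution key to a shard. One common scheme is hash distribution, sketched below in Python as an assumption for illustration; it is not a description of how pg_shard, BDR or Postgres-XL place data internally, and the shard names are hypothetical.

```python
import hashlib
from typing import Dict, List

SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]  # hypothetical shard tables/nodes

def shard_for(key: str) -> str:
    # Use a stable hash: Python's built-in hash() of a str is salted per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[bucket]

def distribute(keys: List[str]) -> Dict[str, List[str]]:
    """Group keys by the shard that would store their rows."""
    placement: Dict[str, List[str]] = {s: [] for s in SHARDS}
    for k in keys:
        placement[shard_for(k)].append(k)
    return placement

keys = [f"customer-{i}" for i in range(100)]
placement = distribute(keys)
print({shard: len(rows) for shard, rows in placement.items()})
```

A plain modulo over the shard count means most keys move when a shard is added, which is why real systems often assign ranges of the hash space to shards so that a range can be split or moved on its own.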

Sharding and Replication Cont.
▪ Via replication:
  – Hot Standby – offloading read load from the master to the slaves (horizontal scaling).
  – Streaming (or Bucardo, or another available) replication to the slaves.
  – Load balancing: “write” queries go to the master, “read” queries to the slaves.
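The read/write split can be done in the application or by middleware such as pgpool-II. The Python toy below shows the basic routing decision: statements that start with a data-changing verb go to the master, everything else rotates over the hot standbys. The host names are hypothetical, and classifying by the first keyword is a simplification: `SELECT ... FOR UPDATE`, functions with side effects and read-your-own-writes needs (an asynchronous standby may lag) would all require extra rules.

```python
import itertools
from typing import List

class ReadWriteRouter:
    """Toy router: writes go to the master, reads rotate over standbys."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP", "TRUNCATE"}

    def __init__(self, master: str, standbys: List[str]) -> None:
        self.master = master
        self._standbys = itertools.cycle(standbys) if standbys else None

    def route(self, sql: str) -> str:
        verb = sql.split(None, 1)[0].upper() if sql.strip() else ""
        if verb in self.WRITE_VERBS or self._standbys is None:
            return self.master
        return next(self._standbys)  # round-robin over read replicas

router = ReadWriteRouter("master:5432", ["standby1:5432", "standby2:5432"])
print(router.route("SELECT * FROM orders"))                # standby1:5432
print(router.route("UPDATE orders SET status = 'paid'"))   # master:5432
print(router.route("select count(*) from orders"))         # standby2:5432
```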

PostgreSQL and Big Data
▪ PostgreSQL was used for large data volumes and complex analytics a decade before Hadoop launched (as the only purely open source option).
▪ Today heavily used in mid-sized data warehouses and data marts (1-10 TB).
▪ Code base for many big data systems:
  – Netezza (IBM).
  – Greenplum (Pivotal) – open source massively parallel data warehouse.
  – PipelineDB – open source; runs SQL queries continuously on streaming data.
  – EnterpriseDB and CitusDB (commercial license) – fully scaled-out Postgres.
  – Redshift (Amazon).
▪ The PostgreSQL project continuously delivers new features and better performance to support big data usage.

PostgreSQL and Big Data – Features
▪ A serious NoSQL database competitor:
  – Advanced JSONB features and an ambitious ongoing development plan.
  – Extensions that provide NoSQL-like APIs.
▪ Faster sorts – text and long numeric sorting improvements.
▪ TABLESAMPLE – returns a pseudo-random sample of rows, giving a glimpse of the data for further analysis.
▪ CUBE, ROLLUP and GROUPING SETS – summarizing and exploring huge data sets the OLAP way.
▪ BRIN indexes – very small and fast, suited to TB-size tables on columns whose values increase incrementally (like timestamps or integers).
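A BRIN index stores only a small summary (for ordered data, the minimum and maximum) per range of table pages, instead of one entry per row; a query then reads only the page ranges whose summary could match. This works well only when values correlate with physical order, as with append-only timestamps. The Python toy below models that idea; `PAGES_PER_RANGE` is a hypothetical small value loosely mirroring BRIN's `pages_per_range` storage parameter, and every page is assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import List

PAGES_PER_RANGE = 4  # hypothetical; mirrors the idea of BRIN's pages_per_range

@dataclass(frozen=True)
class RangeSummary:
    first_page: int
    lo: int  # smallest value in this block range
    hi: int  # largest value in this block range

def build_brin(pages: List[List[int]]) -> List[RangeSummary]:
    """One (min, max) summary per block range instead of one entry per row."""
    summaries = []
    for start in range(0, len(pages), PAGES_PER_RANGE):
        values = [v for page in pages[start:start + PAGES_PER_RANGE] for v in page]
        summaries.append(RangeSummary(start, min(values), max(values)))
    return summaries

def pages_to_scan(index: List[RangeSummary], lo: int, hi: int, n_pages: int) -> List[int]:
    """Pages whose block range might hold values in [lo, hi]; the rest are skipped."""
    out: List[int] = []
    for s in index:
        if s.lo <= hi and lo <= s.hi:
            out.extend(range(s.first_page, min(s.first_page + PAGES_PER_RANGE, n_pages)))
    return out

# Append-only data: page p holds the values p*10 .. p*10+9, so order matches storage.
pages = [[p * 10 + i for i in range(10)] for p in range(16)]
index = build_brin(pages)
print(len(index))                                # 4 summaries for 160 rows
print(pages_to_scan(index, 85, 95, len(pages)))  # [8, 9, 10, 11]
```

If the same values were scattered randomly across pages, every summary would span nearly the whole value range and the index would skip almost nothing.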

PostgreSQL and Big Data – Features Cont.
▪ Foreign Data Wrappers – linking external data so it can be queried like local tables, for hybrid solutions.
  – Foreign schema import.
  – JOIN pushdowns.
▪ Vacuum (garbage collection, i.e. removing dead row versions) can run in parallel in a multi-process mode (maintaining several large tables at once).
▪ Scaling up – multicore scalability improvements.

Enterprise-Wise
▪ Open Source
▪ Reliability
▪ Authentication
▪ Logging
▪ Documentation
▪ Support
▪ Maintenance

Open Source
▪ Available under an open source license, the PostgreSQL License.
▪ Can be used, modified and distributed in any open or closed form.
▪ The database can be extended and patched per project, client, etc.
▪ A variety of modules, extensions and tools are built on it thanks to its open source license.

Reliability
▪ PostgreSQL is relatively bug-free (compared to MSSQL).
▪ A very large community reports bugs and provides fixes and workarounds.
▪ Constantly growing community.

Authentication
▪ Trust authentication.
▪ Password authentication.
▪ GSSAPI/SSPI authentication (using Kerberos).
▪ Ident authentication.
▪ Peer authentication.
▪ LDAP authentication.
▪ RADIUS authentication.
▪ Certificate authentication.
▪ Pluggable Authentication Modules (PAM).

Logging
▪ Logs in one place.
  – Unlike MSSQL, with its error logs, event log, profiler log, agent log…
▪ Easily configurable logging level.
▪ Easily redirected to CSV files, which can then be loaded into tables.
▪ Easily redirected to the system log or the Windows Event Log.
▪ Logs are human readable and of great value to sysadmins.

Documentation
▪ There is nothing more to add than a link: http://www.postgresql.org/docs/

Support
▪ Community-based support, which tends to be fast, too.
▪ Numerous companies specialize in enterprise support: http://www.postgresql.org/support/professional_support/
▪ Enterprise database management companies such as EnterpriseDB.
▪ Total cost of ownership is significantly lower, even with enterprise support (based on reports, e.g. Gartner 2015).

vs. MySQL

▪ Fully ACID compliant!
▪ Subqueries and joins.
▪ Better locking mechanism.
▪ JSON/JSONB support.
▪ NoSQL and key-value store.
▪ Advanced GIS abilities.
▪ Full text search abilities.
▪ Advanced and attractive data types.
▪ Far better and more useful extensibility patterns.
▪ Licensing issues (on the MySQL side).

vs. PostgreSQL

▪ Partitioning based on table inheritance (pros and cons).

▪ Can be overkill for simple read-heavy operations (improved in newer versions).

▪ Replication and clustering (especially multi-master): not “there” yet, but on the right track.

▪ Popularity – not as popular as MySQL (for example), but constantly gaining popularity, unlike MySQL.

▪ Expertise issues – different syntax and administration (compared to MSSQL).

THANK YOU