PostgreSQL, your NoSQL database
Reuven M. Lerner, PhD
DevConTLV, Monday June 22nd, 2015
Who am I?
Training
• Python
• PostgreSQL
• Git
• Ruby
Writing
• Linux Journal
• Blog: http://blog.lerner.co.il/
• Tweeting: @reuvenmlerner
• ebook: "Practice Makes Python"
• E-mail courses
• My programming newsletter
Curating
• Full-stack Web development
• http://DailyTechVideo.com • @DailyTechVideo
• Learning Mandarin Chinese?
• http://MandarinWeekly.com • @MandarinWeekly
What is a database?
• Store data securely
• Retrieve data flexibly
• Do this as efficiently as possible
My first database
• Text files!
• They're really fast to work with
• They're really flexible
• But all of the data handling is in our application!
• So things are slow
• And when there's more than one user, it gets bad
Things would be better if:
• The database let us structure our data
• The database did most of the computing work (high speed and centralized), freeing up our application
• The database handled constraints and errors
• The database took care of simultaneous reads and writes in the form of transactions
• The database handled errors well, reporting them rather than dying on us
Relational model
• E. F. Codd, an IBM researcher, proposed it in 1970
• Replaced the previous hierarchical model
• Normalized data = easier, more flexible
• Eight relational operations:
• Union, intersection, difference, product
• Selection (WHERE), projection (select a, b), join, division
Query languages
• Codd spoke in terms of mathematics.
• This was implemented using query languages
• SQL was not the first, or the only, query language!
• Codd wrote Alpha
• Stonebraker wrote Quel
• IBM (but not Codd!) wrote SEQUEL
• Larry Ellison made his own version of SEQUEL… and thus was born the new, more generic name, SQL
Brief history
• 1977-1985: Ingres (Stonebraker)
• 1986-1994: Postgres (Stonebraker)
• 1995: Postgres + SQL = PostgreSQL
• 1996: Open-source project, run by the “global development group”
• Ever since, one major release per year
• Current is 9.4, with 9.5 due in the autumn
It's getting popular…
• Rock solid
• High performance
• Extensible
• Heroku
• (Also: Thanks, Oracle!)
So, what is NoSQL?
• It's not really NoSQL.
• Rather, it's non-relational.
NoSQL isn't new!
• Pre-relational databases
• Object databases
• Key-value stores (e.g., Berkeley DB)
So, why NoSQL?
• Not everything is easily represented with tables
• Sometimes we want a more flexible schema — the database equivalent of dynamic typing
• Some data is bigger, or comes faster, than a single relational database can handle
NoSQL isn't a definition!
• "I want to travel using a non-flying vehicle."
• "I want a non-meat dinner."
• "I want to read a non-fiction book."
Key-value stores
• Examples: Redis, Riak
• Think of it as a hash table server, with typed data
• Especially useful for caching, but also good for many name-value data sets
• Very fast, very reliable, very useful
Document databases
• Examples: MongoDB, CouchDB
• We love JSON, right? Use it to store everything!
• JSON will prevail!
What's wrong with this?
• New systems to learn, install, configure, and tune
• New query language(s) to learn, often without the expressive power of SQL
• Non-normalized data!
• Splitting our data across different systems might lead to duplication or corruption
• What about transactions? What about ACID?
Is NoSQL wrong?
• No, of course not.
• Different needs require different solutions.
• But let's not throw out 40+ years of database research, just because NoSQL is new and cool.
• Engineering is all about trade-offs. There is no perfect solution. Optimize for certain things.
When you discovered hash tables, did you stop using arrays?
SQL vs. NoSQL
• As a developer, I can then choose between SQL and NoSQL
• NoSQL can be faster, more flexible, and easier
• But SQL databases have a lot of advantages, and it's a shame to throw out so many years of advancement
But wait!
• PostgreSQL has an extension mechanism
• Add new data types
• Add new functions
• Connect to external databases
• PostgreSQL is becoming a platform for data storage and retrieval, and not just a database
HSTORE
• HSTORE is a data type, just like INTEGER, TIMESTAMP, or TEXT
• If you define a column as HSTORE, it can contain key-value pairs
• Keys and values are both strings
Create a table
CREATE EXTENSION HSTORE;
CREATE TABLE People (
id SERIAL,
info HSTORE,
PRIMARY KEY(id)
);
Add an HSTORE value
INSERT INTO people(info)
VALUES ('foo=>1, bar=>abc, baz=>stuff');
Look at our values
[local]/reuven=# select * from people;
+----+------------------------------------------+
| id |                   info                   |
+----+------------------------------------------+
|  1 | "bar"=>"abc", "baz"=>"stuff", "foo"=>"1" |
+----+------------------------------------------+
(1 row)
Add (or replace) a pair
UPDATE People
SET info = info || 'abc=>def';
Remove a pair
UPDATE People
SET info = delete(info, 'abc');
What else?
• Everything you would want in a hash table…
• Check for a key
• Remove a key-value pair
• Get the keys
• Get the values
• Turn the hstore into a PostgreSQL array or JSON
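The operations in this list map onto operators and functions from the hstore extension. A minimal sketch, assuming the People table with an HSTORE column named info from the earlier slides (the key names are only illustrative):

```sql
-- Check for a key: ? returns true when the key exists.
SELECT id FROM people WHERE info ? 'foo';

-- Read the value stored under one key (always text; NULL if the key is missing).
SELECT info -> 'bar' AS bar FROM people;

-- Get all of the keys, or all of the values, as PostgreSQL arrays.
SELECT akeys(info) AS keys, avals(info) AS vals FROM people;

-- Turn the whole hstore into one array of alternating keys and values.
SELECT hstore_to_array(info) FROM people;

-- Turn the whole hstore into a JSON object (available since PostgreSQL 9.3).
SELECT hstore_to_json(info) FROM people;
```

Because hstore keys and values are both strings, the JSON produced by hstore_to_json contains only string values (or nulls).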
Indexes
• PostgreSQL has several types of indexes
• You can index HSTORE columns with GIN and GiST indexes, which let you search inside the stored pairs
• You can also index HSTORE columns with HASH indexes, for finding equal values
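As a rough illustration of the GIN case, assuming the same People table (the index name is made up for this example):

```sql
-- A GIN index covers the keys and values inside the hstore column.
CREATE INDEX people_info_gin_idx ON people USING GIN (info);

-- Queries that use ?, ?&, ?| or @> are candidates for that index.
SELECT id FROM people WHERE info ? 'baz';        -- has the key "baz"
SELECT id FROM people WHERE info @> 'foo=>1';    -- contains the pair foo=>1
```

Whether the planner actually uses the index depends on table size and statistics; on a table with a handful of rows a sequential scan is usually cheaper.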
HSTORE isn't Redis
• But it does give you lots of advantages
• Super reliable
• CHECK constraints
• Combine HSTORE queries with other queries
• Transactions!
• Master-slave replication for scalability
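Two of these advantages can be sketched briefly: a CHECK constraint that looks inside the hstore, and several changes grouped into one transaction. The constraint name and the "visits" key are hypothetical, chosen only for this example:

```sql
-- Reject any row whose info column lacks a "foo" key.
ALTER TABLE people
  ADD CONSTRAINT info_has_foo CHECK (info ? 'foo');

-- Either both updates happen, or neither does.
BEGIN;
UPDATE people SET info = info || 'visits=>1' WHERE id = 1;
UPDATE people SET info = delete(info, 'baz') WHERE id = 1;
COMMIT;
```

If the second update tried to delete the "foo" key instead, the constraint would raise an error and the whole transaction could be rolled back.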
JSON and JSONB
• In the last few versions, PostgreSQL has added JSON support
• First, basic JSON support
• Then, some added operators
• Now, JSONB support — high-speed binary JSON storage
Creating a table with JSONB
CREATE TABLE People (
id SERIAL,
info JSONB
);
Adding values
INSERT INTO people (info)
VALUES ('{"first":"Reuven",
"last":"Lerner"}'),
('{"first":"Atara",
"last":"Lerner-Friedman"}');
Retrieving values
select info from people;
+-----------------------------------------------+
|                     info                      |
+-----------------------------------------------+
| {"last": "Lerner", "first": "Reuven"}         |
| {"last": "Lerner-Friedman", "first": "Atara"} |
+-----------------------------------------------+
(2 rows)
Extract
SELECT info->'last' as last,
info->'first' as first
FROM People;
┌───────────────────┬──────────┐
│       last        │  first   │
├───────────────────┼──────────┤
│ "Lerner"          │ "Reuven" │
│ "Lerner-Friedman" │ "Atara"  │
└───────────────────┴──────────┘
(2 rows)
Use the inside data
select * from people order by info->'first' DESC;
+----+-----------------------------------------------+
| id |                     info                      |
+----+-----------------------------------------------+
|  4 | {"last": "Lerner", "first": "Reuven"}         |
|  5 | {"last": "Lerner-Friedman", "first": "Atara"} |
+----+-----------------------------------------------+
(2 rows)
JSONB operators
• Checking for existence
• Reading inside of the JSONB
• Retrieving data as text, or as JSON objects
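A short sketch of these three kinds of operators, assuming the People table with a JSONB column named info and the two rows inserted earlier:

```sql
-- Checking for existence: does the top-level key "first" exist?
SELECT id FROM people WHERE info ? 'first';

-- Reading inside the JSONB: -> returns JSONB (strings keep their quotes),
-- while ->> returns plain text.
SELECT info -> 'last'  AS last_json,
       info ->> 'last' AS last_text
FROM people;

-- Containment: rows whose document includes this key-value pair.
SELECT id FROM people WHERE info @> '{"last": "Lerner"}';
```

The ->> form is usually the one to use when comparing against ordinary text values or sorting by them.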
Indexes
• You can even index your JSONB columns!
• You can use functional and partial indexes on JSONB
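The three kinds of index mentioned here could look roughly like this. The index names and the "email" key are assumptions made for the example; they do not appear in the original data:

```sql
-- General-purpose GIN index: supports ?, ?&, ?| and @> on the whole document.
CREATE INDEX people_info_idx ON people USING GIN (info);

-- Functional (expression) index on one extracted field, as text.
CREATE INDEX people_last_idx ON people ((info ->> 'last'));

-- Partial index: only index rows whose document has an "email" key.
CREATE INDEX people_first_with_email_idx
    ON people ((info ->> 'first'))
 WHERE info ? 'email';
```

A query must repeat the indexed expression, for example WHERE info ->> 'last' = 'Lerner', for the functional index to be considered.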
Performance
• EnterpriseDB (a PostgreSQL support company) compared JSONB with MongoDB
• High-volume inserts: PostgreSQL was 2.2x faster than MongoDB
• Inserts: PostgreSQL was 3x faster
• Disk space: MongoDB used 35% more
• JSONB is slower than MongoDB in updates, however
Foreign data wrappers
• Let's say that you have a NoSQL database
• However, you want to integrate that data into your PostgreSQL system
• That's fine — just use a "foreign data wrapper"
• To PostgreSQL, it looks like a table. But in reality, it's retrieving (and setting) data in the NoSQL database!
Using a FDW
• Download, install the extension
• Create a foreign server
• Create a foreign table
• Now you can read from and write to the foreign table
• How is NoSQL mapped to a table? Depends on the FDW
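The steps above can be sketched with mongo_fdw as one possible wrapper. This assumes the extension is already built and installed on the server; the server name, address, database, collection, and column names are all placeholders, and the exact options differ from one FDW (and one version) to another, so check the wrapper's own documentation:

```sql
-- Step 1: load the installed extension into this database.
CREATE EXTENSION mongo_fdw;

-- Step 2: create a foreign server describing where MongoDB lives.
CREATE SERVER mongo_server
    FOREIGN DATA WRAPPER mongo_fdw
    OPTIONS (address '127.0.0.1', port '27017');

-- A user mapping tells PostgreSQL which local user may use the server.
CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_server;

-- Step 3: create a foreign table that maps a collection to columns.
-- mongo_fdw expects the first column to be _id of type NAME.
CREATE FOREIGN TABLE mongo_people (
    _id   NAME,
    first TEXT,
    last  TEXT
)
SERVER mongo_server
OPTIONS (database 'test', collection 'people');

-- Step 4: query it like any other table.
SELECT first, last FROM mongo_people WHERE last = 'Lerner';
```

Here each document field is mapped to a column of the same name, which is how this particular wrapper answers the "how is NoSQL mapped to a table?" question; other wrappers make different choices.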
Available NoSQL FDWs
• Cassandra
• CouchDB
• MongoDB
• Neo4j
• Redis
• RethinkDB
Schema changes
• NoSQL loves to talk about "no schemas"
• But schemas make our data predictable, and help us to exclude bad data
• You can always use ALTER TABLE to change the schema — adding, removing, and renaming columns, or even modifying data types or constraints
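For instance, each kind of change listed above is a single ALTER TABLE statement. The email column and the constraint name are hypothetical, added only to have something to change:

```sql
-- Add a column.
ALTER TABLE people ADD COLUMN email TEXT;

-- Rename it.
ALTER TABLE people RENAME COLUMN email TO email_address;

-- Modify its data type.
ALTER TABLE people ALTER COLUMN email_address TYPE VARCHAR(254);

-- Add a constraint that excludes obviously bad data.
ALTER TABLE people
  ADD CONSTRAINT email_has_at CHECK (email_address LIKE '%@%');

-- Remove the column again (its constraint is dropped along with it).
ALTER TABLE people DROP COLUMN email_address;
```

These statements are transactional in PostgreSQL, so a group of schema changes can be wrapped in BEGIN and COMMIT and rolled back together if one of them fails.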
Summary
• New problems can require new solutions
• But let's not give up all of the great solutions we've created over the last few decades
• PostgreSQL has proven itself, time and again, as an SQL solution
• But it's becoming a platform — one which includes NoSQL data types, and integrates with NoSQL databases