ru@postgresql.org Twitter: @postgresmen - Percona · In Postgres: Functions = UDFs (user-defined...

Post on 25-Nov-2018

Nikolay Samokhvalov

Twitter: @postgresmen
ru@postgresql.org

History

Year of Birth: 1995

1995: Postgres95 – POSTQUEL query language replaced with SQL

1996: Postgres95 departed from academia, renamed to PostgreSQL

1998: PL/pgSQL added (PostgreSQL 6.4)

And a bit more history...

Object Management in POSTGRES Using Procedures, M. Stonebraker

http://www.dtic.mil/dtic/tr/fulltext/u2/a181411.pdf

What’s now?

- Postgres speaks a lot of procedural languages (PLs):
  - “native”: PL/pgSQL
  - included: PL/Tcl, PL/Perl, PL/Python
  - additional-traditional: PL/Java, PL/R, PL/sh, PL/v8 (JavaScript)
  - not active: PL/Scheme, PL/PHP, PL/Ruby
  - special/exotic/new:
    - PL/Proxy (sharding, from Skype)
    - PL/Container (Python, R)
    - plgo (Go), etc.
    - PgOpenCL (GPU!)
- Functions can also be created in:
  - C (anything is possible!)
  - SQL (plain! standard! with [recursive] CTEs!)

What are Stored Procedures?

In Postgres: Functions = UDFs (user-defined functions) = Stored Procedures

(in other DBMSes: you can use a function/UDF inside a SELECT, while a stored procedure can only be invoked with PERFORM/EXEC/EXECUTE)
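
To make the point concrete, here is a minimal sketch of a plain SQL function used inside a SELECT. The function name `add_vat`, its parameters, and the 20% default rate are all hypothetical, chosen only for illustration.

```sql
-- Hypothetical example: name, parameters and the default rate are illustrative only.
CREATE FUNCTION add_vat(amount numeric, rate numeric DEFAULT 0.20)
RETURNS numeric
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT round(amount * (1 + rate), 2);
$$;

-- A function can appear anywhere an expression is allowed, including SELECT:
SELECT add_vat(100);

-- ...and it can be applied to every row of a set:
SELECT n, add_vat(n) AS with_vat
FROM generate_series(10, 30, 10) AS n;
```

Nothing special is needed to "execute" it: the function is just part of the query.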

Functions & Triggers

Why?

Reason #1: Data Clearness & Integrity

Data Checks (format, constraints, etc) in App (Ruby or Python or PHP or …)

App (Ruby or Python or PHP or …)

CHECKS

Control your Data Quality

Data Validation, an example: validate email address

Source: https://www.postgresql.org/message-id/20050907175305.GA20501%40isis.sigpipe.cz
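
The code from the original slide did not survive in this transcript, so the following is only a rough sketch of the idea and not the code from the linked message. It assumes a deliberately simplified regular expression (much looser than the real email address grammar) and hypothetical names `is_valid_email` and `person`.

```sql
-- Simplified sketch: the pattern is an assumption and far looser than RFC 5322.
CREATE FUNCTION is_valid_email(addr text)
RETURNS boolean
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT addr ~ '^[^@\s]+@[^@\s]+\.[^@\s]+$';
$$;

-- Hypothetical table: the check lives next to the data,
-- so it applies no matter which application writes the row.
CREATE TABLE person (
    id    bigserial PRIMARY KEY,
    email text NOT NULL CHECK (is_valid_email(email))
);

INSERT INTO person (email) VALUES ('nik@example.com');  -- accepted
INSERT INTO person (email) VALUES ('not-an-email');     -- rejected by the CHECK constraint
```

The design choice here is to wrap the rule in a function so that the same check can be reused in several tables or in a DOMAIN.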

Reason #2: Access Control

- SECURITY DEFINER allows a user to do what she/he cannot usually do (but under strict control)
- GRANT/REVOKE: a standard way to control permissions
- Good approach: forbid direct access to tables, provide functions and views with proper GRANTs
- Pay attention to:
  - objects (tables, views, functions)
  - columns (can REVOKE/GRANT individually!)
  - rows (check what Row-Level Security is)
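
The "forbid direct access, expose functions" approach from the list above can be sketched as follows. The role `app_user`, the table `account`, and the function `get_my_balance` are hypothetical names, and the table is assumed to live in the `public` schema.

```sql
-- Hypothetical role and table names, for illustration only.
CREATE ROLE app_user NOLOGIN;

CREATE TABLE account (
    id      bigserial PRIMARY KEY,
    owner   text    NOT NULL,
    balance numeric NOT NULL DEFAULT 0
);

-- 1. Forbid direct access to the table.
REVOKE ALL ON account FROM PUBLIC, app_user;

-- 2. Expose one narrowly scoped operation that runs with the function owner's rights.
CREATE FUNCTION get_my_balance()
RETURNS numeric
LANGUAGE sql
STABLE
SECURITY DEFINER
SET search_path = public, pg_temp  -- pin search_path in SECURITY DEFINER functions
AS $$
    -- session_user is still the logged-in role, even inside a SECURITY DEFINER function
    SELECT balance FROM account WHERE owner = session_user;
$$;

-- 3. Functions are executable by PUBLIC by default, so revoke first, then grant.
REVOKE ALL ON FUNCTION get_my_balance() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION get_my_balance() TO app_user;
```

A member of `app_user` can now read their own balance through the function but cannot SELECT from `account` directly.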

Reason #3: speed (first of all, IO/network-related)

DBMS (Postgres 9.6): AWS RDS, USA
Client (psql): somewhere in Germany
Getting all 10M rows is ~7x slower

Use your RDBMS for Data Manipulation. It is not just storage.

Reason #3: speed

There are a LOT of cases here.

- ORMs (ActiveRecord, Hibernate, etc) and how people work with them
- Analytics (doing R or Python calculations inside RAM, etc)
- Massive data updates (retrieve IDs and then DELETE rows? Doh.)

Just look around and you’ll find more.

Again: Work with Data Inside Database First.

Pay attention to:
- cardinality (how many rows you touch?)
- RTT (round trip time), reduce network calls
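
As an illustration of the "retrieve IDs and then DELETE rows" case, here is a small sketch that contrasts it with a set-based statement. The table `session_log` and the 90-day cutoff are hypothetical.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE session_log (
    id         bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Anti-pattern (shown as comments): at least two round trips,
-- and every ID travels over the network twice.
--   SELECT id FROM session_log WHERE created_at < now() - interval '90 days';
--   DELETE FROM session_log WHERE id IN (<ids sent back by the app>);

-- Set-based alternative: one statement, one round trip, no rows leave the server.
DELETE FROM session_log
WHERE created_at < now() - interval '90 days';

-- If the app needs feedback, return only a summary (a single row):
WITH deleted AS (
    DELETE FROM session_log
    WHERE created_at < now() - interval '90 days'
    RETURNING id
)
SELECT count(*) AS deleted_rows FROM deleted;
```

Both points from the list are addressed: the result set is a single row, and the work takes one network call.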

Reason #4: Data Integration

Data Manipulation Logic in App (Ruby or Python or PHP or …)

Something*

App (Ruby or Python or PHP or …)

Data Manipulation

Something*

* Elasticsearch, Sphinx, Analytics DBMS, etc

Use:
- functions, triggers,
- Foreign Data Wrappers (FDW),
- Logical Decoding (e.g. pglogical)

#5: HTTP API w/o middleware, “declarative”

http://postgrest.com - PostgREST

Written in Haskell
MIT license
Actively developed

chat: https://gitter.im/begriffs/postgrest

CREATE VIEW v1.person
    AS SELECT * FROM public.person;    → /person

CREATE FUNCTION v1.myfunc(...) …
    LANGUAGE ...;                      → /rpc/myfunc

(write functions in any language: SQL, plpgsql, plpython, plr, plv8, etc!)

Views (/person):
GET → SELECT
POST → INSERT
PATCH → UPDATE
DELETE → DELETE

Functions (/rpc/myfunc): only POST

#6: PL/Proxy: sharding

- All work via functions
- Special functions (in PL/Proxy “language”) are in the middle
- Developed at Skype, and still in use there
- Yandex.Mail migrated from Oracle to Postgres + PL/Proxy in 2014-2016 (300+ TB, 250k RPS)

#7: MADlib: Machine Learning inside your DBMS

- A lot of ML algorithms implemented (added in each release)
- PL/Python
- Very easy and quick start to do machine learning with your Postgres data

http://madlib.incubator.apache.org/

Cons

● Tooling can be considered weak (packaging, dependencies, editors, debugging, profiling, etc)

● Version control and schema migrations

● Testing

● Stored Procedures consume resources in the DBMS. Can be tricky to scale
  ○ Example: call external API via plpythonu function and save data -- consumes CPU on your server unpredictably!

Cons - fixes

● Tooling can be considered weak (packaging, dependencies, editors, debugging, profiling, etc)
  → vim + plpgsql highlighting; DataGrip, Debugger, Profiler (pgAdmin)

● Version control and schema migrations
  → Sqitch and others

● Testing
  → pgTAP

● Stored Procedures consume resources in the DBMS. Can be tricky to scale
  ○ Example: call external API via plpythonu function and save data -- consumes CPU on your server unpredictably!
  → Avoid I/O things inside your master if you need to scale
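
For the testing point, a pgTAP test might look roughly like the sketch below. It assumes the pgTAP extension is already installed in the database; `double_it` is a hypothetical function created only so the test has something to check.

```sql
-- Assumes pgTAP is installed (CREATE EXTENSION pgtap); double_it is hypothetical.
BEGIN;

-- Function under test, created inside the transaction so nothing is left behind.
CREATE FUNCTION double_it(n integer)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT n * 2;
$$;

SELECT plan(2);                       -- declare how many tests will run

SELECT has_function('double_it');     -- the function exists
SELECT is(double_it(21), 42, 'double_it doubles its argument');

SELECT * FROM finish();               -- report any mismatch with the plan
ROLLBACK;                             -- leave the database unchanged
```

Wrapping the test in BEGIN ... ROLLBACK keeps test objects and data out of the real schema.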

Thank you!

Twitter: @postgresmen (new Postgres tweets daily!)

ru@postgresql.org

RuPostgres.org