Python in the database

Writing PostgreSQL triggers and functions with Python


Who Am I

Brian Sutherland

● Partner at Vanguardistas LLC
● Worked with PostgreSQL and Python for years
● Used them to build Metro Publisher (SaaS)

What is PostgreSQL?

● A non-NoSQL database… actually an SQL database
● Extremely extensible
● Performant
● First released in 1995
● A very good general-purpose database

But we’re here to talk about Triggers

Code which executes inside the database process in response to events

e.g. a row being INSERTed, UPDATEd or DELETEd

Triggers

Used for
● auditing/logging
● sending email
● validation
● denormalization/cache
● cache invalidation
● replication
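As a sketch of the auditing case: a PL/Python trigger receives a dictionary called TD describing the event (keys such as "event", "table_name", "old" and "new"). The function below is an illustration under assumptions, not code from the talk: `audit_row` and the list used as a log are hypothetical, and a plain dict stands in for TD so the logic can run outside PostgreSQL.

```python
import json


def audit_row(td, log):
    """Record one audit entry for the row change described by `td`.

    `td` is a plain dict shaped like PL/Python's TD trigger dictionary and
    `log` is any list-like sink. Inside PostgreSQL the entry would instead be
    written to a (hypothetical) audit table with plpy.execute().
    """
    entry = {
        "table": td["table_name"],
        "event": td["event"],  # "INSERT", "UPDATE" or "DELETE"
        # For row triggers, TD["old"] is only filled in for UPDATE/DELETE and
        # TD["new"] only for INSERT/UPDATE, so use .get() for both.
        "old": td.get("old"),
        "new": td.get("new"),
    }
    log.append(json.dumps(entry, sort_keys=True))
    return "OK"  # in a BEFORE trigger this means "row unmodified"


log = []
audit_row({"table_name": "event", "event": "DELETE",
           "old": {"event_id": 7}, "new": None}, log)
print(log[0])
```

Keeping the logic in a function that takes TD as an argument is what lets it be exercised with an ordinary dict, as above.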

Triggers

PostgreSQL allows writing triggers in a number of languages:
● C
● Java
● JavaScript
● Python
● ...

Triggers

Use with caution
● Principle of least astonishment
  ○ an INSERT can send email
● Transactions
  ○ Serialization errors
  ○ Idempotency
  ○ Transaction rollback
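The rollback hazard can be reproduced without a PostgreSQL server. This sketch swaps in SQLite through Python's standard sqlite3 module as an analogy: `create_function` lets a trigger call Python code, much as a PL/Python trigger would. The table names, `send_email` and the outbox table are all hypothetical.

```python
import sqlite3

sent = []  # stands in for an SMTP server: once appended, the mail has gone


def send_email(address):
    """Hypothetical side effect that a trigger calls directly."""
    sent.append(address)
    return 1


conn = sqlite3.connect(":memory:")
# Allow application functions inside triggers even on builds that default
# trusted_schema to OFF (older SQLite versions simply ignore this pragma).
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("send_email", 1, send_email)
conn.executescript("""
    CREATE TABLE signup (email TEXT NOT NULL);
    CREATE TABLE outbox (email TEXT NOT NULL);

    -- Risky: the side effect happens inside the transaction, right now.
    CREATE TRIGGER signup_mail AFTER INSERT ON signup
    BEGIN
        SELECT send_email(NEW.email);
    END;

    -- Safer: only record the intent; a separate worker sends after COMMIT.
    CREATE TRIGGER signup_outbox AFTER INSERT ON signup
    BEGIN
        INSERT INTO outbox (email) VALUES (NEW.email);
    END;
""")

conn.execute("INSERT INTO signup (email) VALUES (?)", ("a@example.com",))
conn.rollback()  # e.g. a later statement in the same transaction failed

signups = conn.execute("SELECT count(*) FROM signup").fetchone()[0]
queued = conn.execute("SELECT count(*) FROM outbox").fetchone()[0]
print(sent, signups, queued)  # the mail went out, but the signup never happened
```

After the rollback the signup and its outbox row are gone, yet the "email" was already sent. Recording the intent in a table and sending from a separate process after COMMIT is one common way around this; that sending process still has to cope with retries, which is where idempotency comes in.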

PL/Python

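● Python 2 and 3
● Basic Postgres types are converted to Python types
● An “untrusted” language: it offers no way to restrict what the code can do, so only superusers may create PL/Python functions
● One interpreter per database session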

Calendaring Example

Web application for calendaring
● Recurring events
● Read queries must be fast: there is a high number of database reads compared to writes

Calendaring Example

Every weekday at 3 PM until 1 January 2020

>>> from datetime import datetime
>>> from dateutil.rrule import *
>>> list(rrule(DAILY,
...            byweekday=[MO, TU, WE, TH, FR],
...            dtstart=datetime(2014, 11, 10, 15),
...            until=datetime(2020, 1, 1)))
[datetime.datetime(2014, 11, 10, 15, 0),
 datetime.datetime(2014, 11, 11, 15, 0),
 datetime.datetime(2014, 11, 12, 15, 0),
 …]

Calendaring Example

Naïve implementation
● Store only the rule in the database
● On display, expand the rule using dateutil
● Render calendar in HTML

Calendaring Example

FAIL

There are 100 000 events in the database; find all events which occur between 3 and 4 PM today

Calculating………………………...
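To see why the naïve approach struggles, here is a stdlib-only sketch. `weekday_occurrences` is a hypothetical stand-in for dateutil's rrule that only covers the "every weekday" rule, and events are kept in a dict instead of a table. Every query has to expand every stored rule up to the requested window, so the work grows with the number of events and with how far the window lies from each rule's start.

```python
from datetime import datetime, timedelta

WEEKDAYS = {0, 1, 2, 3, 4}  # Monday..Friday as returned by datetime.weekday()


def weekday_occurrences(dtstart, until, duration):
    """Hypothetical stand-in for dateutil's rrule, limited to "every weekday".

    Yields (start, end) pairs at dtstart's time of day, in chronological order,
    from dtstart up to and including until.
    """
    current = dtstart
    while current <= until:
        if current.weekday() in WEEKDAYS:
            yield current, current + duration
        current += timedelta(days=1)


def naive_find(events, window_start, window_end):
    """Naïve read path: expand every stored rule on every single query."""
    hits = []
    for event_id, (dtstart, until, duration) in events.items():
        for start, end in weekday_occurrences(dtstart, until, duration):
            if start >= window_end:
                break  # occurrences are ordered, so nothing later can match
            if end > window_start:  # overlaps [window_start, window_end)
                hits.append((event_id, start))
    return hits


events = {
    1: (datetime(2014, 11, 10, 15), datetime(2020, 1, 1), timedelta(hours=1)),
    2: (datetime(2014, 11, 10, 9), datetime(2020, 1, 1), timedelta(hours=1)),
}
print(naive_find(events, datetime(2014, 11, 12, 15), datetime(2014, 11, 12, 16)))
```

With 100 000 events, this inner loop runs for every event on every page view, which is the cost the trigger-based design moves to write time.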

Calendaring Example

Find another way, use triggers to
● Pre-calculate occurrences
● Store them in another “cache” table
● Use PostgreSQL indexes to make queries fast

Calendaring Example

Trigger on the “event” table generates occurrences
Store the occurrences in an “occ” table

Thanks to indexing, this query is FAST:

SELECT * FROM occ
WHERE dtstart > X AND dtend < Y
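With a cache table the expansion happens once per write and the read becomes an ordinary indexed query. The sketch below uses SQLite from Python's standard library as a stand-in for PostgreSQL, stores timestamps as ISO-8601 text (which sorts chronologically when every value has the same format), and uses an inclusive variant of the query above; the table layout and event data are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE occ (event_id INTEGER, dtstart TEXT, dtend TEXT);
    CREATE INDEX occ_dtstart ON occ (dtstart);
""")

# What the trigger would do once per write: store every occurrence up front.
# Hypothetical event 1 occurs on weekdays 15:00-16:00 for two weeks.
first = datetime(2014, 11, 10, 15)
rows = []
for offset in range(14):
    start = first + timedelta(days=offset)
    if start.weekday() < 5:  # Monday..Friday
        end = start + timedelta(hours=1)
        rows.append((1, start.isoformat(sep=" "), end.isoformat(sep=" ")))
conn.executemany("INSERT INTO occ VALUES (?, ?, ?)", rows)

# Read path: one range query that can use the index on dtstart.
x, y = "2014-11-12 15:00:00", "2014-11-12 16:00:00"
found = conn.execute(
    "SELECT event_id, dtstart FROM occ WHERE dtstart >= ? AND dtend <= ?",
    (x, y),
).fetchall()
print(found)
```

Prefixing the SELECT with EXPLAIN QUERY PLAN shows whether SQLite actually chose the index; in PostgreSQL, EXPLAIN plays the same role.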

Calendaring Example

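PostgreSQL has range types which make things even faster: store each occurrence as a tsrange and query with the && (overlaps) operator, which a GiST index can serve:

SELECT * FROM occ
WHERE occuring && tsrange(X, Y)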
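The two queries are not equivalent: the first keeps only occurrences lying entirely inside (X, Y), while && keeps any occurrence that overlaps the window. A small Python sketch of both predicates, assuming the stored ranges use the default '[)' bounds that tsrange(X, Y) produces; the function names are hypothetical.

```python
from datetime import datetime


def contained(start, end, x, y):
    """The first query's test: dtstart > X AND dtend < Y."""
    return start > x and end < y


def overlaps(start, end, x, y):
    """What && tests for half-open '[)' ranges: they share at least one instant."""
    return start < y and x < end


x, y = datetime(2014, 11, 12, 15), datetime(2014, 11, 12, 16)
meeting = (datetime(2014, 11, 12, 14, 30), datetime(2014, 11, 12, 15, 30))
print(contained(*meeting, x, y), overlaps(*meeting, x, y))
```

A meeting from 14:30 to 15:30 is missed by the containment test but found by the overlap test, while a range ending exactly at 15:00 merely touches the window and does not overlap it.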

Calendaring Example

Creating the trigger

CREATE LANGUAGE "plpython2u";

CREATE FUNCTION event_occs () RETURNS trigger AS $$
from my.plpy.generate_event_occs import generate_event_occs
generate_event_occs(TD["new"])
return "OK"
$$ LANGUAGE plpython2u;

CREATE TRIGGER event_gen_occs BEFORE UPDATE OR INSERT ON event
    FOR EACH ROW EXECUTE PROCEDURE event_occs();

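Calendaring Example

Much simplified function:

import plpy

def generate_event_occs(new):
    # Drop the occurrences cached for the previous version of this event
    d = plpy.prepare("DELETE FROM occ WHERE event_id = $1", ["int"])
    plpy.execute(d, [new["event_id"]])
    # Store the freshly expanded occurrences
    i = plpy.prepare("INSERT INTO occ VALUES ($1, $2)", ["int", "tsrange"])
    # Simplified: rrule(new) stands for expanding the event's recurrence rule
    # into periods in tsrange text form, e.g. '[2014-11-10 15:00,2014-11-10 16:00)'
    for period in rrule(new):
        plpy.execute(i, [new["event_id"], period])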
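The simplified function prepares both statements on every call. PL/Python gives each function a dictionary, SD, that survives between calls in the same session, and the traceback later in the talk suggests the real code receives SD. A sketch of caching the plans there, assuming SD and plpy are passed in explicitly; `StubPlpy`, `cached_plan` and the `periods` argument are hypothetical, and the stub only exists so the code runs outside PostgreSQL.

```python
class StubPlpy:
    """Hypothetical stand-in for the plpy module, recording what it is asked to do."""

    def __init__(self):
        self.prepared = 0
        self.executed = []

    def prepare(self, query, types):
        self.prepared += 1
        return (query, tuple(types))

    def execute(self, plan, args):
        self.executed.append((plan[0], list(args)))


def cached_plan(SD, plpy, query, types):
    """Prepare `query` once per session and reuse the plan from SD afterwards."""
    plan = SD.get(query)
    if plan is None:
        plan = SD[query] = plpy.prepare(query, types)
    return plan


def generate_event_occs(new, periods, SD, plpy):
    # `periods` stands in for the expanded recurrence, as tsrange literals.
    d = cached_plan(SD, plpy, "DELETE FROM occ WHERE event_id = $1", ["int"])
    plpy.execute(d, [new["event_id"]])
    i = cached_plan(SD, plpy, "INSERT INTO occ VALUES ($1, $2)", ["int", "tsrange"])
    for period in periods:
        plpy.execute(i, [new["event_id"], period])


SD, plpy = {}, StubPlpy()
for _ in range(3):  # three trigger calls in the same session
    generate_event_occs({"event_id": 1}, ["[2014-11-10 15:00,2014-11-10 16:00)"], SD, plpy)
print(plpy.prepared, len(plpy.executed))
```

Three calls prepare only two plans; every later call in the session reuses them from SD.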

JSON Validation example

Store JSON in a column
We want to make sure there is a “type” key whose value is "a" or "b"

JSON Validation example

Much simplified function:

CREATE FUNCTION check_foo() RETURNS trigger AS $$
from json import loads
foo = loads(TD["new"]["foo"])
# Reject anything that is not a JSON object with an allowed "type"
if not isinstance(foo, dict) or foo.get("type") not in ["a", "b"]:
    raise Exception("Invalid Type")
return "OK"
$$ LANGUAGE plpython2u;
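The same check can live in a plain Python function that is testable without a database. A sketch, assuming the column arrives as JSON text (PL/Python passes json values as strings) and reusing the allowed values "a" and "b"; this `check_foo` is a hypothetical module-level function, not the talk's code.

```python
import json

ALLOWED_TYPES = ("a", "b")


def check_foo(raw):
    """Validate the JSON text of the foo column; raise ValueError if invalid."""
    try:
        foo = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError("foo is not valid JSON") from None
    if not isinstance(foo, dict) or foo.get("type") not in ALLOWED_TYPES:
        raise ValueError("Invalid Type")
    return foo


print(check_foo('{"type": "a", "size": 3}'))

errors = {}
for bad in ('{"type": "c"}', "[1, 2]", "not json"):
    try:
        check_foo(bad)
    except ValueError as exc:
        errors[bad] = str(exc)
print(errors)
```

The trigger body then shrinks to an import, a call with TD["new"]["foo"], and return "OK".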

Best Practices

Immediately import and call a Python function, so the real code lives in an ordinary module:

CREATE FUNCTION event_occs () RETURNS trigger AS $$
from my.plpy.generate_event_occs import generate_event_occs
generate_event_occs(TD["new"])
return "OK"
$$ LANGUAGE plpython2u;

Best Practices

Import time can kill performance: each database connection has its own interpreter, so modules are re-imported for every new connection

The ugly

● Except for some very basic types, Python 2 gets fed byte strings in the “database encoding”
● A little better in Python 3, which gets Unicode strings
● Debugging is interesting... (try running pdb inside the PostgreSQL process)

The REALLY ugly (Fixed?)

ERROR: Exception: oops
CONTEXT: Traceback (most recent call last):
  PL/Python function "generate_event_occs", line 3, in <module>
    return generate_event_occs(event, rrule, SD)
  PL/Python function "generate_event_occs", line 256, in generate_event_occs
  PL/Python function "generate_event_occs", line 73, in generate_occurrences
  PL/Python function "generate_event_occs", line 97, in generate
  PL/Python function "generate_event_occs", line 424, in _handle_byday
  PL/Python function "generate_event_occs", line 206, in resolve
PL/Python function "generate_event_occs"
PL/pgSQL function content.generate_event_occs() line 7 at assignment
SQL statement "UPDATE content.event SET dtstart_occs=NULL WHERE uuid=ev_uuid"
PL/pgSQL function content.event_rrule_set_occ_bounds() line 12 at SQL statement
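One workaround worth trying: catch the exception yourself and log Python's own traceback, which records the real file names of imported modules, before re-raising. A minimal sketch; `call_with_full_traceback` is hypothetical, and inside PL/Python you would pass plpy.warning as the `warn` callback.

```python
import traceback


def call_with_full_traceback(func, warn, *args, **kwargs):
    """Call func; on failure, report the complete Python traceback via `warn`,
    then re-raise so the transaction still aborts."""
    try:
        return func(*args, **kwargs)
    except Exception:
        warn(traceback.format_exc())
        raise


messages = []


def broken():
    return 1 / 0


try:
    call_with_full_traceback(broken, messages.append)
except ZeroDivisionError:
    pass
print(messages[0].splitlines()[-1])
```

Re-raising matters: swallowing the exception would let the statement succeed with half-written occurrences.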

Conclusions

VERY useful for complex code in the database if you already program in Python
Python has a lot of libraries which can be used

It has warts, but is a lifesaver

Questions?

Documentation:

http://www.postgresql.org/docs/devel/static/plpython.html