Assuring spatial and temporal data integrity via constraints and triggers in PostgreSQL

Adrian C. Prelipcean

Ego page

Adrian C. Prelipcean

1. PhD student at KTH
  – Transportation systems and geoinformatics
  – Data collection and analysis methods for travel behaviour
  – Lab responsible for multiple courses
    - GIS Algorithms (Java)
    - Spatial Databases (PostgreSQL, PostGIS, pgRouting)
    - Web and Mobile GIS (NodeJS, ExpressJS, Apache Cordova, PostgreSQL)

2. Tech responsible at Airmee
  – Airmee - a smart platform for urban logistics
  – Currently working on MVPs (from the backend to the frontend)
  – PostgreSQL as the backend

Background

Travel behaviour - understand how people travel and make decisions.
We used to ask people how they are travelling - fill in paper forms, answer telephone interviews, fill in web forms.

- Found out how to adapt the infrastructure of our cities to handle a growing population.
- However, also found out that:
  – Physical mail is not that popular.
  – Filling in forms is not really a hobby.
  – The method doesn't scale well.

New ways to collect same data

Use end-user devices (mostly smartphones) to (legally) collect how people move (GPS points).

- Old way
  – Ask people to fill in forms
- New way
  – Ask people to look at their GPS points and fill in forms(?)
- New way 2.0
  – Use [buzzwords] machine learning / AI to avoid asking people to fill in forms
  – But we need data to train the algorithms
  – Also, algorithms will give wrong predictions
  – How about a continuous feedback loop between users and ML algorithms
  – What can go wrong?

Data

Dealing with predictions

- Machine learning algorithms infer
  – Time periods for trips (when people travel between destinations)
  – Time periods for triplegs (when people travel with different travel modes [between destinations])
  – Destination of trips (where people travel)
  – Purpose of trips (why people travel)
  – Travel modes (how people travel)
- Users can look at the output
  – And confirm that it is correct
  – Or modify it to bring it to the correct state
    - Alone
    - Using a weird UI
    - ...

What can go wrong?

- Invalid periods
  – e.g., traveled to work between 08:00 AM and 07:00 AM (same day)
- Overlapping periods
  – e.g., traveled to work between 08:00 AM and 09:00 AM, and traveled to the supermarket between 08:30 AM and 09:30 AM
- Discontinuous periods
  – e.g., traveled to work between 08:00 AM and 09:00 AM, worked between 09:00 AM and 11:30 AM, and traveled to a restaurant between 01:00 PM and 01:20 PM
- Improper nests
  – e.g., traveled to work between 08:00 AM and 09:00 AM, during which I walked between 08:20 AM and 09:00 AM
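
Each of these failure modes can be probed with PostgreSQL's built-in range operators. The queries below are a minimal sketch, assuming periods are modelled as tsrange values; the date is arbitrary and the times are the examples from the list above.

```sql
-- Invalid period: building a range whose lower bound is after its upper
-- bound raises an error, so the check is written as a plain comparison.
SELECT timestamp '2016-10-25 08:00' < timestamp '2016-10-25 07:00' AS is_valid_period;

-- Overlapping periods: && is true when two ranges share at least one instant.
SELECT tsrange('2016-10-25 08:00', '2016-10-25 09:00')
    && tsrange('2016-10-25 08:30', '2016-10-25 09:30') AS overlaps;

-- Discontinuous periods: -|- is true only when two ranges are adjacent.
SELECT tsrange('2016-10-25 09:00', '2016-10-25 11:30')
   -|- tsrange('2016-10-25 13:00', '2016-10-25 13:20') AS is_continuous;

-- Improper nests: @> is true when the first range contains the second.
-- Containment alone is not enough here: the walk starts 20 minutes after
-- the trip does, so the triplegs do not cover the whole trip.
SELECT tsrange('2016-10-25 08:00', '2016-10-25 09:00')
    @> tsrange('2016-10-25 08:20', '2016-10-25 09:00') AS leg_inside_trip;
```

Run as written, the four queries return false, true, false and true.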

How to prevent things from going wrong?

Avoiding headaches

Constraints

- Good for enforcing conditions on the table's attributes
- Specified at table declaration or afterward, via ALTER TABLE
- Types:
  – Check constraints (row-level), e.g., age > 0
  – Not null constraints (row-level), e.g., name not null
  – Unique constraints (table-level), e.g., two people cannot have the same ID
  – Primary key (table-level) - unique + not null
  – Foreign key (between tables) - referential integrity to other tables
  – Exclusion constraints (table-level) - a generalization of unique: a check across the rows of the table
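
A minimal sketch of both ways of declaring constraints. The people table and every name in it are hypothetical, chosen only to match the age and name examples in the list above.

```sql
BEGIN;

-- Hypothetical table; none of these names come from the talk.
CREATE TEMP TABLE people (
    id    integer PRIMARY KEY,   -- primary key: unique + not null
    name  text    NOT NULL,      -- not-null constraint
    email text    UNIQUE,        -- unique constraint
    age   integer CONSTRAINT positive_age CHECK (age > 0)  -- check constraint
);

-- Constraints can also be added after the table exists; existing rows
-- are validated when the constraint is added.
ALTER TABLE people
    ADD CONSTRAINT reasonable_age CHECK (age < 150);

-- ROLLBACK keeps the sketch free of side effects.
ROLLBACK;
```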

Avoiding headaches

Triggers

- Useful for enforcing constraints across multiple tables (works fine for same-table constraints as well)
- Can be used to implement complex logic
- Types
  – Event trigger
    - Global, for the database
    - Captures DDL events (CREATE, ALTER, DROP, COMMENT, GRANT, REVOKE)
  – Data change trigger
    - Local, for the table
    - Captures DML events (INSERT, UPDATE, TRUNCATE, DELETE)
    - Can be run BEFORE or AFTER the data change event
    - Can be conditioned by a WHEN clause
- Trigger functions can be written in any procedural language with trigger (or event trigger) support, or in C
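
The remaining trigger slides cover data change triggers only, so here is a minimal sketch of an event trigger for contrast. The function and trigger names are hypothetical, and creating an event trigger requires superuser privileges.

```sql
-- Hypothetical function: prints a notice for every completed DDL command.
CREATE FUNCTION notice_ddl_command() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_EVENT and TG_TAG are supplied by PL/pgSQL inside event triggers.
    RAISE NOTICE 'event %, command %', TG_EVENT, TG_TAG;
END;
$$;

-- Global for the database: fires after any supported DDL command ends.
CREATE EVENT TRIGGER notice_ddl_commands
    ON ddl_command_end
    EXECUTE PROCEDURE notice_ddl_command();

-- Clean up when done.
DROP EVENT TRIGGER notice_ddl_commands;
DROP FUNCTION notice_ddl_command();
```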

Generate test data - Tables
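
The table definitions shown on this slide are not part of the transcript. The sketch below is an assumption reconstructed from the error messages later in the talk: the trips columns (id, from_time, to_time) and the constraint names valid_periods, non_overlapping_trip_periods and triplegs_trip_id_fkey appear there, while everything else (column types, the locations columns, valid_tripleg_periods) is hypothetical.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        serial    PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    -- A trip must start before it ends.
    CONSTRAINT valid_periods CHECK (from_time < to_time),
    -- No two trips may have overlapping periods.
    CONSTRAINT non_overlapping_trip_periods
        EXCLUDE USING gist (tsrange(from_time, to_time) WITH &&)
);

CREATE TEMP TABLE triplegs (
    id        serial    PRIMARY KEY,
    -- PostgreSQL names this foreign key triplegs_trip_id_fkey.
    trip_id   integer   NOT NULL REFERENCES trips (id),
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    CONSTRAINT valid_tripleg_periods CHECK (from_time < to_time)
);

-- Hypothetical layout: plain coordinates instead of a PostGIS geometry,
-- so that the sketch runs without extensions.
CREATE TEMP TABLE locations (
    id          serial           PRIMARY KEY,
    recorded_at timestamp        NOT NULL,
    longitude   double precision NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    latitude    double precision NOT NULL CHECK (latitude BETWEEN -90 AND 90)
);

-- ROLLBACK keeps the sketch free of side effects; drop TEMP and COMMIT
-- to keep the tables.
ROLLBACK;
```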

Generate test data - Locations

Generate test data - Trips and triplegs
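
The data generation code from this slide is not in the transcript either. One common way to produce such test data is generate_series; the sketch below is an assumption and not the talk's script, with arbitrary timestamps and table layouts cut down to what the example needs.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE TEMP TABLE triplegs (
    id        serial    PRIMARY KEY,
    trip_id   integer   NOT NULL REFERENCES trips (id),
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Ten back-to-back one-hour trips.
INSERT INTO trips (id, from_time, to_time)
SELECT n,
       timestamp '2016-10-25 08:00' + (n - 1) * interval '1 hour',
       timestamp '2016-10-25 08:00' + n * interval '1 hour'
FROM generate_series(1, 10) AS n;

-- Two triplegs per trip, each covering half of the trip period.
INSERT INTO triplegs (trip_id, from_time, to_time)
SELECT t.id,
       t.from_time + (half - 1) * interval '30 minutes',
       t.from_time + half * interval '30 minutes'
FROM trips AS t
CROSS JOIN generate_series(1, 2) AS half;

SELECT count(*) FROM trips;     -- 10
SELECT count(*) FROM triplegs;  -- 20

ROLLBACK;
```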

Check Constraints

ERROR: new row for relation "trips" violates check constraint "valid_periods"
DETAIL: Failing row contains (5, 2016-10-25 07:37:28.6, 2016-10-25 06:37:28.6)
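
A statement of the following shape produces an error like this one. It is a minimal sketch: the table and constraint names come from the error message, but the constraint expression (from_time < to_time) is an assumption.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp,
    to_time   timestamp,
    -- Assumed definition: a period must start before it ends.
    CONSTRAINT valid_periods CHECK (from_time < to_time)
);

-- Accepted: the period starts before it ends.
INSERT INTO trips VALUES (1, '2016-10-25 06:37:28.6', '2016-10-25 07:37:28.6');

-- Rejected by valid_periods: the period ends before it starts.
INSERT INTO trips VALUES (5, '2016-10-25 07:37:28.6', '2016-10-25 06:37:28.6');

-- The failed INSERT aborts the transaction; ROLLBACK cleans up.
ROLLBACK;
```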

Not Null Constraints

ERROR: null value in column "from_time" violates not-null constraint
DETAIL: Failing row contains (6, null, null).
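
A minimal sketch of an insert that is rejected this way, assuming a trips table whose period columns are declared NOT NULL (column types are an assumption):

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Rejected: neither period column may be null.
INSERT INTO trips (id, from_time, to_time) VALUES (6, NULL, NULL);

ROLLBACK;
```

A check such as from_time < to_time would not catch this row on its own: with a null operand the expression evaluates to null rather than false, and a check constraint only rejects false.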

Primary Key Constraints

ERROR: duplicate key value violates unique constraint "trips_pkey"
DETAIL: Key (id)=(1) already exists.
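
A minimal sketch that reproduces this kind of error; the row values are made up, and trips_pkey is the name PostgreSQL generates for the primary key of a table called trips.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

INSERT INTO trips VALUES (1, '2016-10-25 08:00', '2016-10-25 09:00');

-- Rejected: a row with id 1 already exists.
INSERT INTO trips VALUES (1, '2016-10-25 10:00', '2016-10-25 11:00');

ROLLBACK;
```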

Foreign Key Constraints

ERROR: insert or update on table "triplegs" violates foreign key constraint "triplegs_trip_id_fkey"
DETAIL: Key (trip_id)=(100) is not present in table "trips".
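
A minimal sketch of the referential check, with both tables reduced to the columns the example needs (an assumption, not the talk's schema):

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id integer PRIMARY KEY
);

CREATE TEMP TABLE triplegs (
    id      integer PRIMARY KEY,
    -- PostgreSQL names this constraint triplegs_trip_id_fkey.
    trip_id integer NOT NULL REFERENCES trips (id)
);

INSERT INTO trips VALUES (1);

-- Accepted: trip 1 exists.
INSERT INTO triplegs VALUES (1, 1);

-- Rejected: there is no trip with id 100.
INSERT INTO triplegs VALUES (2, 100);

ROLLBACK;
```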

Exclusion Constraints

ERROR: conflicting key value violates exclusion constraint "non_overlapping_trip_periods"
DETAIL: Key (tsrange(from_time, to_time))=(["2016-10-25 12:17:00","2016-10-25 12:20:00")) conflicts with existing key (tsrange(from_time, to_time))=(["2016-10-25 12:00:00","2016-10-25 13:00:00")).
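
The key expression in the message suggests an exclusion constraint over tsrange(from_time, to_time) with the overlap operator. A minimal sketch under that assumption, with the periods taken from the error message:

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    -- No two rows may have periods for which && (overlaps) is true.
    CONSTRAINT non_overlapping_trip_periods
        EXCLUDE USING gist (tsrange(from_time, to_time) WITH &&)
);

INSERT INTO trips VALUES (1, '2016-10-25 12:00', '2016-10-25 13:00');

-- Accepted: ranges are half-open by default, so a trip starting at 13:00
-- only touches the previous one.
INSERT INTO trips VALUES (2, '2016-10-25 13:00', '2016-10-25 14:00');

-- Rejected: 12:17-12:20 lies inside 12:00-13:00.
INSERT INTO trips VALUES (3, '2016-10-25 12:17', '2016-10-25 12:20');

ROLLBACK;
```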

Triggers - definition example
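
The definition from this slide is not in the transcript. As a stand-in, here is a minimal sketch of a data change trigger with a WHEN clause; the function, the trigger and what they do are hypothetical.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Hypothetical trigger function: reports every period change.
CREATE FUNCTION report_period_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'trip %: period changed from [%, %) to [%, %)',
        NEW.id, OLD.from_time, OLD.to_time, NEW.from_time, NEW.to_time;
    RETURN NEW;  -- a BEFORE row trigger returns the row to be stored
END;
$$;

-- Fires per row, before the update, and only when the period changes.
CREATE TRIGGER report_period_change
    BEFORE UPDATE OF from_time, to_time ON trips
    FOR EACH ROW
    WHEN (OLD.from_time IS DISTINCT FROM NEW.from_time
       OR OLD.to_time   IS DISTINCT FROM NEW.to_time)
    EXECUTE PROCEDURE report_period_change();

INSERT INTO trips VALUES (1, '2016-10-25 08:00', '2016-10-25 09:00');

-- Raises the notice: to_time changes.
UPDATE trips SET to_time = '2016-10-25 09:30' WHERE id = 1;

-- Silent: the WHEN condition is false because nothing changes.
UPDATE trips SET to_time = to_time WHERE id = 1;

ROLLBACK;
```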

Triggers - minimum two locations per tripleg

ERROR: proposed period modification of tripleg 1 does not contain enough locations
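
A trigger along the following lines could raise this error. It is a minimal sketch: only the message text comes from the slide, while the table layouts, the names and the rule that both period bounds are inclusive are assumptions.

```sql
BEGIN;

CREATE TEMP TABLE locations (
    id          integer   PRIMARY KEY,
    recorded_at timestamp NOT NULL
);

CREATE TEMP TABLE triplegs (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Hypothetical trigger function: a tripleg period must contain at least
-- two recorded locations.
CREATE FUNCTION check_tripleg_locations() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    location_count bigint;
BEGIN
    SELECT count(*) INTO location_count
    FROM locations
    WHERE recorded_at BETWEEN NEW.from_time AND NEW.to_time;

    IF location_count < 2 THEN
        RAISE EXCEPTION
            'proposed period modification of tripleg % does not contain enough locations',
            NEW.id;
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER tripleg_has_locations
    BEFORE UPDATE OF from_time, to_time ON triplegs
    FOR EACH ROW
    EXECUTE PROCEDURE check_tripleg_locations();

INSERT INTO locations VALUES
    (1, '2016-10-25 08:00'),
    (2, '2016-10-25 08:10'),
    (3, '2016-10-25 08:20');
INSERT INTO triplegs VALUES (1, '2016-10-25 08:00', '2016-10-25 08:20');

-- Rejected: only the 08:20 location falls inside 08:15-08:20.
UPDATE triplegs SET from_time = '2016-10-25 08:15' WHERE id = 1;

ROLLBACK;
```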

Triggers - time update of trip affects neighbors
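
The code from this slide is not in the transcript. The sketch below illustrates the idea under stated assumptions: when the end of a trip changes, a hypothetical trigger moves the start of the trip that used to begin at that moment. The names are made up and this is not the trigger from the talk.

```sql
BEGIN;

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Hypothetical trigger function: keep the next trip attached to this one.
CREATE FUNCTION shift_next_trip_start() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE trips
    SET from_time = NEW.to_time
    WHERE from_time = OLD.to_time
      AND id <> NEW.id;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

-- The nested UPDATE sets from_time only, so this trigger (UPDATE OF
-- to_time) does not fire again for the neighbor, and the WHEN clause
-- skips updates that leave to_time unchanged.
CREATE TRIGGER shift_next_trip_start
    AFTER UPDATE OF to_time ON trips
    FOR EACH ROW
    WHEN (OLD.to_time IS DISTINCT FROM NEW.to_time)
    EXECUTE PROCEDURE shift_next_trip_start();

INSERT INTO trips VALUES
    (1, '2016-10-25 08:00', '2016-10-25 09:00'),
    (2, '2016-10-25 09:00', '2016-10-25 10:00');

-- Extending trip 1 by half an hour...
UPDATE trips SET to_time = '2016-10-25 09:30' WHERE id = 1;

-- ...also moves the start of trip 2 to 09:30.
SELECT id, from_time, to_time FROM trips ORDER BY id;

ROLLBACK;
```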

Triggers - time update of trip affects (nested) triplegs

ERROR: proposed period modification of tripleg 4 does not contain enough locations

- A trigger attached to the triplegs table invalidated an operation on the trips table and rolled back the transaction
- As long as the rules are properly defined and implemented, data integrity is maintained

Wrapping up

Triggers and constraints

- Advantages
  – Can be used to implement your unified custom logic (write once, fix once)
  – Constraints are great for enforcing checks on a row-level or table-level (exclusion constraints), as well as referential integrity (foreign keys)
  – For multiple table logic (or single table logic), triggers can do almost anything - use responsibly
- Disadvantages
  – Documentation can be overwhelming
  – Cascading triggers can throw you in a loop (to see this, remove the WHEN conditions on the trip_period_integrity trigger)
  – Performance overhead
  – Cascading triggers are difficult to debug and the actions are not always intuitive
- Advice
  – Write unit tests (pgTAP) to validate that a changed / new trigger doesn't break your functionality
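
A minimal sketch of such a test, assuming the pgTAP extension is installed in the database (CREATE EXTENSION pgtap). The table is a cut-down stand-in for trips, and the constraint expression is an assumption.

```sql
BEGIN;
SELECT plan(2);

CREATE TEMP TABLE trips (
    id        integer   PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    CONSTRAINT valid_periods CHECK (from_time < to_time)
);

-- The statement must run without raising an error.
SELECT lives_ok(
    $$INSERT INTO trips VALUES (1, '2016-10-25 08:00', '2016-10-25 09:00')$$,
    'a valid period is accepted'
);

-- The statement must fail with the given SQLSTATE and message.
SELECT throws_ok(
    $$INSERT INTO trips VALUES (2, '2016-10-25 09:00', '2016-10-25 08:00')$$,
    '23514',  -- check_violation
    'new row for relation "trips" violates check constraint "valid_periods"',
    'an inverted period is rejected by valid_periods'
);

SELECT * FROM finish();
ROLLBACK;
```

The same pattern applies to triggers: an exception raised with RAISE EXCEPTION and no explicit error code has SQLSTATE P0001.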

References

- Docs:
  – Triggers - https://www.postgresql.org/docs/current/static/sql-createtrigger.html
  – Constraints - https://www.postgresql.org/docs/current/static/ddl-constraints.html
  – Exclusion constraints - https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-EXCLUDE
  – Examples of exclusion constraints on ranges - https://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-CONSTRAINT
- Other presentations on constraints and triggers
  – Jim Mlodgenski (more on triggers) - http://www.slideshare.net/jim_mlodgenski/an-introduction-to-postresql-triggers
  – Robert Haas (triggers cheat sheet) - http://www.slideshare.net/pgconf/introduction-to-triggers
  – Magnus Hagander (exclusion constraints) - https://www.hagander.net/talks/BeyondUNIQUE.pdf

References (part 2)

- More on exclusion constraints
  – Depesz - https://www.depesz.com/2010/01/03/waiting-for-8-5-exclusion-constraints/
  – Jeff Davis - http://thoughts.davisjeff.com/2010/09/25/exclusion-constraints-are-generalized-sql-unique/
- GitHub examples:
  – For this talk - https://github.com/adrianprelipcean/PUG_Stockholm
  – For the tracking system mentioned at the beginning (database + mobile apps + annotation UI) - https://github.com/Badger-MEILI

Thank you for your attention!
Questions and Discussions

Adrian C. Prelipcean
http://adrianprelipcean.github.io/
@Adi Prelipcean