Assuring spatial and temporal data integrity via constraints and triggers in PostgreSQL

Posted on 19-Jan-2017


Adrian C. Prelipcean
acpr@kth.se
adrianprelipceanc@gmail.com

Ego page

Adrian C. Prelipcean

1. PhD student at KTH
   – Transportation systems and geoinformatics
   – Data collection and analysis methods for travel behaviour
   – Lab responsible for multiple courses
     • GIS Algorithms (Java)
     • Spatial Databases (PostgreSQL, PostGIS, pgRouting)
     • Web and Mobile GIS (NodeJS, ExpressJS, Apache Cordova, PostgreSQL)
2. Tech responsible at Airmee
   – Airmee - a smart platform for urban logistics
   – Currently working on MVPs (from the backend to the frontend)
   – PostgreSQL as the backend

Background

Travel behaviour - understand how people travel and make decisions.
We used to ask people how they are travelling - fill in paper forms, answer telephone interviews, fill in web forms.

• Found out how to adapt the infrastructure of our cities to handle a growing population.
• However, also found out that:
  – Physical mail is not that popular.
  – Filling in forms is not really a hobby.
  – Method doesn’t scale well.


New ways to collect same data

Use end user devices (mostly smartphones) to (legally) collect how people move (GPS points).

• Old way
  – Ask people to fill in forms
• New way
  – Ask people to look at their GPS points and fill in forms(?)
• New way 2.0
  – Use [buzzwords] machine learning / AI to avoid asking people to fill in forms
  – But we need data to train the algorithms
  – Also, algorithms will give wrong predictions
  – How about a continuous feedback loop between users and ML algorithms
  – What can go wrong?


Data


Dealing with predictions

• Machine learning algorithms infer
  – Time periods for trips (when people travel between destinations)
  – Time periods for triplegs (when people travel with different travel modes [between destinations])
  – Destination of trips (where people travel)
  – Purpose of trips (why people travel)
  – Travel modes (how people travel)
• Users can look at the output
  – And confirm that it is correct
  – Or modify it to bring it to the correct state
    • Alone
    • Using a weird UI
    • ...


What can go wrong?

• Invalid periods
  – e.g., traveled to work between 08:00 AM and 07:00 AM (same day)
• Overlapping periods
  – e.g., traveled to work between 08:00 AM and 09:00 AM, and traveled to the supermarket between 08:30 AM and 09:30 AM
• Discontinuous periods
  – e.g., traveled to work between 08:00 AM and 09:00 AM, worked between 09:00 AM and 11:30 AM, and traveled to a restaurant between 01:00 PM and 01:20 PM (nothing accounts for 11:30 AM to 01:00 PM)
• Improper nests
  – e.g., traveled to work between 08:00 AM and 09:00 AM, during which I walked between 08:20 AM and 09:00 AM (no tripleg covers 08:00 AM to 08:20 AM)
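The four failure modes above map onto PostgreSQL's built-in range operators. The sketch below is illustrative only: it works on literal tsrange values built from the example times rather than on any table from the talk, and the date is an arbitrary choice.

```sql
-- Overlap (&&): the work trip and the supermarket trip share 08:30-09:00.
SELECT tsrange(timestamp '2016-10-25 08:00', timestamp '2016-10-25 09:00')
    && tsrange(timestamp '2016-10-25 08:30', timestamp '2016-10-25 09:30') AS overlaps;      -- true

-- Adjacency (-|-): consecutive periods should touch; a gap makes this false.
SELECT tsrange(timestamp '2016-10-25 09:00', timestamp '2016-10-25 11:30')
   -|- tsrange(timestamp '2016-10-25 13:00', timestamp '2016-10-25 13:20') AS is_adjacent;   -- false

-- Nesting: the tripleg lies inside the trip (<@) but does not cover all of it.
SELECT tsrange(timestamp '2016-10-25 08:20', timestamp '2016-10-25 09:00')
    <@ tsrange(timestamp '2016-10-25 08:00', timestamp '2016-10-25 09:00') AS is_nested,     -- true
       tsrange(timestamp '2016-10-25 08:20', timestamp '2016-10-25 09:00')
     = tsrange(timestamp '2016-10-25 08:00', timestamp '2016-10-25 09:00') AS covers_trip;   -- false

-- Invalid period: this statement fails with
-- "range lower bound must be less than or equal to range upper bound".
SELECT tsrange(timestamp '2016-10-25 08:00', timestamp '2016-10-25 07:00');
```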


How to prevent things from going wrong?


Avoiding headaches

Constraints

• Good for enforcing conditions on the table’s attributes
• Specified at table declaration or afterward, via ALTER
• Types:
  – Check constraints (row-level), e.g., age > 0
  – Not null constraints (row-level), e.g., name not null
  – Unique constraints (table-level), e.g., two people cannot have the same ID
  – Primary key (table-level) - unique + not null
  – Foreign key (between tables) - referential integrity to other tables
  – Exclude constraints (table level) - unique + table-level check
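A minimal sketch of the row-level and table-level constraint types listed above, using the slide's own examples (age, name, ID). The table and column names are hypothetical, and foreign key and exclusion constraints are left out for brevity. The block runs inside a transaction that is rolled back, so it leaves the database untouched.

```sql
BEGIN;

-- Hypothetical table; names are for illustration only.
CREATE TABLE people (
    id          integer PRIMARY KEY,       -- primary key: unique + not null
    national_id text UNIQUE,               -- unique: two people cannot share an ID
    name        text NOT NULL,             -- not null
    age         integer CHECK (age > 0)    -- check, evaluated per row
);

-- Constraints can also be added afterward via ALTER TABLE.
ALTER TABLE people
    ADD CONSTRAINT plausible_age CHECK (age < 150);

INSERT INTO people VALUES (1, 'A-1', 'Ada', 36);   -- accepted
INSERT INTO people VALUES (2, 'A-1', 'Bob', 41);   -- rejected: duplicate national_id

ROLLBACK;  -- also clears the aborted transaction
```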

Avoiding headaches

Triggers

• Useful for enforcing constraints across multiple tables (works fine for same table constraints, as well)
• Can be used to implement complex logic
• Types
  – Event trigger
    • Global, for the database
    • Captures DDL events (CREATE, ALTER, DROP, COMMENT, GRANT, REVOKE)
  – Data change trigger
    • Local, for the table
    • Captures DML events (INSERT, UPDATE, TRUNCATE, DELETE)
    • Can be run BEFORE or AFTER the data change event
    • Can be conditioned by a WHEN clause
• Trigger functions can be written in C or in a procedural language such as PL/pgSQL (for event triggers, one with event trigger support), but not in plain SQL
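Data change triggers are the focus of the rest of the talk, so here is a sketch of the other kind: an event trigger that reports every DDL command. The function and trigger names are hypothetical, and creating an event trigger requires superuser privileges.

```sql
-- Event trigger function: it returns the pseudo-type event_trigger.
CREATE FUNCTION log_ddl_start() RETURNS event_trigger AS $$
BEGIN
    -- TG_TAG holds the command tag, e.g. 'CREATE TABLE'.
    RAISE NOTICE 'DDL command starting: %', TG_TAG;
END;
$$ LANGUAGE plpgsql;

-- Global for the database, fired before any supported DDL command runs.
CREATE EVENT TRIGGER log_ddl ON ddl_command_start
    EXECUTE FUNCTION log_ddl_start();  -- PostgreSQL 11+; older releases spell it EXECUTE PROCEDURE

-- Clean up when done:
-- DROP EVENT TRIGGER log_ddl;
-- DROP FUNCTION log_ddl_start();
```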

Generate test data - Tables
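The table definitions from this slide are not part of the transcript. The sketch below is a guess at a compatible schema: the names trips, triplegs, id, from_time, to_time and trip_id come from the error messages shown later, while the locations columns (recorded_at, lon, lat) are assumptions. Plain coordinate columns are used so that no extension is needed; a PostGIS geometry column would be a natural alternative.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- A tripleg is one travel-mode segment nested inside a trip.
CREATE TABLE triplegs (
    id        integer PRIMARY KEY,
    trip_id   integer NOT NULL REFERENCES trips (id),
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- One row per GPS fix (column names are assumptions).
CREATE TABLE locations (
    id          integer PRIMARY KEY,
    recorded_at timestamp NOT NULL,
    lon         double precision NOT NULL,
    lat         double precision NOT NULL
);

ROLLBACK;  -- replace with COMMIT to keep the tables
```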


Generate test data - Locations
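One common way to produce synthetic GPS fixes is generate_series. The sketch below assumes a hypothetical locations table with recorded_at, lon and lat columns, and jitters points around an arbitrary coordinate; it is not the generator used in the talk.

```sql
BEGIN;

-- Hypothetical table; column names are assumptions.
CREATE TABLE locations (
    id          integer PRIMARY KEY,
    recorded_at timestamp NOT NULL,
    lon         double precision NOT NULL,
    lat         double precision NOT NULL
);

-- One fix every 30 seconds for an hour (120 rows), scattered around an arbitrary point.
INSERT INTO locations (id, recorded_at, lon, lat)
SELECT n,
       timestamp '2016-10-25 12:00:00' + n * interval '30 seconds',
       18.07 + (random() - 0.5) * 0.01,
       59.33 + (random() - 0.5) * 0.01
FROM generate_series(0, 119) AS n;

SELECT count(*), min(recorded_at), max(recorded_at) FROM locations;

ROLLBACK;  -- replace with COMMIT to keep the data
```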


Generate test data - Trips and triplegs
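A sketch of how consistent test trips and triplegs might be generated: three back-to-back one-hour trips, each split into two triplegs that together cover the trip. The schema and the values are assumptions made for illustration.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE TABLE triplegs (
    id        integer PRIMARY KEY,
    trip_id   integer NOT NULL REFERENCES trips (id),
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Three consecutive one-hour trips starting at 12:00.
INSERT INTO trips (id, from_time, to_time)
SELECT n,
       timestamp '2016-10-25 12:00:00' + (n - 1) * interval '1 hour',
       timestamp '2016-10-25 12:00:00' + n * interval '1 hour'
FROM generate_series(1, 3) AS n;

-- Two equal-length triplegs per trip (tripleg ids 1..6).
INSERT INTO triplegs (id, trip_id, from_time, to_time)
SELECT 2 * t.id - 2 + half,
       t.id,
       t.from_time + (half - 1) * (t.to_time - t.from_time) / 2,
       t.from_time + half * (t.to_time - t.from_time) / 2
FROM trips AS t
CROSS JOIN generate_series(1, 2) AS half;

SELECT * FROM triplegs ORDER BY id;

ROLLBACK;  -- replace with COMMIT to keep the data
```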


Check Constraints

ERROR: new row for relation "trips" violates check constraint "valid_periods"
DETAIL: Failing row contains (5, 2016-10-25 07:37:28.6, 2016-10-25 06:37:28.6)
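The statement that produced this error is not in the transcript. A minimal sketch that should raise the same violation follows; the constraint name and the failing values come from the error message, while the exact check condition (from_time < to_time) is an assumption.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    -- Assumed condition: a trip must end after it starts.
    CONSTRAINT valid_periods CHECK (from_time < to_time)
);

-- Rejected: the trip would end an hour before it starts.
INSERT INTO trips VALUES (5, '2016-10-25 07:37:28.6', '2016-10-25 06:37:28.6');

ROLLBACK;  -- also clears the aborted transaction
```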

Not Null Constraints

ERROR: null value in column "from_time" violates not-null constraint
DETAIL: Failing row contains (6, null, null).
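A sketch that should reproduce this kind of failure, here adding the not-null constraints after table creation via ALTER TABLE. The column layout is inferred from the failing row and is otherwise an assumption; recent PostgreSQL releases word the message slightly differently.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp,
    to_time   timestamp
);

-- Not-null constraints added afterward.
ALTER TABLE trips
    ALTER COLUMN from_time SET NOT NULL,
    ALTER COLUMN to_time   SET NOT NULL;

-- Rejected: from_time (checked first) is null.
INSERT INTO trips VALUES (6, NULL, NULL);

ROLLBACK;
```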

Primary Key Constraints

ERROR: duplicate key value violates unique constraint "trips_pkey"
DETAIL: Key (id)=(1) already exists.
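A sketch of how such a duplicate-key error can arise. The name trips_pkey is what PostgreSQL generates by default for a primary key on trips; the inserted values are made up.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,   -- backed by the unique index "trips_pkey"
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

INSERT INTO trips VALUES (1, '2016-10-25 12:00', '2016-10-25 13:00');  -- accepted
INSERT INTO trips VALUES (1, '2016-10-25 14:00', '2016-10-25 15:00');  -- rejected: id 1 already exists

ROLLBACK;
```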

Foreign Key Constraints

ERROR: insert or update on table "triplegs" violates foreign key constraint "triplegs_trip_id_fkey"
DETAIL: Key (trip_id)=(100) is not present in table "trips".
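A sketch that should trigger the same referential-integrity failure. An inline REFERENCES clause on triplegs.trip_id gets the default name triplegs_trip_id_fkey; the remaining columns and values are assumptions.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE TABLE triplegs (
    id        integer PRIMARY KEY,
    trip_id   integer NOT NULL REFERENCES trips (id),  -- named triplegs_trip_id_fkey by default
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

INSERT INTO trips VALUES (1, '2016-10-25 12:00', '2016-10-25 13:00');

-- Rejected: there is no trip with id 100.
INSERT INTO triplegs VALUES (1, 100, '2016-10-25 12:00', '2016-10-25 12:30');

ROLLBACK;
```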

Exclusion Constraints

ERROR: conflicting key value violates exclusion constraint "non_overlapping_trip_periods"
DETAIL: Key (tsrange(from_time, to_time)) = (["2016-10-25 12:17:00","2016-10-25 12:20:00")) conflicts with existing key (tsrange(from_time, to_time)) = (["2016-10-25 12:00:00","2016-10-25 13:00:00")).
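The key expression in the error message suggests an exclusion constraint over tsrange(from_time, to_time) with the overlap operator. A sketch under that assumption, reusing the two periods from the message:

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL,
    -- No two rows may have overlapping (&&) periods; range types ship with GiST support.
    CONSTRAINT non_overlapping_trip_periods
        EXCLUDE USING gist (tsrange(from_time, to_time) WITH &&)
);

INSERT INTO trips VALUES (1, '2016-10-25 12:00:00', '2016-10-25 13:00:00');  -- accepted
INSERT INTO trips VALUES (2, '2016-10-25 13:00:00', '2016-10-25 14:00:00');  -- accepted: half-open ranges only touch
INSERT INTO trips VALUES (3, '2016-10-25 12:17:00', '2016-10-25 12:20:00');  -- rejected: overlaps trip 1

ROLLBACK;
```

tsrange(a, b) builds a half-open range [a, b), which is why a trip may start at the exact moment the previous one ends.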

Triggers - definition example
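The definition shown on this slide is not in the transcript. As a stand-in, here is a minimal sketch of the two parts every data change trigger needs, a trigger function and a CREATE TRIGGER statement with an optional WHEN condition. All names are hypothetical.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

-- Part 1: the trigger function (here in PL/pgSQL).
CREATE FUNCTION note_trip_change() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE '% on trip %: % -> %', TG_OP, NEW.id,
        tsrange(OLD.from_time, OLD.to_time), tsrange(NEW.from_time, NEW.to_time);
    RETURN NEW;  -- a BEFORE row trigger returns the row to store (NULL would skip it)
END;
$$ LANGUAGE plpgsql;

-- Part 2: attach it to the table; WHEN limits it to actual period changes.
CREATE TRIGGER trip_change_notice
    BEFORE UPDATE ON trips
    FOR EACH ROW
    WHEN (OLD.from_time IS DISTINCT FROM NEW.from_time
          OR OLD.to_time IS DISTINCT FROM NEW.to_time)
    EXECUTE FUNCTION note_trip_change();  -- EXECUTE PROCEDURE before PostgreSQL 11

INSERT INTO trips VALUES (1, '2016-10-25 12:00', '2016-10-25 13:00');
UPDATE trips SET to_time = '2016-10-25 13:30' WHERE id = 1;  -- fires the trigger
UPDATE trips SET id = 1 WHERE id = 1;                        -- WHEN is false: trigger skipped

ROLLBACK;
```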


Triggers - minimum two locations per tripleg

ERROR: proposed period modification of tripleg 1 does not contain enough locations
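A sketch of a trigger that could raise this error: before a tripleg is inserted or updated, it counts the locations recorded within the proposed period. The error text is taken from the slide; the table layout, the inclusive bounds and all names are assumptions.

```sql
BEGIN;

CREATE TABLE locations (
    id          integer PRIMARY KEY,
    recorded_at timestamp NOT NULL
);

CREATE TABLE triplegs (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE FUNCTION enough_locations() RETURNS trigger AS $$
DECLARE
    n_locations integer;
BEGIN
    -- Count the fixes that fall inside the proposed period (bounds inclusive).
    SELECT count(*) INTO n_locations
      FROM locations
     WHERE recorded_at >= NEW.from_time
       AND recorded_at <= NEW.to_time;

    IF n_locations < 2 THEN
        RAISE EXCEPTION 'proposed period modification of tripleg % does not contain enough locations', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tripleg_has_locations
    BEFORE INSERT OR UPDATE ON triplegs
    FOR EACH ROW
    EXECUTE FUNCTION enough_locations();

INSERT INTO locations VALUES
    (1, '2016-10-25 12:00'), (2, '2016-10-25 12:10'), (3, '2016-10-25 12:20');

INSERT INTO triplegs VALUES (1, '2016-10-25 12:00', '2016-10-25 12:20');  -- accepted: 3 locations
UPDATE triplegs SET from_time = '2016-10-25 12:15' WHERE id = 1;          -- rejected: 1 location left

ROLLBACK;
```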

Triggers - time update of trip affects neighbors
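The trigger from this slide is not in the transcript. One possible shape, sketched under simplifying assumptions (trips follow each other directly, and only the end time is edited): when a trip's end moves, the trip that started at the old end is moved with it. The function and trigger names are hypothetical.

```sql
BEGIN;

CREATE TABLE trips (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE FUNCTION shift_next_trip() RETURNS trigger AS $$
BEGIN
    -- Keep the following trip attached to the new end of this one.
    UPDATE trips
       SET from_time = NEW.to_time
     WHERE from_time = OLD.to_time
       AND id <> OLD.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- UPDATE OF to_time: the nested UPDATE above only sets from_time, so it does not
-- fire this trigger again; the WHEN clause skips updates that change nothing.
CREATE TRIGGER shift_next_trip_on_update
    BEFORE UPDATE OF to_time ON trips
    FOR EACH ROW
    WHEN (OLD.to_time IS DISTINCT FROM NEW.to_time)
    EXECUTE FUNCTION shift_next_trip();

INSERT INTO trips VALUES
    (1, '2016-10-25 08:00', '2016-10-25 09:00'),
    (2, '2016-10-25 09:00', '2016-10-25 10:00');

UPDATE trips SET to_time = '2016-10-25 09:15' WHERE id = 1;

SELECT * FROM trips ORDER BY id;  -- trip 2 now starts at 09:15

ROLLBACK;
```

Restricting the trigger to one column and guarding it with WHEN is one way to keep cascading updates from re-firing each other.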


Triggers - time update of trip affects (nested) triplegs

ERROR: proposed period modification of tripleg 4 does not contain enough locations

• Trigger attached to the Triplegs invalidated an operation on the Trips table and rolled back the transaction
• As long as the rules are properly defined and implemented, data integrity is maintained


Wrapping up

Triggers and constraints

• Advantages
  – Can be used to implement your unified custom logic (write once, fix once)
  – Constraints are great for enforcing checks on a row-level or table-level (exclusion constraints), as well as referential integrity (foreign keys)
  – For multiple table logic (or single table logic), triggers can do almost anything - use responsibly
• Disadvantages
  – Documentation can be overwhelming
  – Cascading triggers can throw you in a loop (to see this, remove the WHEN conditions on the trip_period_integrity trigger)
  – Performance overhead
  – Cascading triggers are difficult to debug and the actions are not always intuitive
• Advice
  – Write unit tests (pgTAP) to validate that a changed / new trigger doesn’t break your functionality
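A sketch of what such a pgTAP test could look like, assuming the pgTAP extension is installed on the server. The table, function and trigger are hypothetical stand-ins for the "minimum two locations per tripleg" rule; only the error text comes from the slides.

```sql
BEGIN;
CREATE EXTENSION IF NOT EXISTS pgtap;

CREATE TABLE locations (id integer PRIMARY KEY, recorded_at timestamp NOT NULL);
CREATE TABLE triplegs (
    id        integer PRIMARY KEY,
    from_time timestamp NOT NULL,
    to_time   timestamp NOT NULL
);

CREATE FUNCTION enough_locations() RETURNS trigger AS $$
BEGIN
    IF (SELECT count(*) FROM locations
         WHERE recorded_at BETWEEN NEW.from_time AND NEW.to_time) < 2 THEN
        RAISE EXCEPTION 'proposed period modification of tripleg % does not contain enough locations', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tripleg_has_locations
    BEFORE INSERT OR UPDATE ON triplegs
    FOR EACH ROW EXECUTE FUNCTION enough_locations();

INSERT INTO locations VALUES (1, '2016-10-25 12:00'), (2, '2016-10-25 12:10');

SELECT plan(3);

SELECT has_trigger('triplegs', 'tripleg_has_locations');

SELECT lives_ok(
    $$INSERT INTO triplegs VALUES (1, '2016-10-25 12:00', '2016-10-25 12:10')$$,
    'a tripleg spanning two locations is accepted'
);

SELECT throws_ok(
    $$UPDATE triplegs SET from_time = '2016-10-25 12:05' WHERE id = 1$$,
    'P0001',  -- SQLSTATE of a plain RAISE EXCEPTION
    'proposed period modification of tripleg 1 does not contain enough locations',
    'shrinking the tripleg below two locations is rejected'
);

SELECT * FROM finish();
ROLLBACK;  -- the test leaves no trace in the database
```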

References

• Docs:
  – Triggers - https://www.postgresql.org/docs/current/static/sql-createtrigger.html
  – Constraints - https://www.postgresql.org/docs/current/static/ddl-constraints.html
  – Exclusion constraints - https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-EXCLUDE
  – Examples of exclusion constraints on ranges - https://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-CONSTRAINT
• Other presentations on constraints and triggers
  – Jim Mlodgenski (more on triggers) - http://www.slideshare.net/jim_mlodgenski/an-introduction-to-postresql-triggers
  – Robert Haas (triggers cheat sheet) - http://www.slideshare.net/pgconf/introduction-to-triggers
  – Magnus Hagander (exclusion constraints) - https://www.hagander.net/talks/BeyondUNIQUE.pdf

References (part 2)

• More on exclusion constraints
  – Depesz - https://www.depesz.com/2010/01/03/waiting-for-8-5-exclusion-constraints/
  – Jeff Davis - http://thoughts.davisjeff.com/2010/09/25/exclusion-constraints-are-generalized-sql-unique/
• GitHub examples:
  – For this talk - https://github.com/adrianprelipcean/PUG_Stockholm
  – For the tracking system mentioned at the beginning (database + mobile apps + annotation UI) - https://github.com/Badger-MEILI

Thank you for your attention!
Questions and Discussions

Adrian C. Prelipcean
http://adrianprelipcean.github.io/
acpr@kth.se
@Adi Prelipcean