Assuring spatial and temporal data integrity via constraints and triggers in PostgreSQL
Adrian C. Prelipcean
Ego page
Adrian C. Prelipcean
1. PhD student at KTH
   - Transportation systems and geoinformatics
   - Data collection and analysis methods for travel behaviour
   - Lab responsible for multiple courses
     - GIS Algorithms (Java)
     - Spatial Databases (PostgreSQL, PostGIS, pgRouting)
     - Web and Mobile GIS (NodeJS, ExpressJS, Apache Cordova, PostgreSQL)
2. Tech responsible at Airmee
   - Airmee: a smart platform for urban logistics
   - Currently working on MVPs (from the backend to the frontend)
   - PostgreSQL as the backend
Background
Travel behaviour: understand how people travel and make decisions.
We used to ask people how they travel: fill in paper forms, answer telephone interviews, fill in web forms.
- We found out how to adapt the infrastructure of our cities to handle a growing population.
- However, we also found out that:
  - Physical mail is not that popular.
  - Filling in forms is not really a hobby.
  - The method doesn't scale well.
New ways to collect the same data
Use end-user devices (mostly smartphones) to (legally) collect how people move (GPS points).
- Old way
  - Ask people to fill in forms
- New way
  - Ask people to look at their GPS points and fill in forms(?)
- New way 2.0
  - Use [buzzwords] machine learning / AI to avoid asking people to fill in forms
  - But we need data to train the algorithms
  - Also, algorithms will give wrong predictions
  - How about a continuous feedback loop between users and ML algorithms?
  - What can go wrong?
Data
Dealing with predictions
- Machine learning algorithms infer
  - Time periods for trips (when people travel between destinations)
  - Time periods for triplegs (when people travel with different travel modes [between destinations])
  - Destination of trips (where people travel)
  - Purpose of trips (why people travel)
  - Travel modes (how people travel)
- Users can look at the output
  - And confirm that it is correct
  - Or modify it to bring it to the correct state
    - Alone
    - Using a weird UI
    - ...
What can go wrong?
- Invalid periods
  - e.g., traveled to work between 08:00 AM and 07:00 AM (same day)
- Overlapping periods
  - e.g., traveled to work between 08:00 AM and 09:00 AM, and traveled to the supermarket between 08:30 AM and 09:30 AM
- Discontinuous periods
  - e.g., traveled to work between 08:00 AM and 09:00 AM, worked between 09:00 AM and 11:30 AM, and traveled to a restaurant between 13:00 and 13:20
- Improper nests
  - e.g., traveled to work between 08:00 AM and 09:00 AM, during which I walked between 08:20 AM and 09:00 AM
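The four failure modes above can be stated as small predicates. The following is a minimal Python sketch, not the talk's implementation: it assumes periods are half-open `[start, end)` pairs, that "discontinuous" means a gap between consecutive periods, and that "improper nest" means the triplegs do not exactly tile their trip. All function names and the example date are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Period = Tuple[datetime, datetime]  # (start, end), read as half-open [start, end)


def t(hhmm: str) -> datetime:
    """Build a timestamp on an arbitrary example day from 'HH:MM'."""
    return datetime.strptime("2016-10-25 " + hhmm, "%Y-%m-%d %H:%M")


def is_valid(p: Period) -> bool:
    """A period must start strictly before it ends."""
    return p[0] < p[1]


def overlaps(a: Period, b: Period) -> bool:
    """Two half-open periods overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def is_continuous(periods: List[Period]) -> bool:
    """Consecutive periods must meet exactly: no gaps and no overlaps."""
    ordered = sorted(periods)
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def nests_properly(trip: Period, triplegs: List[Period]) -> bool:
    """Assumption: the triplegs must cover the trip from its start to its end."""
    if not triplegs:
        return False
    ordered = sorted(triplegs)
    return (
        ordered[0][0] == trip[0]
        and ordered[-1][1] == trip[1]
        and is_continuous(ordered)
    )
```

With the slide's examples, `is_valid((t("08:00"), t("07:00")))` is false, and a single walking tripleg from 08:20 to 09:00 does not nest properly in an 08:00 to 09:00 trip because the first twenty minutes are uncovered.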
How to prevent things from going wrong?
Avoiding headaches
Constraints
- Good for enforcing conditions on the table's attributes
- Specified at table declaration or afterward, via ALTER
- Types:
  - Check constraints (row-level), e.g., age > 0
  - Not null constraints (row-level), e.g., name not null
  - Unique constraints (table-level), e.g., two people cannot have the same ID
  - Primary key (table-level): unique + not null
  - Foreign key (between tables): referential integrity to other tables
  - Exclusion constraints (table-level): unique generalised to a table-level check
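To make the row-level and key constraints concrete, here is a minimal runnable sketch. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in so that it runs without a server; the talk itself targets PostgreSQL, whose messages and type system differ. The table and column names (`trips`, `triplegs`, `from_time`, `to_time`, `trip_id`) and the constraint name `valid_periods` are taken from the error messages later in the deck; the rest of the schema and the function name are assumptions.

```python
import sqlite3


def collect_violations() -> dict:
    """Run one offending INSERT per constraint type and record the error text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE trips (
            id        INTEGER PRIMARY KEY,
            from_time TEXT NOT NULL,
            to_time   TEXT NOT NULL,
            CONSTRAINT valid_periods CHECK (from_time < to_time)
        );
        CREATE TABLE triplegs (
            id        INTEGER PRIMARY KEY,
            trip_id   INTEGER NOT NULL REFERENCES trips (id),
            from_time TEXT NOT NULL,
            to_time   TEXT NOT NULL
        );
        INSERT INTO trips VALUES (1, '2016-10-25 08:00:00', '2016-10-25 09:00:00');
    """)
    offending = {
        # ends before it starts
        "check": "INSERT INTO trips VALUES (5, '2016-10-25 07:37:28', '2016-10-25 06:37:28')",
        # missing timestamps
        "not null": "INSERT INTO trips VALUES (6, NULL, NULL)",
        # id 1 already exists
        "primary key": "INSERT INTO trips VALUES (1, '2016-10-25 10:00:00', '2016-10-25 11:00:00')",
        # trip 100 does not exist
        "foreign key": "INSERT INTO triplegs VALUES (1, 100, '2016-10-25 08:00:00', '2016-10-25 08:20:00')",
    }
    errors = {}
    for name, statement in offending.items():
        try:
            conn.execute(statement)
        except sqlite3.IntegrityError as exc:
            errors[name] = str(exc)
    conn.close()
    return errors
```

Calling `collect_violations()` returns one entry per constraint type, which shows that each bad row is rejected by the database itself rather than by application code.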
Avoiding headaches
Triggers
- Useful for enforcing constraints across multiple tables (works fine for same-table constraints as well)
- Can be used to implement complex logic
- Types
  - Event trigger
    - Global, for the database
    - Captures DDL events (CREATE, ALTER, DROP, COMMENT, GRANT, REVOKE)
  - Data change trigger
    - Local, for the table
    - Captures DML events (INSERT, UPDATE, TRUNCATE, DELETE)
    - Can be run BEFORE or AFTER the data change event
    - Can be conditioned by a WHEN clause
- Triggers can be written in any procedural language with trigger support (event trigger support for event triggers), or in C
Generate test data - Tables
Generate test data - Locations
Generate test data - Trips and triplegs
Check Constraints
ERROR: new row for relation "trips" violates check constraint "valid_periods"
DETAIL: Failing row contains (5, 2016-10-25 07:37:28.6, 2016-10-25 06:37:28.6)
Not Null Constraints
ERROR: null value in column "from_time" violates not-null constraint
DETAIL: Failing row contains (6, null, null).
Primary Key Constraints
ERROR: duplicate key value violates unique constraint "trips_pkey"
DETAIL: Key (id)=(1) already exists.
Foreign Key Constraints
ERROR: insert or update on table "triplegs" violates foreign key constraint "triplegs_trip_id_fkey"
DETAIL: Key (trip_id)=(100) is not present in table "trips".
Exclusion Constraints
ERROR: conflicting key value violates exclusion constraint "non_overlapping_trip_periods"
DETAIL: Key (tsrange(from_time, to_time))=(["2016-10-25 12:17:00","2016-10-25 12:20:00")) conflicts with existing key (tsrange(from_time, to_time))=(["2016-10-25 12:00:00","2016-10-25 13:00:00")).
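The key in the error message suggests the constraint was declared along the lines of `EXCLUDE USING gist (tsrange(from_time, to_time) WITH &&)`, although the declaration itself is not preserved in this transcript. The rule it enforces can be sketched in plain Python: a new period is rejected if it overlaps any stored period, with periods read as half-open `[from_time, to_time)` as in the `[ ... )` bounds shown above. The function names are hypothetical, and the sketch assumes every period already satisfies `from_time < to_time`.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Period = Tuple[datetime, datetime]  # half-open [from_time, to_time)


def find_conflict(existing: List[Period], new: Period) -> Optional[Period]:
    """Return the first stored period that overlaps `new`, or None."""
    new_from, new_to = new
    for period in existing:
        old_from, old_to = period
        # Half-open overlap test: touching end points do not conflict.
        if new_from < old_to and old_from < new_to:
            return period
    return None


def insert_trip(existing: List[Period], new: Period) -> None:
    """Append `new` only if it overlaps nothing already stored."""
    conflict = find_conflict(existing, new)
    if conflict is not None:
        raise ValueError(f"{new} conflicts with existing {conflict}")
    existing.append(new)
```

With a stored trip from 12:00 to 13:00, inserting 12:17 to 12:20 raises, while inserting 13:00 to 13:30 succeeds because the upper bound is exclusive. Unlike this linear scan, the database constraint is backed by an index and holds under concurrent writes.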
Triggers - definition example
Triggers - minimum two locations per tripleg
ERROR: proposed period modification of tripleg 1
does not contain enough locations
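The trigger behind this error is not preserved in the transcript. As a rough illustration of the idea, the sketch below rejects a tripleg period update when fewer than two recorded locations fall inside the proposed period. It uses SQLite via Python's `sqlite3` module as a stand-in; in PostgreSQL this would be a trigger function (for example in PL/pgSQL) that raises an exception. The `locations` schema, the `recorded_at` column, the trigger name and the helper name are all assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE locations (
    id          INTEGER PRIMARY KEY,
    recorded_at TEXT NOT NULL
);
CREATE TABLE triplegs (
    id        INTEGER PRIMARY KEY,
    from_time TEXT NOT NULL,
    to_time   TEXT NOT NULL
);
-- Abort the UPDATE when the proposed period holds fewer than two locations.
CREATE TRIGGER tripleg_min_locations
BEFORE UPDATE OF from_time, to_time ON triplegs
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'proposed period modification does not contain enough locations')
    WHERE (SELECT count(*) FROM locations
           WHERE recorded_at >= NEW.from_time
             AND recorded_at <= NEW.to_time) < 2;
END;
INSERT INTO locations (recorded_at) VALUES
    ('2016-10-25 08:00:00'), ('2016-10-25 08:10:00'), ('2016-10-25 08:30:00');
INSERT INTO triplegs VALUES (1, '2016-10-25 08:00:00', '2016-10-25 08:40:00');
"""


def try_update(conn: sqlite3.Connection, new_from: str, new_to: str) -> str:
    """Attempt to change tripleg 1's period; return 'ok' or the error text."""
    try:
        conn.execute(
            "UPDATE triplegs SET from_time = ?, to_time = ? WHERE id = 1",
            (new_from, new_to),
        )
        return "ok"
    except sqlite3.DatabaseError as exc:
        return str(exc)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two locations (08:00 and 08:10) fall inside this period, so it is accepted.
shrunk = try_update(conn, "2016-10-25 08:00:00", "2016-10-25 08:15:00")
# No location falls inside this period, so the trigger aborts the statement.
too_short = try_update(conn, "2016-10-25 08:20:00", "2016-10-25 08:25:00")
```

After the rejected update the row keeps the period set by the last accepted update, which is the property the slide's error message demonstrates.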
Triggers - time update of trip affects neighbors
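The slide's content is not preserved here, so the following is only a guess at the behaviour the title describes: when a trip's period changes, the adjacent trips' boundaries move so the sequence stays continuous. It is written in plain Python rather than as a database trigger, and the function name and the rule that an update may not swallow a neighbour entirely are assumptions.

```python
from datetime import datetime
from typing import List, Tuple

Period = Tuple[datetime, datetime]  # (from_time, to_time)


def update_trip_period(trips: List[Period], index: int, new: Period) -> List[Period]:
    """Return a copy of `trips` (sorted and continuous) with trip `index` set
    to `new` and the neighbouring trips' boundaries moved to meet it."""
    start, end = new
    if not start < end:
        raise ValueError("invalid period")
    updated = list(trips)
    updated[index] = new
    if index > 0:
        prev_start, _ = updated[index - 1]
        if not prev_start < start:
            raise ValueError("update would swallow the previous trip")
        updated[index - 1] = (prev_start, start)  # previous trip now ends where this one starts
    if index + 1 < len(updated):
        _, next_end = updated[index + 1]
        if not end < next_end:
            raise ValueError("update would swallow the next trip")
        updated[index + 1] = (end, next_end)  # next trip now starts where this one ends
    return updated
```

In a trigger-based version, each neighbour update would itself fire the trigger, which is why a guard such as a WHEN condition is needed to stop the cascade.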
Triggers - time update of trip affects (nested) triplegs
ERROR: proposed period modification of tripleg 4 does not contain enough locations
- The trigger attached to the triplegs table invalidated an operation on the trips table and rolled back the transaction
- As long as the rules are properly defined and implemented, data integrity is maintained
Wrapping up
Triggers and constraints
- Advantages
  - Can be used to implement your unified custom logic (write once, fix once)
  - Constraints are great for enforcing checks at row level or table level (exclusion constraints), as well as referential integrity (foreign keys)
  - For multiple-table logic (or single-table logic), triggers can do almost anything: use responsibly
- Disadvantages
  - Documentation can be overwhelming
  - Cascading triggers can throw you in a loop (to see this, remove the WHEN conditions on the trip_period_integrity trigger)
  - Performance overhead
  - Cascading triggers are difficult to debug and the actions are not always intuitive
- Advice
  - Write unit tests (pgTAP) to validate that a changed / new trigger doesn't break your functionality
References
- Docs:
  - Triggers - https://www.postgresql.org/docs/current/static/sql-createtrigger.html
  - Constraints - https://www.postgresql.org/docs/current/static/ddl-constraints.html
  - Exclusion constraints - https://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-EXCLUDE
  - Examples of exclusion constraints on ranges - https://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-CONSTRAINT
- Other presentations on constraints and triggers
  - Jim Mlodgenski (more on triggers) - http://www.slideshare.net/jim_mlodgenski/an-introduction-to-postresql-triggers
  - Robert Haas (triggers cheat sheet) - http://www.slideshare.net/pgconf/introduction-to-triggers
  - Magnus Hagander (exclusion constraints) - https://www.hagander.net/talks/BeyondUNIQUE.pdf
References (part 2)
- More on exclusion constraints
  - Depesz - https://www.depesz.com/2010/01/03/waiting-for-8-5-exclusion-constraints/
  - Jeff Davis - http://thoughts.davisjeff.com/2010/09/25/exclusion-constraints-are-generalized-sql-unique/
- GitHub examples:
  - For this talk - https://github.com/adrianprelipcean/PUG_Stockholm
  - For the tracking system mentioned at the beginning (database + mobile apps + annotation UI) - https://github.com/Badger-MEILI
Thank you for your attention!
Questions and Discussions
Adrian C. Prelipcean
http://adrianprelipcean.github.io/
@Adi Prelipcean