Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
Why SQL DBMS’s Still
Lack Full Declarative
Constraint Support
(A polite excuse)
Toon Koppelaars
Who am I?
• A database guy, relational dude
– Developer (with DBA experience)
• Oracle technology user since 1987
• Co-author of this book
• Today’s talk = last chapter
Agenda
• SQL Assertions?
– A few examples
• Validation Execution Models
– More efficient along the way
• Serializability
– Probably the biggest problem
• Conclusions
SQL Assertions?
• Examples
1. There is at most one president
2. One cannot manage more than two
departments
3. Department with president and/or
manager should have an administrator
Data integrity constraints
– Just like: CHECK, PK, UK, FK
– ‘all the other ones’
They constrain
data allowed in
our tables
Syntax?
• Create assertion [some_name] as
check([some_SQL_expression]);
• Part of the ANSI/ISO SQL standard
for over 25 years
Example 1
• create assertion
at_most_one_president
as check
(1 >=
(select count(*)
from EMP e
where e.JOB = 'PRESIDENT'
)
);
• Task of DBMS: make sure it’s true at all times
- Closed SQL
expression (no
free variables)
- Evaluates to
true or false
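A minimal sketch of what "closed expression that evaluates to true or false" means in practice. It uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption; the talk targets Oracle), and the three-column EMP table is a simplified, hypothetical layout.

```python
import sqlite3

def at_most_one_president(conn):
    """Evaluate the closed assertion expression; True means the constraint holds."""
    (ok,) = conn.execute(
        "SELECT 1 >= (SELECT count(*) FROM emp WHERE job = 'PRESIDENT')"
    ).fetchone()
    return bool(ok)

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical EMP layout for illustration only.
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, job TEXT, deptno INTEGER)")
conn.execute("INSERT INTO emp VALUES (1, 'PRESIDENT', 10)")
first = at_most_one_president(conn)    # one president: assertion holds
conn.execute("INSERT INTO emp VALUES (2, 'PRESIDENT', 20)")
second = at_most_one_president(conn)   # two presidents: assertion fails
```

The expression has no free variables, so it can be evaluated at any moment against the current table contents.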
Example 2
• create assertion
cannot_manage_more_than_2
as check
(not exists
(select 'x'
from (select d.MGR
,count(*) as cnt
from DEPT d
group by d.MGR)
where cnt > 2
)
);
Example 3
• create assertion admin_in_dept_with_vip
as check
(not exists
(select 'x'
from (select distinct e.DEPTNO from EMP e) e
where exists
(select 'y'
from EMP e1
where e1.DEPTNO = e.DEPTNO
and e1.JOB in ('PRESIDENT','MANAGER'))
and not exists
(select 'z'
from EMP e2
where e2.DEPTNO = e.DEPTNO
and e2.JOB = 'ADMIN')
)
);
• Can this be the killer feature of
OracleXIII?
Imagine...
• If DBMS vendor would support these
– How many fewer lines of code one would
need to write in application
development
– How many fewer bugs would result
– How easily we could accommodate
changes requested by the business
– How data quality would improve
Point to be made
• SQL assertions ‘cover’ all the other
declarative constraints available
– CHECK can be written as assertion
– PK/UK can be written as assertion
– FK can be written as assertion
• We just have shorthands for these,
since these are so common in every
database design
CHECK written as assertion
• create assertion hire_only_18_plus
as check
(not exists
(select 'x'
from EMP e
where e.HIRED - e.BORN <
18*365
)
);
CHECK((HIRED - BORN) >= 18*365)
PK/UK written as assertion
• create assertion empno_is_unique
as check
(not exists
(select 'x'
from EMP e1
,EMP e2
where e1.EMPNO = e2.EMPNO
and e1.rowid != e2.rowid
)
);
PRIMARY KEY (EMPNO)
FK written as assertion
• create assertion work_in_known_dept
as check
(not exists
(select 'x'
from EMP e
where not exists
(select 'y'
from DEPT d
where d.deptno = e.deptno
)
)
);
FOREIGN KEY (DEPTNO) REFERENCES DEPT(DEPTNO)
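As a rough illustration of the "shorthand" point, the foreign-key assertion can be evaluated as an ordinary query. This sketch assumes SQLite via Python's sqlite3 module and simplified, hypothetical two-column tables; no declarative FOREIGN KEY is defined, so the orphan row is accepted and only the assertion query detects it.

```python
import sqlite3

# The assertion predicate from the slide, wrapped so it returns 1 (holds) or 0.
FK_ASSERTION = """
SELECT NOT EXISTS (
  SELECT 'x' FROM emp e
  WHERE NOT EXISTS (SELECT 'y' FROM dept d WHERE d.deptno = e.deptno)
)
"""

def fk_assertion_holds(conn):
    """True when every EMP row works in a known department."""
    return bool(conn.execute(FK_ASSERTION).fetchone()[0])

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER);
CREATE TABLE emp (empno INTEGER, deptno INTEGER);
INSERT INTO dept VALUES (10);
INSERT INTO emp VALUES (1, 10);
""")
before = fk_assertion_holds(conn)               # employee 1 is in dept 10
conn.execute("INSERT INTO emp VALUES (2, 99)")  # dept 99 does not exist
after = fk_assertion_holds(conn)
```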
Scientific Problem
• Why is CREATE ASSERTION not
supported?
• Think about this:
– The DBMS has our assertion expression
but:
• Would you accept full evaluation of given
expression for every insert/update/delete?
– No.
• Performance would be horrible in real world db’s
• You require a much more sophisticated execution
model in the real world
Scientific Problem
• The challenge is:
– Given: arbitrarily complex assertion expression
+
– Given: arbitrarily complex SQL statement
– What is the minimum required check that
needs to be executed by the DBMS for the
assertion to remain true?
• Can be ‘nothing’ if assertion is immune to the given
DML
For instance...
• Department that employs president or manager should also employ administrator
– Obviously only when DML operates on EMP
1. Insert into emp values(100,'Smith','SALESREP'
,'1/1/80','20/11/07',7000,20);
2. Insert into emp values(:b0,:b1,:b2
,:b3,:b4,:b5,:b6);
3. Insert into emp (select * from emp_loaded);
4. Update emp set msal=1.05*msal;
5. Delete from emp where empno=101;
Execution Models
• Using triggers:
– EM1: always
– EM2: involved tables
– EM3: involved columns
– EM4: table polarities
– EM5: involved literals (TE)
– EM6: TE + minimal check
Execution Model 1
• Evaluate every assertion on every
DML statement
– Evaluating every <boolean
expression with subqueries>
– On every DML statement
• In after-statement table trigger
– WOULD BE *VERY* INEFFICIENT
• Let’s quickly forget this “EM1”
Execution Model 2
• Only evaluate the assertions that involve the table that is being “DML-led”
– Finding involved tables by parsing the
assertion expression
– Per assertion 100% code generation of a
(insert/update/delete) “after statement” table
trigger
• Select from dual where [expression]
• Found? OK
• Not found raise_application_error
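The EM2 idea (after each DML statement, re-evaluate every assertion that mentions the touched table) can be sketched outside Oracle as follows. This is only an approximation under stated assumptions: SQLite has no statement-level triggers, so a hypothetical helper `run_dml` plays the role of the generated after-statement trigger, and the `ASSERTIONS` registry and table layout are invented for the example.

```python
import sqlite3

# Hypothetical registry: table name -> list of (constraint name, closed boolean SQL).
ASSERTIONS = {
    "emp": [("EMP01", """
        SELECT NOT EXISTS (
          SELECT 1 FROM (SELECT DISTINCT deptno FROM emp) d
          WHERE EXISTS (SELECT 1 FROM emp e2 WHERE e2.deptno = d.deptno
                          AND e2.job IN ('PRESIDENT','MANAGER'))
            AND NOT EXISTS (SELECT 1 FROM emp e3 WHERE e3.deptno = d.deptno
                              AND e3.job = 'ADMIN'))""")],
}

def run_dml(conn, table, sql, params=()):
    """Execute one DML statement, then re-evaluate every assertion on `table`."""
    conn.execute("SAVEPOINT stmt")
    conn.execute(sql, params)
    for name, check in ASSERTIONS.get(table, []):
        if not conn.execute(check).fetchone()[0]:
            conn.execute("ROLLBACK TO stmt")   # undo the offending statement
            conn.execute("RELEASE stmt")
            raise ValueError(f"Constraint {name} is violated.")
    conn.execute("RELEASE stmt")

# isolation_level=None: the sketch manages savepoints itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, job TEXT, deptno INTEGER)")
run_dml(conn, "emp", "INSERT INTO emp VALUES (1, 'ADMIN', 10)")
run_dml(conn, "emp", "INSERT INTO emp VALUES (2, 'MANAGER', 10)")
try:
    run_dml(conn, "emp", "INSERT INTO emp VALUES (3, 'MANAGER', 20)")
    rejected = False
except ValueError:
    rejected = True   # dept 20 would have a manager but no administrator
row_count = conn.execute("SELECT count(*) FROM emp").fetchone()[0]
```

Note that the full assertion query runs after every statement on EMP, which is exactly the inefficiency the later execution models try to remove.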
Execution Model 2
• Using examples 2 and 3:
– Admin_in_dept_with_vip: EMP table
– Cannot_manage_more_than_2: DEPT table
Execution Model 2
create trigger EMP_AIUDS_EMP01
after insert or update or delete on EMP
declare pl_dummy varchar(40);
begin
--
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select 'a violation'
from (select distinct deptno from EMP) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint EMP01 is violated.');
end;
Assertion
predicate
Execution Model 2
create trigger DEPT_AIUDS_DEPT01
after insert or update or delete on DEPT
declare pl_dummy varchar(1);
begin
--
select 'a' into pl_dummy
from DUAL
where not exists (select 'a violation'
from (select d.MGR
,count(*) as cnt
from DEPT d
group by d.MGR)
where cnt > 2);
--
exception when no_data_found then
raise_application_error(-20999,'Constraint DEPT01 is violated.');
end;
EM2 could be supported declaratively
Execution Model 2
• Inefficiencies:
– EMP01 and DEPT01 are checked when updating columns that are not involved, for instance:
• Updating EMP.ENAME
• Updating DEPT.LOC
Execution Model 3
• For inserts + deletes: EM3 == EM2
• For updates: only evaluate
assertions that involve *columns*
being changed
– Simple parse will find columns
– Assumes ‘clean specification’
• Create trigger syntax allows
specification of columns that are
being changed
Execution Model 3
create trigger EMP_AUS_EMP01
after update of DEPTNO,JOB on EMP
declare pl_dummy varchar(40);
begin
--
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select 'department in violation'
from (select distinct deptno from EMP) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
raise_application_error(-20999,'Constraint EMP01 is violated.');
end;
Execution Model 3
create trigger DEPT_AUS_DEPT01
after update of MGR on DEPT
declare pl_dummy varchar(40);
begin
--
select 'a' into pl_dummy
from DUAL
where not exists (select 'x'
from (select d.MGR
,count(*) as cnt
from DEPT d
group by d.MGR)
where cnt > 2);
--
exception when no_data_found then
raise_application_error(-20999,'Constraint DEPT01 is violated.');
end;
EM3 could be supported declaratively
Execution Model 3
• Inefficiencies:
– Sometimes either inserts or deletes can
never violate a constraint
– For Cannot_manage_more_than_2,
deleting a department does not require
re-validation
– For Admin_in_dept_with_vip, both
inserts and deletes do require
re-validation
Execution Model 4
• For updates: EM4 = EM3
• For deletes and inserts EM4 drops unnecessary delete (e)or insert table triggers
– Polarity of a table for a given constraint • Positive: inserts can violate
• Negative: deletes can violate
• Neutral: both can violate
• Undefined: table is not involved
Polarity of table for given constraint can be
computed via special parsing
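How a polarity translates into "which statement triggers are still needed" can be sketched as a simple lookup. The polarities below are taken from the two running examples and are hard-coded; the names `Polarity`, `required_triggers` and `POLARITY` are hypothetical, and the sketch does not attempt the parsing step that would compute a polarity from an assertion.

```python
from enum import Enum

class Polarity(Enum):
    POSITIVE = "inserts can violate"
    NEGATIVE = "deletes can violate"
    NEUTRAL = "both can violate"
    UNDEFINED = "table is not involved"

def required_triggers(polarity):
    """DML events (besides UPDATE) that still need a validation trigger under EM4."""
    return {
        Polarity.POSITIVE: {"insert"},
        Polarity.NEGATIVE: {"delete"},
        Polarity.NEUTRAL: {"insert", "delete"},
        Polarity.UNDEFINED: set(),
    }[polarity]

# Hard-coded from the talk's examples, not derived by parsing the assertion.
POLARITY = {
    ("cannot_manage_more_than_2", "DEPT"): Polarity.POSITIVE,
    ("admin_in_dept_with_vip", "EMP"): Polarity.NEUTRAL,
}

dept_events = required_triggers(POLARITY[("cannot_manage_more_than_2", "DEPT")])
emp_events = required_triggers(POLARITY[("admin_in_dept_with_vip", "EMP")])
```

With these inputs the delete trigger on DEPT is dropped, while EMP keeps both its insert and delete triggers.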
Execution Model 4
• EM4 maintains:
– All EM3 triggers, except for one:
• Drops:
– Cannot_manage_more_than_2 delete
trigger
EM4 could be supported declaratively
Execution Model 4
• Inefficiencies:
– Admin_in_dept_with_vip: e.g. inserting a SALESMAN or deleting a TRAINER does not require re-validation
• Start involving literals mentioned in assertions
• If assertion does not have literals then next EM is same as EM4
– Cannot_manage_more_than_2 has no literals
Execution Model 5
• How do we see:
– Salesman is inserted? Trainer deleted?
• Could parse the DML statement
– But does not always work due to absence of literals
• Make use of column values of affected rows
– Requires “Transition Effect” (TE) of a DML statement
• Common concept (aka “delta” tables)
• Inserted_rows, Updated_rows, Deleted_rows
– Maintaining TE is straightforward (see book)
• EM5: Only check the assertion when a property holds in the TE
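One possible way to maintain a transition effect and test a TE property, sketched with SQLite through Python's sqlite3 module. This is not the book's implementation: the `inserted_rows` table is filled by a row-level trigger and cleared by a hypothetical helper before each statement, and the table layout is simplified.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE emp (empno INTEGER PRIMARY KEY, job TEXT, deptno INTEGER);
-- Hypothetical TE table: filled by a row trigger, cleared before each statement.
CREATE TABLE inserted_rows (empno INTEGER, job TEXT, deptno INTEGER);
CREATE TRIGGER emp_air_te AFTER INSERT ON emp
BEGIN
  INSERT INTO inserted_rows VALUES (NEW.empno, NEW.job, NEW.deptno);
END;
""")

def insert_needs_validation(conn, sql, params=()):
    """Run one INSERT on emp and report whether EMP01 needs re-validation."""
    conn.execute("DELETE FROM inserted_rows")   # start a fresh transition effect
    conn.execute(sql, params)
    return bool(conn.execute(
        "SELECT EXISTS (SELECT 1 FROM inserted_rows"
        " WHERE job IN ('PRESIDENT','MANAGER'))").fetchone()[0])

salesman = insert_needs_validation(conn, "INSERT INTO emp VALUES (1, 'SALESMAN', 10)")
manager = insert_needs_validation(conn, "INSERT INTO emp VALUES (2, 'MANAGER', 10)")
```

Inserting a salesman leaves the TE property false, so the expensive assertion query can be skipped; inserting a manager makes it true.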
Execution Model 5
• TE property for
Admin_in_dept_with_vip:
1. Inserted_rows holds a president or a
manager
or,
2. Deleted_rows holds an administrator
3. Updated rows shows that ...
Execution Model 5
create trigger EMP_AIS_EMP01
after insert on EMP
declare pl_dummy varchar(40);
begin
-- If this returns no rows, then EMP01 cannot be violated.
select 'EMP01 must be validated' into pl_dummy
from DUAL
where exists
(select 'A president or manager has just been inserted'
from inserted_rows
where JOB in ('PRESIDENT','MANAGER'));
--
begin
--
<same trigger code as EM4>
--
end;
exception when no_data_found then
-- No need to validate EMP01.
null;
--
end;
Execution Model 5
create trigger EMP_ADS_EMP01
after delete on EMP
declare pl_dummy varchar(40);
begin
-- If this returns no rows, then EMP01 cannot be violated.
select 'EMP01 must be validated' into pl_dummy
from DUAL
where exists
(select 'An administrator has just been deleted'
from deleted_rows
where JOB = 'ADMIN');
--
begin
--
<same trigger code as EM4>
--
end;
exception when no_data_found then
-- No need to validate EMP01.
null;
--
end;
Execution Model 5
• Update TE-property for EMP01:
select 'EMP01 is in need of validation'
from DUAL
where exists
(select 'Some department just won a president/manager or just lost an administrator'
from updated_rows
where (n_job in ('PRESIDENT','MANAGER') and
o_job not in ('PRESIDENT','MANAGER'))
or (o_job='ADMIN' and n_job<>'ADMIN')
or (o_deptno<>n_deptno and
(o_job='ADMIN' or n_job in
('PRESIDENT','MANAGER'))))
• Can be deduced from insert + delete properties
• EM5 fully declarative too?
– Here it gets complex...
Execution Model 5
• Inefficiencies:
– Admin_in_dept_with_vip:
triggers validate all departments
• Unacceptable in real-world databases
• Only some require re-validation
– Cannot_manage_more_than_2:
triggers validate all department
managers
• Unacceptable in real-world databases
• Only some require re-validation
Execution Model 6
• On TE-property + optimized
validation query
– Use the TE-query to find:
• Which deptno-values require re-validation
• Which mgr-values require re-validation
– Then use these values in the assertion-
expression
create trigger EMP_AIS_EMP01
after insert on EMP
declare pl_dummy varchar(40);
begin
--
for r in (select distinct deptno
from inserted_rows
where JOB in ('PRESIDENT','MANAGER'))
loop
begin
-- Note: this now uses r.deptno value from preceding TE-query.
select 'Constraint EMP01 is satisfied' into pl_dummy
from DUAL
where not exists(select 'department in violation'
from (select distinct deptno from EMP where deptno = r.deptno) d
where exists(select e2.*
from EMP e2
where e2.DEPTNO = d.DEPTNO
and e2.JOB in ('PRESIDENT','MANAGER'))
and not exists(select e3.*
from EMP e3
where e3.DEPTNO = d.DEPTNO
and e3.JOB = 'ADMIN'));
--
exception when no_data_found then
--
raise_application_error(-20999,
'Constraint EMP01 is violated for department '||to_char(r.deptno)||'.');
--
end;
end loop;
end;
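The same EM6 idea, sketched outside PL/SQL: use the transition effect to find the departments that might now be in violation, then check only those. The sketch assumes SQLite via Python's sqlite3 module, passes the inserted rows as a plain list instead of an `inserted_rows` table, and uses hypothetical names (`CHECK_ONE_DEPT`, `violated_departments`).

```python
import sqlite3

# Assertion predicate specialised to a single department (:d).
CHECK_ONE_DEPT = """
SELECT NOT (
  EXISTS (SELECT 1 FROM emp WHERE deptno = :d AND job IN ('PRESIDENT','MANAGER'))
  AND NOT EXISTS (SELECT 1 FROM emp WHERE deptno = :d AND job = 'ADMIN'))
"""

def violated_departments(conn, inserted_rows):
    """Re-validate EMP01 only for departments that gained a president/manager."""
    suspects = sorted({r["deptno"] for r in inserted_rows
                       if r["job"] in ("PRESIDENT", "MANAGER")})
    return [d for d in suspects
            if not conn.execute(CHECK_ONE_DEPT, {"d": d}).fetchone()[0]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, job TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "ADMIN", 10), (2, "MANAGER", 10)])
new_rows = [{"empno": 3, "job": "MANAGER", "deptno": 20},
            {"empno": 4, "job": "SALESMAN", "deptno": 30},
            {"empno": 5, "job": "MANAGER", "deptno": 10}]
conn.executemany("INSERT INTO emp VALUES (:empno, :job, :deptno)", new_rows)
bad = violated_departments(conn, new_rows)   # only depts 10 and 20 are checked
```

Department 30 is never examined because the inserted salesman cannot affect the constraint, and department 10 passes because it already has an administrator.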
Execution Model 6
• This requires:
– Detecting that the ASSERTION can be (re)written as a universal quantification
– Can sometimes be done in multiple ways
• Which to choose?
– EM6 fully declarative?
• Complexity introduced in EM5 further increases
• Remember: given any arbitrarily complex assertion + DML statement
Still not there yet...
• Then there is something else too...
– Which is often overlooked by database
professionals
– And, which is neglected in every
research paper (I’ve read...) that deals
with generating constraint validation
code
Serializability
Concurrent Transactions
• Deptno 13 has two admins and one
manager
– TX1 deletes an admin from 13
• Does not yet commit
– TX2 deletes the other admin from 13
• Commits
– TX1 commits
• Constraint is violated for deptno 13!
– TX1 and TX2 must be serialized
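The scenario can be replayed with plain Python data structures. This is a simulation under a stated assumption, namely that each transaction validates against the committed rows plus only its own uncommitted change; no real DBMS is involved and `emp01_holds` is a hypothetical helper.

```python
def emp01_holds(emp, deptno):
    """EMP01 for one department: a president/manager implies an administrator."""
    jobs = [job for job, d in emp.values() if d == deptno]
    has_vip = any(j in ("PRESIDENT", "MANAGER") for j in jobs)
    return (not has_vip) or ("ADMIN" in jobs)

# Committed state: empno -> (job, deptno)
committed = {1: ("ADMIN", 13), 2: ("ADMIN", 13), 3: ("MANAGER", 13)}

# Each transaction sees the committed rows plus only its own uncommitted delete.
tx1_view = {k: v for k, v in committed.items() if k != 1}
tx2_view = {k: v for k, v in committed.items() if k != 2}
tx1_ok = emp01_holds(tx1_view, 13)   # TX1 still sees admin 2
tx2_ok = emp01_holds(tx2_view, 13)   # TX2 still sees admin 1

# After both commit, neither admin is left.
final = {k: v for k, v in committed.items() if k not in (1, 2)}
final_ok = emp01_holds(final, 13)
```

Both validations pass in isolation, yet the combined result violates the constraint, which is why the two validations must not overlap.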
Concurrent Transactions
• Note: this is *not* about locking rows of data, but rather: locking a constraint
– No two TX’s can validate at same time
• We can use DBMS_LOCK to serialize these transactions
– See book for example code
• Again complexity further increases
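A rough sketch of the "lock the constraint, not the rows" idea. Python threads and `threading.Lock` stand in for database sessions and DBMS_LOCK (an assumption; this is not the book's code), and the lock is keyed on the constraint name plus the department so that only transactions touching the same department serialize.

```python
import threading
from collections import defaultdict

_guard = threading.Lock()
_rule_locks = defaultdict(threading.Lock)   # one lock per (constraint, key)

def rule_lock(constraint, key):
    """Return the lock that serializes validation of `constraint` for `key`."""
    with _guard:
        return _rule_locks[(constraint, key)]

emp = {1: ("ADMIN", 13), 2: ("ADMIN", 13), 3: ("MANAGER", 13)}  # committed state
results = []

def delete_admin(empno):
    deptno = emp[empno][1]
    with rule_lock("EMP01", deptno):          # held until the "commit" below
        remaining = [v for k, v in emp.items() if k != empno]
        has_vip = any(j in ("PRESIDENT", "MANAGER") and d == deptno
                      for j, d in remaining)
        has_admin = any(j == "ADMIN" and d == deptno for j, d in remaining)
        if has_vip and not has_admin:
            results.append((empno, "rejected"))
        else:
            del emp[empno]                    # commit the delete
            results.append((empno, "committed"))

threads = [threading.Thread(target=delete_admin, args=(n,)) for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread gets the lock first commits; the other then validates against the updated state and is rejected, so one administrator always remains in department 13.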
Concurrent Transactions
• Concurrency impact of acquiring rule locks
– EM1: One TX at a time
– EM2: One TX per table at a time
– EM3, EM4, EM5: gradually relax this
• Up to EM5: not acceptable
– EM6: Only if two TX’s actually validate *and* involve same deptno (EMP01 assertion)
• Acceptable
Another complicating
factor
• Deferrability...
– Involves temporarily storing violation
cases for re-evaluation at commit time
– More comments on that in chapter 11
of the book
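A toy sketch of deferred checking, assuming an in-memory dictionary in place of the EMP table: violations found at statement time are stored rather than raised, and only the stored cases are re-evaluated at commit. The class `DeferredEmp01` and its methods are hypothetical names and are not taken from the book.

```python
def emp01_holds(emp, deptno):
    """EMP01 for one department: a president/manager implies an administrator."""
    jobs = [job for job, d in emp.values() if d == deptno]
    has_vip = any(j in ("PRESIDENT", "MANAGER") for j in jobs)
    return (not has_vip) or ("ADMIN" in jobs)

class DeferredEmp01:
    """One transaction with EMP01 checked in deferred mode."""

    def __init__(self, emp):
        self.emp = dict(emp)     # working copy: empno -> (job, deptno)
        self.pending = set()     # departments to re-check at commit time

    def insert(self, empno, job, deptno):
        self.emp[empno] = (job, deptno)
        if not emp01_holds(self.emp, deptno):
            self.pending.add(deptno)          # remember, do not raise yet

    def commit(self):
        still_bad = sorted(d for d in self.pending
                           if not emp01_holds(self.emp, d))
        if still_bad:
            raise ValueError(f"Constraint EMP01 is violated for departments {still_bad}.")
        self.pending.clear()
        return self.emp

tx = DeferredEmp01({})
tx.insert(1, "MANAGER", 20)      # temporarily violates EMP01 for dept 20
pending_mid_tx = set(tx.pending)
tx.insert(2, "ADMIN", 20)        # repairs it within the same transaction
final = tx.commit()
```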
The Polite Excuse
• Inefficient EM’s
– Could be supported, but:
• Are unacceptable wrt. performance &
transaction concurrency
• Efficient EM’s
– Aligned with business requirements
• Performance and TX concurrency
But,
– Need more research to determine if they could be declaratively supported