Phase 2, Answering queries using views. February 2nd, 2004.


Project, Phase 2!

• Congratulations on completing Phase 1!

• Phase 2 is about data integration.

• Groups of 3:
  – One from each: billing, inventory and shipping
  – Groups formed by Wednesday night.
  – Email to Jessica: group name and HTML links to phase 1.


Data Integration

• Typically, organizations have multiple databases. Often hundreds.

• Databases were designed independently, but include overlapping and related data.

• Many applications require accessing multiple databases in a seamless fashion.

• A ton of research on this. Some products have been created in the last few years.


What you will do

• Part 1: a web store.
  – A customer should be able to browse your inventory and make selections (fill a shopping cart).
  – Then, the customer will provide billing and shipping selections.
  – The customer doesn’t care that you didn’t coordinate in phase 1.


What more will you do?

• Part 2: Manager wants to ask complex queries:
  – Needs a CEO workbench to ask certain queries.
  – The queries will involve data from all 3 databases.
  – She doesn’t care.


More Details

• You need to use the databases of phase 1.

• You may find problems with phase 1 databases when you want to integrate.

• You can’t change the schema of phase 1 databases, unless:
  – You write a petition. The petition explains why the benefits of changing the database are worth the cost.


Why will this be hard?

• You designed your databases independently:
  – Now that you want to integrate, things will not fit together so nicely (i.e., will require thought).
  – You may have made varied modeling assumptions.

• You need to query across multiple databases. No ‘easy’ way to do that.


One last note

• Phase 3:
  – Your company will export a set of web services.
  – The companies will communicate with each other via web services.
  – We will do something interesting with that.


Answering Queries Using Views

• What if we want to use a set of views to answer a query?

• Why?
  – The obvious reason…
  – Answering queries over web data sources.

• Very cool stuff! (i.e., I did a lot of research on this).


Reusing a Materialized View

• Suppose I have only the result of SeattleView:

    SELECT buyer, seller, product, store
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer

• and I want to answer the query

    SELECT buyer, seller
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer AND
          Purchase.product = 'gizmo'

Then, I can rewrite the query using the view.


Query Rewriting Using Views

Rewritten query:

    SELECT buyer, seller
    FROM SeattleView
    WHERE product = 'gizmo'

Original query:

    SELECT buyer, seller
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer AND
          Purchase.product = 'gizmo'
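The equivalence of the two queries can be checked on a small instance. The sketch below is only an illustration: it uses Python's sqlite3 module, stores the view result as an ordinary table, renames per-name to per_name so that it is a legal identifier, and uses made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (per_name TEXT, city TEXT, phone TEXT);
    CREATE TABLE Purchase (buyer TEXT, seller TEXT, product TEXT, store TEXT);

    -- Made-up sample rows.
    INSERT INTO Person VALUES ('Ann', 'Seattle', '206 543 1111'),
                              ('Bob', 'Portland', '503 555 2222');
    INSERT INTO Purchase VALUES ('Ann', 'Joe', 'gizmo', 'Wiz'),
                                ('Ann', 'Sue', 'camera', 'Ritz'),
                                ('Bob', 'Joe', 'gizmo', 'Wiz');

    -- Keep only the *result* of the view, as a materialized view would.
    CREATE TABLE SeattleView AS
        SELECT buyer, seller, product, store
        FROM Person, Purchase
        WHERE Person.city = 'Seattle' AND Person.per_name = Purchase.buyer;
""")

# Original query over the base tables.
original = conn.execute("""
    SELECT buyer, seller
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND Person.per_name = Purchase.buyer
      AND Purchase.product = 'gizmo'
""").fetchall()

# Rewritten query that touches only the stored view result.
rewritten = conn.execute("""
    SELECT buyer, seller FROM SeattleView WHERE product = 'gizmo'
""").fetchall()

print(sorted(original) == sorted(rewritten))
```

The rewriting works here because the view keeps the product column that the extra condition needs.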


Another Example

• I still have only the result of SeattleView:

    SELECT buyer, seller, product, store
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer

• but I want to answer the query

    SELECT buyer, seller
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer AND
          Person.Phone LIKE '206 543 %'


And Now?

• I still have only the result of SeattleView:

    SELECT buyer, seller, product, store
    FROM Person, Purchase, Product
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer AND
          Purchase.product = Product.name

• but I want to answer the query

    SELECT buyer, seller
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND
          Person.per-name = Purchase.buyer


And Now?

• I still have only the result of:

    SELECT seller, buyer, Sum(Price)
    FROM Purchase
    WHERE Purchase.store = 'The Bon'
    GROUP BY seller, buyer

• but I want to answer the query

    SELECT seller, Sum(Price)
    FROM Purchase
    WHERE Purchase.store = 'The Bon'
    GROUP BY seller

And what if it’s the other way around?


Finally…

• I still have only the result of:

    SELECT seller, buyer, Count(*)
    FROM Purchase
    WHERE Purchase.store = 'The Bon'
    GROUP BY seller, buyer

• but I want to answer the query

    SELECT seller, Count(*)
    FROM Purchase
    WHERE Purchase.store = 'The Bon'
    GROUP BY seller
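For the two aggregate examples, one plausible rewriting is to re-aggregate the finer-grained view: sum the partial sums, and sum (not count) the partial counts. The sketch below checks this on made-up rows with Python's sqlite3 module; the table names SumView and CountView and the column aliases total and cnt are assumptions introduced for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, price INTEGER);
    -- Made-up sample rows.
    INSERT INTO Purchase VALUES
        ('Ann', 'Joe', 'The Bon', 10),
        ('Ann', 'Joe', 'The Bon', 20),
        ('Bob', 'Joe', 'The Bon', 5),
        ('Bob', 'Sue', 'The Bon', 7),
        ('Ann', 'Sue', 'Wiz', 100);

    -- Stored results of the two views (grouped by seller AND buyer).
    CREATE TABLE SumView AS
        SELECT seller, buyer, SUM(price) AS total
        FROM Purchase WHERE store = 'The Bon' GROUP BY seller, buyer;
    CREATE TABLE CountView AS
        SELECT seller, buyer, COUNT(*) AS cnt
        FROM Purchase WHERE store = 'The Bon' GROUP BY seller, buyer;
""")

def rows(sql):
    return sorted(conn.execute(sql).fetchall())

# Queries over the base table (grouped by seller only).
sum_direct = rows("SELECT seller, SUM(price) FROM Purchase "
                  "WHERE store = 'The Bon' GROUP BY seller")
count_direct = rows("SELECT seller, COUNT(*) FROM Purchase "
                    "WHERE store = 'The Bon' GROUP BY seller")

# Rewritings over the views: a sum of sums, and a sum of counts.
sum_from_view = rows("SELECT seller, SUM(total) FROM SumView GROUP BY seller")
count_from_view = rows("SELECT seller, SUM(cnt) FROM CountView GROUP BY seller")

# Counting the view's rows instead would count (seller, buyer) groups.
naive_count = rows("SELECT seller, COUNT(*) FROM CountView GROUP BY seller")

print(sum_direct == sum_from_view, count_direct == count_from_view)
print(naive_count == count_direct)
```

In the other direction (a view grouped by seller only, a query grouped by seller and buyer) the per-buyer detail has been aggregated away, so no such rewriting is available.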


The General Problem

• Given a set of views V1,…,Vn, and a query Q, can we answer Q using only the answers to V1,…,Vn?

• Why do we care?
  – We can answer queries more efficiently.
  – We can query data sources on the WWW in a principled manner.

• Many, many papers on this problem.

• The best performing algorithm: The MiniCon Algorithm (Pottinger & (Ha)Levy, 2000).


Querying the WWW

• Assume a virtual schema of the WWW, e.g.,
  – Course(number, university, title, prof, quarter)

• Every data source on the web contains the answer to a view over the virtual schema:

UW database:

    SELECT number, title, prof
    FROM Course
    WHERE university = 'UW' AND quarter = '2/02'

Stanford database:

    SELECT number, title, prof, quarter
    FROM Course
    WHERE university = 'Stanford'

User query: find all professors who teach “database systems”
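One way to read this slide: the user query over the virtual Course schema is answered by a union of queries over the sources, since each source holds part of Course. The sketch below assumes that reading. The two sources are modelled as local sqlite3 tables named UW and Stanford, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each source stores only the result of its own view over Course.
    CREATE TABLE UW (number TEXT, title TEXT, prof TEXT);
    CREATE TABLE Stanford (number TEXT, title TEXT, prof TEXT, quarter TEXT);

    -- Made-up sample rows.
    INSERT INTO UW VALUES ('C1', 'database systems', 'Prof A'),
                          ('C2', 'operating systems', 'Prof B');
    INSERT INTO Stanford VALUES ('C7', 'database systems', 'Prof C', '1/02'),
                                ('C8', 'compilers', 'Prof D', '2/02');
""")

# The user query, rewritten to use only the sources.
professors = sorted(row[0] for row in conn.execute("""
    SELECT prof FROM UW WHERE title = 'database systems'
    UNION
    SELECT prof FROM Stanford WHERE title = 'database systems'
"""))

print(professors)
```

The answer covers only what the sources contain: UW courses from other quarters, or courses at other universities, are not reachable through these two views.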


Constraints in SQL

• A constraint = a property that we’d like our database to hold

• The system will enforce the constraint by taking some actions:
  – forbid an update
  – or perform compensating updates


Constraints in SQL

Constraints in SQL (from simplest to most complex):
• Keys, foreign keys
• Attribute-level constraints
• Tuple-level constraints
• Global constraints: assertions

The more complex the constraint, the harder it is to check and to enforce.


Keys

    CREATE TABLE Product (
        name CHAR(30) PRIMARY KEY,
        category VARCHAR(20))

OR:

    CREATE TABLE Product (
        name CHAR(30),
        category VARCHAR(20),
        PRIMARY KEY (name))


Keys with Multiple Attributes

    CREATE TABLE Product (
        name CHAR(30),
        category VARCHAR(20),
        price INT,
        PRIMARY KEY (name, category))


Other Keys

    CREATE TABLE Product (
        productID CHAR(10),
        name CHAR(30),
        category VARCHAR(20),
        price INT,
        PRIMARY KEY (productID),
        UNIQUE (name, category))

There is at most one PRIMARY KEY; there can be many UNIQUE constraints.
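How the system enforces these declarations can be tried out directly. The sketch below runs the Product table above in SQLite through Python's sqlite3 module; the helper try_insert and the sample rows are made up for the illustration, and other database systems report violations in their own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        productID CHAR(10),
        name CHAR(30),
        category VARCHAR(20),
        price INT,
        PRIMARY KEY (productID),
        UNIQUE (name, category))
""")

def try_insert(row):
    """Insert one row; report whether a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("p1", "Gizmo", "gadget", 10)),   # first row
    try_insert(("p1", "Camera", "Photo", 200)),  # repeats the primary key
    try_insert(("p2", "Gizmo", "gadget", 12)),   # repeats (name, category)
    try_insert(("p3", "Gizmo", "Photo", 15)),    # same name, new category
]
print(results)
```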


Foreign Key Constraints

    CREATE TABLE Purchase (
        prodName CHAR(30) REFERENCES Product(name),
        date DATETIME)

prodName is a foreign key to Product(name).
name must be a key in Product.

Foreign keys are referential integrity constraints.


Product

    Name      Category
    Gizmo     gadget
    Camera    Photo
    OneClick  Photo

Purchase

    ProdName  Store
    Gizmo     Wiz
    Camera    Ritz
    Camera    Wiz


Foreign Key Constraints

• OR

    CREATE TABLE Purchase (
        prodName CHAR(30),
        category VARCHAR(20),
        date DATETIME,
        FOREIGN KEY (prodName, category)
            REFERENCES Product(name, category))

• (name, category) must be a PRIMARY KEY (or declared UNIQUE) in Product


What happens during updates?

Types of updates (on the Product and Purchase tables shown earlier):

• In Purchase: insert/update

• In Product: delete/update


What happens during updates?

• SQL has three policies for maintaining referential integrity:
  – Reject violating modifications (default)
  – Cascade: after a delete/update do a delete/update
  – Set-null: set foreign-key field to NULL

READING ASSIGNMENT: 7.1.5, 7.1.6
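The three policies can be compared on a delete in Product. The sketch below is an illustration in SQLite via Python's sqlite3 module. SQLite checks foreign keys only after PRAGMA foreign_keys = ON, and the helper delete_gizmo, the store column and the sample rows are assumptions made for the example.

```python
import sqlite3

def delete_gizmo(policy):
    """Delete a referenced Product row under the given foreign-key policy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys otherwise
    conn.execute("CREATE TABLE Product ("
                 "name CHAR(30) PRIMARY KEY, category VARCHAR(20))")
    conn.execute("CREATE TABLE Purchase ("
                 f"prodName CHAR(30) REFERENCES Product(name) {policy}, "
                 "store VARCHAR(20))")
    conn.execute("INSERT INTO Product VALUES ('Gizmo', 'gadget')")
    conn.execute("INSERT INTO Purchase VALUES ('Gizmo', 'Wiz')")
    try:
        conn.execute("DELETE FROM Product WHERE name = 'Gizmo'")
    except sqlite3.IntegrityError:
        return "rejected"
    # The delete went through: show what happened to the referencing row.
    return conn.execute("SELECT prodName, store FROM Purchase").fetchall()

reject = delete_gizmo("")                     # default policy
cascade = delete_gizmo("ON DELETE CASCADE")   # referencing row is deleted too
set_null = delete_gizmo("ON DELETE SET NULL") # referencing field becomes NULL

print(reject, cascade, set_null)
```

The same clauses exist for updates (ON UPDATE CASCADE, ON UPDATE SET NULL).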


Constraints on Attributes and Tuples

• Constraints on attributes:
    NOT NULL -- obvious meaning...
    CHECK condition -- any condition!

• Constraints on tuples:
    CHECK condition
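A small sketch of both kinds of constraint, run in SQLite via Python's sqlite3 module. The salePrice column, the two conditions and the helper try_insert are hypothetical, chosen only to show an attribute-level CHECK next to a tuple-level one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        name CHAR(30) NOT NULL,          -- attribute-level: NOT NULL
        price INT CHECK (price >= 0),    -- attribute-level: CHECK on one column
        salePrice INT,
        CHECK (salePrice <= price))      -- tuple-level: relates two columns
""")

def try_insert(row):
    """Insert one row; report whether a constraint rejected it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("Gizmo", 10, 8)),     # satisfies every constraint
    try_insert((None, 10, 8)),        # violates NOT NULL
    try_insert(("Camera", -1, -2)),   # violates CHECK (price >= 0)
    try_insert(("OneClick", 10, 12)), # violates CHECK (salePrice <= price)
]
print(results)
```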


    CREATE TABLE Purchase (
        prodName CHAR(30)
            CHECK (prodName IN (SELECT Product.name FROM Product)),
        date DATETIME NOT NULL)

What is the difference from Foreign-Key?


General Assertions

    CREATE ASSERTION myAssert CHECK (NOT EXISTS (
        SELECT Product.name
        FROM Product, Purchase
        WHERE Product.name = Purchase.prodName
        GROUP BY Product.name
        HAVING count(*) > 200))


Final Comments on Constraints

• Can give them names, and alter later
  – Read in the book!

• We need to understand exactly when they are checked

• We need to understand exactly what actions are taken if they fail


Triggers in SQL

• A trigger contains an event, a condition, an action.

• Event = INSERT, DELETE, UPDATE
• Condition = any WHERE condition (may refer to the old and the new values)
• Action = more inserts, deletes, updates
• Many, many more bells and whistles...
• Read in the book (it only scratches the surface...)


Triggers

Enable the database programmer to specify:
• when to check a constraint,
• what exactly to do.

A trigger has 3 parts:

• An event (e.g., update to an attribute)
• A condition (e.g., a query to check)
• An action (deletion, update, insertion)

When the event happens, the system will check the condition, and if it is satisfied, will perform the action.

NOTE: triggers may cause cascading effects. Database vendors did not wait for standards with triggers!


Elements of Triggers (in SQL3)

• Timing of action execution: before, after or instead of triggering event

• The action can refer to both the old and new state of the database.

• Update events may specify a particular column or set of columns.

• A condition is specified with a WHEN clause.

• The action can be performed either:
  – once for every tuple, or
  – once for all the tuples that are changed by the database operation.


Example: Row Level Trigger

    CREATE TRIGGER NoLowerPrices
    AFTER UPDATE OF price ON Product
    REFERENCING
        OLD AS OldTuple
        NEW AS NewTuple
    FOR EACH ROW
    WHEN (OldTuple.price > NewTuple.price)
        UPDATE Product
        SET price = OldTuple.price
        WHERE name = NewTuple.name
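Trigger syntax differs between systems, so the following is a sketch of the same row-level trigger in SQLite's dialect, not SQL3: SQLite has no REFERENCING clause and names the two tuples OLD and NEW, and the action goes between BEGIN and END. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT PRIMARY KEY, price INTEGER);
    INSERT INTO Product VALUES ('Gizmo', 20), ('Camera', 100);

    -- Event: update of price.  Condition: the price went down.
    -- Action: put the old price back.
    CREATE TRIGGER NoLowerPrices
    AFTER UPDATE OF price ON Product
    FOR EACH ROW
    WHEN OLD.price > NEW.price
    BEGIN
        UPDATE Product SET price = OLD.price WHERE name = NEW.name;
    END;
""")

conn.execute("UPDATE Product SET price = 10 WHERE name = 'Gizmo'")    # lowered
conn.execute("UPDATE Product SET price = 150 WHERE name = 'Camera'")  # raised

prices = dict(conn.execute("SELECT name, price FROM Product"))
print(prices)
```

The lowered price is restored while the raised one is kept.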


Statement Level Trigger

    CREATE TRIGGER average-price-preserve
    INSTEAD OF UPDATE OF price ON Product
    REFERENCING
        OLD_TABLE AS OldStuff
        NEW_TABLE AS NewStuff
    WHEN (1000 < (SELECT AVG (price)
                  FROM ((Product EXCEPT OldStuff) UNION NewStuff)))
        DELETE FROM Product
            WHERE (name, price, company) IN OldStuff;
        INSERT INTO Product
            (SELECT * FROM NewStuff)


Bad Things Can Happen

    CREATE TRIGGER Bad-trigger
    AFTER UPDATE OF price ON Product
    REFERENCING
        OLD AS OldTuple
        NEW AS NewTuple
    FOR EACH ROW
    WHEN (NewTuple.price > 50)
        UPDATE Product
        SET price = NewTuple.price * 2
        WHERE name = NewTuple.name


Embedded SQL

• direct SQL (= ad-hoc SQL) is rarely used

• in practice: SQL is embedded in some application code

• SQL code is identified by special syntax


Impedance Mismatch

• Example: SQL in C:
  – C uses int, char[..], pointers, etc.
  – SQL uses tables

• Impedance mismatch = incompatible types


The Impedance Mismatch Problem

Why not use only one language?

• Forgetting SQL: “we can quickly dispense with this idea” [textbook, pg. 351].

• SQL cannot do everything that the host language can do.

Solution: use cursors


Programs with Embedded SQL

Host language + Embedded SQL
    ↓ Preprocessor
Host language + function calls
    ↓ Host language compiler
Host language program

Call-level interface (CLI): ODBC, JDBC, ADO


Interface: SQL / Host Language

Values get passed through shared variables.

Colons precede shared variables when they occur within the SQL statements.

EXEC SQL precedes every SQL statement in the host language.

The variable SQLSTATE provides error messages and status reports (e.g., “00000” says that the operation completed with no problem).

    EXEC SQL BEGIN DECLARE SECTION;
        char productName[30];
    EXEC SQL END DECLARE SECTION;


Example

Product (pname, price, quantity, maker)
Purchase (buyer, seller, store, pname)
Company (cname, city)
Person (name, phone, city)


Using Shared Variables

    void simpleInsert() {

        EXEC SQL BEGIN DECLARE SECTION;
            char n[20], c[30];   /* product-name, company-name */
            int p, q;            /* price, quantity */
            char SQLSTATE[6];
        EXEC SQL END DECLARE SECTION;

        /* get values for name, price, quantity and company somehow */

        EXEC SQL INSERT INTO Product(pname, price, quantity, maker)
                 VALUES (:n, :p, :q, :c);
    }


Single-Row Select Statements

    int getPrice(char *name) {

        EXEC SQL BEGIN DECLARE SECTION;
            char n[20];
            int p;
            char SQLSTATE[6];
        EXEC SQL END DECLARE SECTION;

        strcpy(n, name);   /* copy name to shared variable; assumes it fits in n */

        EXEC SQL SELECT price INTO :p
                 FROM Product
                 WHERE Product.pname = :n;

        return p;
    }


Cursors

1. Declare the cursor

2. Open the cursor

3. Fetch tuples one by one

4. Close the cursor


Cursors

    void product2XML() {
        EXEC SQL BEGIN DECLARE SECTION;
            char n[20], c[30];
            int p, q;
            char SQLSTATE[6];
        EXEC SQL END DECLARE SECTION;

        EXEC SQL DECLARE crs CURSOR FOR
            SELECT pname, price, quantity, maker
            FROM Product;

        EXEC SQL OPEN crs;

        printf("<allProducts>\n");
        while (1) {
            EXEC SQL FETCH FROM crs INTO :n, :p, :q, :c;
            if (NO_MORE_TUPLES) break;
            printf(" <product>\n");
            printf("  <name> %s </name>\n", n);
            printf("  <price> %d </price>\n", p);
            printf("  <quantity> %d </quantity>\n", q);
            printf("  <maker> %s </maker>\n", c);
            printf(" </product>\n");
        }
        EXEC SQL CLOSE crs;
        printf("</allProducts>\n");
    }


• What is NO_MORE_TUPLES?

    #define NO_MORE_TUPLES !(strcmp(SQLSTATE, "02000"))
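The same declare / open / fetch / close pattern shows up in call-level interfaces. As a rough analogue rather than embedded SQL, the sketch below produces the same XML with Python's sqlite3 module, where fetchone() returning None plays the role of NO_MORE_TUPLES. The function name product_to_xml and the sample row are made up.

```python
import sqlite3
from xml.sax.saxutils import escape

def product_to_xml(conn):
    crs = conn.cursor()                          # 1. declare the cursor
    crs.execute("SELECT pname, price, quantity, maker "
                "FROM Product ORDER BY pname")   # 2. open it on a query
    lines = ["<allProducts>"]
    while True:
        row = crs.fetchone()                     # 3. fetch tuples one by one
        if row is None:                          # no more tuples
            break
        n, p, q, c = row
        lines += [" <product>",
                  f"  <name> {escape(n)} </name>",
                  f"  <price> {p} </price>",
                  f"  <quantity> {q} </quantity>",
                  f"  <maker> {escape(c)} </maker>",
                  " </product>"]
    crs.close()                                  # 4. close the cursor
    lines.append("</allProducts>")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product "
             "(pname TEXT, price INTEGER, quantity INTEGER, maker TEXT)")
conn.execute("INSERT INTO Product VALUES ('Gizmo', 20, 3, 'GizmoWorks')")

xml = product_to_xml(conn)
print(xml)
```

Rows arrive as host-language tuples, one per fetch, which is how the cursor bridges the mismatch between tables and C-style variables.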


More on Cursors

• Cursors can modify a relation as well as read it.

• We can determine the order in which the cursor will get tuples by the ORDER BY keyword in the SQL query.

• Cursors can be protected against changes to the underlying relations.

• The cursor can be a scrolling one: it can go forward, backward, +n, -n, Abs(n), Abs(-n).


Dynamic SQL

• So far the SQL statements were visible to the compiler

• In dynamic SQL we have an arbitrary string that represents a SQL command

• Two steps:
  – Prepare: compiles the string
  – Execute: executes the compiled string


Dynamic SQL

    void someQuery() {
        EXEC SQL BEGIN DECLARE SECTION;
            char *command = "UPDATE Product SET quantity=quantity+1 WHERE pname='gizmo'";
        EXEC SQL END DECLARE SECTION;

        EXEC SQL PREPARE myquery FROM :command;

        EXEC SQL EXECUTE myquery;
    }

myquery = a SQL variable, does not need to be prefixed by “:”
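In a call-level interface every statement is dynamic in this sense: the program hands the system a string at run time. The sketch below is an analogue in Python's sqlite3 module, which has no separate PREPARE call (execute() compiles and runs the string in one step). The function increment and the whitelist ALLOWED_COLUMNS are hypothetical. The product name is passed as a bound parameter rather than pasted into the string.

```python
import sqlite3

ALLOWED_COLUMNS = {"quantity", "price"}  # hypothetical whitelist for the sketch

def increment(conn, column, product):
    """Build an UPDATE statement at run time and execute it."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    # The statement text is only known at run time: this is the dynamic part.
    command = f"UPDATE Product SET {column} = {column} + 1 WHERE pname = ?"
    cur = conn.execute(command, (product,))  # compile and execute in one call
    return cur.rowcount                      # number of rows changed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product "
             "(pname TEXT, price INTEGER, quantity INTEGER, maker TEXT)")
conn.execute("INSERT INTO Product VALUES ('gizmo', 20, 3, 'GizmoWorks')")

changed = increment(conn, "quantity", "gizmo")
quantity = conn.execute(
    "SELECT quantity FROM Product WHERE pname = 'gizmo'").fetchone()[0]
print(changed, quantity)
```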


Transactions

Address two issues:

• Access by multiple users
  – Remember the “client-server” architecture: one server with many clients

• Protection against crashes


Multiple users: single statements

Client 1:

    UPDATE Product
    SET Price = Price - 1.99
    WHERE pname = 'Gizmo'

Client 2:

    UPDATE Product
    SET Price = Price*0.5
    WHERE pname = 'Gizmo'

Two managers attempt to do a discount. Will it work?


Multiple users: multiple statements

Client 1:

    INSERT INTO SmallProduct(name, price)
        SELECT pname, price
        FROM Product
        WHERE price <= 0.99

    DELETE FROM Product
    WHERE price <= 0.99

Client 2:

    SELECT count(*)
    FROM Product

    SELECT count(*)
    FROM SmallProduct

What’s wrong?


Protection against crashes

Client 1:

    INSERT INTO SmallProduct(name, price)
        SELECT pname, price
        FROM Product
        WHERE price <= 0.99

    -- Crash!

    DELETE FROM Product
    WHERE price <= 0.99

What’s wrong?


Transactions

• Transaction = group of statements that must be executed atomically

• Transaction properties: ACID
  – ATOMICITY = all or nothing
  – CONSISTENCY = leave database in consistent state
  – ISOLATION = as if it were the only transaction in the system
  – DURABILITY = store on disk!
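Atomicity can be seen on the INSERT-then-DELETE pair from the earlier slides. In the sketch below, written with Python's sqlite3 module, a Python exception stands in for the crash and the connection's context manager rolls the half-finished transaction back. The function names and sample rows are made up, and a real crash is handled by the system's recovery machinery, not by application code like this.

```python
import sqlite3

def move_small_products(conn, crash=False):
    """Run INSERT + DELETE as one transaction; optionally fail in between."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("INSERT INTO SmallProduct(name, price) "
                         "SELECT pname, price FROM Product WHERE price <= 0.99")
            if crash:
                raise RuntimeError("simulated crash between the two statements")
            conn.execute("DELETE FROM Product WHERE price <= 0.99")
    except RuntimeError:
        pass  # the transaction has already been rolled back

def counts(conn):
    product = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
    small = conn.execute("SELECT COUNT(*) FROM SmallProduct").fetchone()[0]
    return product, small

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (pname TEXT, price REAL);
    CREATE TABLE SmallProduct (name TEXT, price REAL);
    INSERT INTO Product VALUES ('Gum', 0.5), ('Gizmo', 19.99);
""")

move_small_products(conn, crash=True)
after_crash = counts(conn)    # nothing happened: all or nothing

move_small_products(conn)
after_success = counts(conn)  # both statements took effect

print(after_crash, after_success)
```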


Transactions in SQL

• In “ad-hoc” SQL:
  – Default: each statement = one transaction

• In “embedded” SQL:

    BEGIN TRANSACTION
        [SQL statements]
    COMMIT or ROLLBACK (=ABORT)


Transactions: Serializability

Serializability = the technical term for isolation

• An execution is serial if each transaction runs completely before or completely after any other transaction’s execution

• An execution is serializable if it is equivalent to one that is serial

• DBMS can offer serializability guarantees


Serializability

• Enforced with locks, like in Operating Systems!
• But this is not enough:

    time   User 1           User 2
     |     LOCK A
     |     [write A=1]
     |     UNLOCK A
     |     . . .            LOCK A
     |     . . .            [write A=3]
     |     . . .            UNLOCK A
     |     . . .            LOCK B
     |                      [write B=4]
     |                      UNLOCK B
     |     LOCK B
     |     [write B=2]
     |     UNLOCK B
     v

What is wrong?
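The schedule above can be replayed in a few lines. The sketch below is a plain Python simulation, with no real locks or database: it applies the writes in order and compares the interleaved result with the two possible serial orders, on the assumption that User 2's writes all fall between User 1's two writes.

```python
def run(schedule):
    """Apply a list of (user, item, value) writes in order; return final state."""
    db = {}
    for _user, item, value in schedule:
        db[item] = value
    return db

user1 = [("User 1", "A", 1), ("User 1", "B", 2)]
user2 = [("User 2", "A", 3), ("User 2", "B", 4)]

# The two serial executions: one user entirely before the other.
serial_outcomes = [run(user1 + user2), run(user2 + user1)]

# The interleaving from the slide: User 2 runs between User 1's two writes.
interleaved = run([user1[0], user2[0], user2[1], user1[1]])

print(serial_outcomes)
print(interleaved, interleaved in serial_outcomes)
```

Every individual write was protected by a lock, yet the final state mixes User 2's A with User 1's B, which no serial order produces.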


Serializability

• Solution: two-phase locking
  – Lock everything at the beginning
  – Unlock everything at the end

• Read locks: many simultaneous read locks allowed

• Write locks: only one write lock allowed

• Insert locks: one per table
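A minimal sketch of the "lock everything at the beginning, unlock everything at the end" discipline, using Python threads in place of database users. The names lock_a, lock_b and transaction are made up, and both threads take the locks in the same order (A, then B) so they cannot deadlock. A real lock manager also distinguishes read locks from write locks, which this sketch does not.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
db = {}

def transaction(a_value, b_value):
    # Phase 1: acquire every lock before touching any data.
    lock_a.acquire()
    lock_b.acquire()
    try:
        db["A"] = a_value
        db["B"] = b_value
    finally:
        # Phase 2: release everything only once all writes are done.
        lock_b.release()
        lock_a.release()

threads = [threading.Thread(target=transaction, args=(1, 2)),
           threading.Thread(target=transaction, args=(3, 4))]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = (db["A"], db["B"])
print(final)
```

Whichever thread runs last wins both items, so the final state is one of the two serial outcomes and never the mixed state A=3, B=2.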


Isolation Levels in SQL

1. “Dirty reads”
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

2. “Committed reads”
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED

3. “Repeatable reads”
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

4. Serializable transactions (default):
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

Reading assignment: chapter 8.6