CPSC 534A – Background

100
CPSC 534A – Background Rachel Pottinger January 13 and 18, 2005

description

CPSC 534A – Background. Rachel Pottinger January 13 and 18, 2005. Administrative notes. Please note you’re supposed to sign up for one paper presentation and one discussion… for different papers Please sign up for the mailing list WebCT has been populated – make sure you can access it - PowerPoint PPT Presentation

Transcript of CPSC 534A – Background

Page 1: CPSC 534A – Background

CPSC 534A – Background

Rachel Pottinger

January 13 and 18, 2005

Page 2: CPSC 534A – Background

Administrative notesPlease note you’re supposed to sign up for one paper presentation and one discussion… for different papers

Please sign up for the mailing list

WebCT has been populated – make sure you can access it

HW 1 is on the web, due beginning of class a week from today

Page 3: CPSC 534A – Background

Overview of the next two classes

Relational databasesEntity Relationship (ER) diagramsObject Oriented Databases (OODBs)XMLOther data typesDatabase internals (Briefly)An extremely brief introduction to category theory

Metadata management examples are interspersed

Page 4: CPSC 534A – Background

Relational Database BasicsWhat’s in a relational database?

Relational Algebra

SQL

Datalog

Page 5: CPSC 534A – Background

Relational Data Representation

PName Price Category Manufacturer

gizmo $19.99 gadgets GizmoWorks

Power gizmo $29.99 gadgets GizmoWorks

SingleTouch $149.99 photography Canon

MultiTouch $203.99 household Hitachi

Tuples or rows

Attribute names or columns

Relation or table

Page 6: CPSC 534A – Background

Relational schema representationEvery attribute has an atomic type (e.g., Char, integer)Relation Schema: Column headings:relation name + attribute names + attribute types

Product(VarChar PName, real Price, VarChar Category, VarChar Manfacturer)often types are left off:Product(PName, Price, Category, Manfacturer)Relation instance: The values in a table.Database Schema: a set of relation schemas in the database.Database instance: a relation instance for every relation in the schema.

Page 7: CPSC 534A – Background

Querying – Relational AlgebraSelect ()- chose tuples from a relation

Project ()- chose attributes from relation

Join (⋈) - allows combining of 2 relations

Set-difference ( ) Tuples in relation 1, but not in relation 2.

Union ( )

Cartesian Product (×) Each tuple of R1 with each tuple in R2

Page 8: CPSC 534A – Background

Find products where the manufacturer is GizmoWorks

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

Selection: σManufacturer = GizmoWorksProduct

Page 9: CPSC 534A – Background

Find the Name, Price, and Manufacturers of products whose price is greater than 100

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product)

Page 10: CPSC 534A – Background

Find the product names and price of products that cost less than $200 and have manufacturers where there is a Company that has a CName that matches the manufacturer, and its country is Japan

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

PName Price

SingleTouch $149.99

πPName, Price((σPrice < 200Product) ⋈ Manufacturer =

Cname (σCountry = ‘Japan’Company))

Page 11: CPSC 534A – Background

When are two relations related?

You guess they are

I tell you soConstraints say so

A key is a set of attributes whose values are unique; we underline a key

Product(PName, Price, Category, Manfacturer)

Foreign keys are a method for schema designers to tell you so

A foreign key states that an attribute is a reference to the key of another relationex: Product.Manufacturer is foreign key of Company

Gives information and enforces constraint

Page 12: CPSC 534A – Background

SQLData Manipulation Language (DML)

Query one or more tablesInsert/delete/modify tuples in tables

Data Definition Language (DDL)Create/alter/delete tables and their attributes

Transact-SQLIdea: package a sequence of SQL statements server

Page 13: CPSC 534A – Background

Querying – SQL

Standard language for querying and manipulating data

Structured Query Language

Many standards out there: • ANSI SQL• SQL92 (a.k.a. SQL2)• SQL99 (a.k.a. SQL3)• Vendors support various subsets of these• What we discuss is common to all of them

Page 14: CPSC 534A – Background

SQL basicsBasic form: (many many more bells and whistles in addition)

Select attributes

From relations (possibly multiple, joined)

Where conditions (selections)

Page 15: CPSC 534A – Background

SQL – Selections SELECT * FROM Company WHERE country=“Canada” AND stockPrice > 50

Some things allowed in the WHERE clause: attribute names of the relation(s) used in the FROM. comparison operators: =, <>, <, >, <=, >= apply arithmetic operations: stockPrice*2 operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.

Rachel Pottinger
Look for "USA" and "Seattle", not to mention SSN, and zip code
Page 16: CPSC 534A – Background

SQL – Projections

SELECT name AS company, stockPrice AS price FROM Company WHERE country=“Canada” AND stockPrice > 50

SELECT name, stock price FROM Company WHERE country=“Canada” AND stockPrice > 50

Select only a subset of the attributes

Rename the attributes in the resulting table

Page 17: CPSC 534A – Background

SQL – Joins

SELECT name, store FROM Person, Purchase WHERE name=buyer AND city=“Vancouver” AND product=“gizmo”

Product ( name, price, category, maker)Purchase (buyer, seller, store, product)Company (name, stock price, country)Person( name, phone number, city)

Page 18: CPSC 534A – Background

Selection: σManufacturer = GizmoWorks(Product)

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

What’s the SQL?

Page 19: CPSC 534A – Background

Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product)

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

What’s the SQL?

Page 20: CPSC 534A – Background

π PName, Price((σPrice <= 200Product)⋈

Manufacturer = Cname (σCountry = ‘Japan’Company))

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

PName Price

SingleTouch $149.99

What’s the SQL?

Page 21: CPSC 534A – Background

More SQL – Outer JoinsWhat happens if there’s no value available?

PName Country

Gizmo USA

Powergizmo USA

SingleTouch Japan

MultiTouch Japan

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

Select pname, Country

From Product, Company

Where Manufacturer = Cname

Foo $1.99 Gadgets Bar

Select pname, Country

From Product outer join Company

on Manufacturer = CnameFoo NULL

Page 22: CPSC 534A – Background

Querying – DatalogEnables expressing recursive queries

More convenient for analysis

Some people find it easier to understand

Without recursion but with negation it is equivalent in power to relational algebra and SQL

Limited version of Prolog (no functions)

Page 23: CPSC 534A – Background

Datalog Rules and Queries

A datalog rule has the following form:

head :- atom1, atom2, …, atom,…You can read this as then :- if ...

ExpensiveProduct(N) :- Product(N,M,P) & P > $100

CanadianProduct(N) :- Product(N,M,P) & Company(M, “Canada”, SP)

IntlProd(N) :- Product(N,M,P) & NOT Company(M, “Canada”, SP)

Distinguished variable Existential variables

Subgoal or EDB

Negated subgoal - also denoted by ¬

constant

Arithmetic comparison or interpreted predicate

Head or IDB

Page 24: CPSC 534A – Background

Conjunctive QueriesA subset of Datalog

Only relations appear in the right hand side of rules

No negation

Functionally equivalent to Select, Project, Join queries

Very popular in modeling relationships between databases

Page 25: CPSC 534A – Background

Selection: σManufacturer = GizmoWorks(Product)

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

What’s the Datalog?

Page 26: CPSC 534A – Background

Selection + Projection:πName, Price, Manufacturer (σPrice > 100Product)

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

What’s the Datalog?

Page 27: CPSC 534A – Background

πPname,Price((σPrice <= 200Product)⋈ Manufacturer =

Cname (σCountry = ‘Japan’Company))

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname StockPrice Country

GizmoWorks 25 USA

Canon 65 Japan

Hitachi 15 Japan

PName Price

SingleTouch $149.99

What’s the Datalog?

Page 28: CPSC 534A – Background

Bonus Relational Goodness: ViewsViews are relations, except that they are not physically

stored. (Materialized views are stored)They are used mostly in order to simplify complex queries

and to define conceptually different views of the database to different classes of users.

Used also to model relationships between databasesView: purchases of telephony products:

CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.name AND Product.category = “telephony”

Page 29: CPSC 534A – Background

Summarizing Relational DBsRelational perspective: Data is stored in relations. Relations have attributes. Data instances are tuples.SQL perspective: Data is stored in tables. Tables have columns. Data instances are rows.Query languages

Relational algebra – mathematical base for understanding query languagesSQL – very widely usedDatalog – based on Prolog, very popular with theoreticians

Views allow complex queries to be written simply

Page 30: CPSC 534A – Background

Relational Metadata problems

Page 31: CPSC 534A – Background

FodorswundergroundAAA Expedia

Data Integration:Planning a Beach Vacation

Beach Good Weather

Cheap Flight

weather.com Orbitz

Page 32: CPSC 534A – Background

Data Integration System Architecture

Local Database 1

Local Database N

Mediated Schema

Local Schema 1 Local Schema N

Orbitz Expedia

Virtual database

User Query

“Airport”

Page 33: CPSC 534A – Background

Data Translation

Data exists in two different schemas. You have data in one, and you want to put data into the other

How are the schemas related to one another?

How do you change the data from one to another?

Page 34: CPSC 534A – Background

Data WarehousingData Warehouses store vast quantities of

data for fast query processing, but only batch updating.

Import schemas of data sources

Identify overlapping attributes, etc.

Build data cleaning scripts

Build data transformation scripts

Enable data lineage tracing

Page 35: CPSC 534A – Background

Schema Evolution and Data Migration

Schemas change over time; data must change with it.

How do we deal with schema changes?

How can we make it easy for the data to migrate

How do we handle applications built on the old schema that store in the new database?

Page 36: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 37: CPSC 534A – Background

Entity / Relationship Diagrams

Entities

Attributes

Relationships between entities

Product

address

buys

Page 38: CPSC 534A – Background

Keys in E/R DiagramsEvery entity set must have a key

Product

name category

price

Page 39: CPSC 534A – Background

address name sin

Person

buys

makes

employs

CompanyProduct

name category

stockprice

name

price

Page 40: CPSC 534A – Background

Multiplicity of E/R Relations

one-one:

many-one

many-many

123

abcd

123

abcd

123

abcd

Page 41: CPSC 534A – Background

address name sin

Person

buys

makes

employs

CompanyProduct

name category

stockprice

name

price

What doesthis say ?

Page 42: CPSC 534A – Background

Roles in Relationships

Purchase

What if we need an entity set twice in one relationship?

Product

Person

Store

salesperson buyer

Page 43: CPSC 534A – Background

Attributes on Relationships

Purchase

Product

Person

Store

date

Page 44: CPSC 534A – Background

Product

name category

price

isa isa

Educational ProductSoftware Product

Age Groupplatforms

Subclasses in E/R Diagrams

Page 45: CPSC 534A – Background

Keys in E/R Diagrams

address name SIN

Person

Product

name category

price

No formal way to specify multiple keys in E/R diagrams

Underline:

Page 46: CPSC 534A – Background

From E/R Diagramsto Relational Schema

Entity set relation

Relationship relation

Page 47: CPSC 534A – Background

Entity Set to Relation

Product

name category

price

Product(name, category, price)

name category price

gizmo gadgets $19.99

Page 48: CPSC 534A – Background

Relationships to Relations

makes CompanyProduct

name category

Stock price

name

Makes(product-name, product-category, company-name, year) Product-name Product-Category Company-name Starting-year

gizmo gadgets gizmoWorks 1963

Start Yearprice

(watch out for attribute name conflicts)

Page 49: CPSC 534A – Background

Relationships to Relations

makes CompanyProduct

name category

Stock price

name

No need for Makes. Modify Product:

name category price StartYear companyName

gizmo gadgets 19.99 1963 gizmoWorks

Start Yearprice

Page 50: CPSC 534A – Background

Multi-way Relationships to Relations

Purchase

Product

Person

Storename price

sin name

name address

Purchase( , , )

Page 51: CPSC 534A – Background

Summarizing ER diagramsEntities, relationships, and attributes

Also has inheritance

Used to design schemas, then relational derived from it

Page 52: CPSC 534A – Background

Metadata problems:Mapping ER to Relational

Database designMap ER model to SQL schema

Reverse engineer SQL schema to ER model

Page 53: CPSC 534A – Background

Metadata problems:Round Trip Engineering

Design in ER

Implement in relational

Modify the relational schema

How do we change the ER diagram?

Page 54: CPSC 534A – Background

View integration

Define use-case scenario

Identify views for each use-case

Integrate views into a conceptual schema

Page 55: CPSC 534A – Background

CPSC 534a

Background: Part 2

Rachel Pottinger

January 18, 2005

Page 56: CPSC 534A – Background

Administrative notesPlease sign up for papers if you haven’t already (if there’s enough time, we’ll do this at the end of class)Remember that the first reading responses are due 9pm Wednesday

Mail me if you can’t access WebCT

Remember the 1st homework is due beginning of class Thursday

General theory – trying to make sure you understand basics and have thought about it – not looking for one, true, answer.State any assumptions you makeIf you can’t figure out a detail on how to transform ER to relational based on class discussion, write an explanation as to what you did and why.Any other questions?

Office hours?

Page 57: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 58: CPSC 534A – Background

Object-Oriented DBMS’sStarted late 80’s

Main idea:Toss the relational model !

Use the OO model – e.g. C++ classes

Standards group: ODMG = Object Data Management Group.

OQL = Object Query Language, tries to imitate SQL in an OO framework.

Page 59: CPSC 534A – Background

The OO PlanODMG imagines OO-DBMS vendors

implementing an OO language like C++ with extensions (OQL) that allow the programmer to transfer data between the database and “host language” seamlessly.

A brief diversion: the impedance mismatch

Page 60: CPSC 534A – Background

OO Implementation OptionsBuild a new database from scratch (O2)

Elegant extension of SQLLater adopted by ODMG in the OQL languageUsed to help build XML query languages

Make a programming language persistent (ObjectStore)

No query languageNiche marketObjectStore is still around, renamed to Exelon, stores XML objects now

Page 61: CPSC 534A – Background

ODLODL is used to define persistent classes, those whose objects may be stored permanently in the database.

ODL classes look like Entity sets with binary relationships, plus methods.

ODL class definitions are part of the extended, OO host language.

Page 62: CPSC 534A – Background

ODL – remind you of anything?

interface Student extends Person (extent Students) { attribute string major; relationship Set<Course> takes inverse stds;}

interface Student extends Person (extent Students) { attribute string major; relationship Set<Course> takes inverse stds;}

interface Person (extent People key sin) { attribute string sin; attribute string dept; attribute string name;}

interface Person (extent People key sin) { attribute string sin; attribute string dept; attribute string name;}

interface Course (extent Crs key cid) { attribute string cid; attribute string cname; relationship Person instructor; relationship Set<Student> stds inverse takes;}

interface Course (extent Crs key cid) { attribute string cid; attribute string cname; relationship Person instructor; relationship Set<Student> stds inverse takes;}

Page 63: CPSC 534A – Background

Why did OO Fail?Why are relational databases so popular?

Very simple abstraction; don’t have to think about programming when storing data.

Very well optimized

Relational db are very well entrenched – not enough advantages, and no good exit strategy…

Metadata failure

Page 64: CPSC 534A – Background

Merging Relational and OODBsObject-oriented models support interesting data types – not just flat files.

Maps, multimedia, etc.

The relational model supports very-high-level queries.Object-relational databases are an attempt to get the best of both.All major commercial DBs today have OR versions – full spec in SQL99, but your mileage may vary.

Page 65: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 66: CPSC 534A – Background

XMLeXtensible Markup Language

XML 1.0 – a recommendation from W3C, 1998

Roots: SGML (from document community - works great for them; from db perspective, very nasty).

After the roots: a format for sharing data

Page 67: CPSC 534A – Background

Why XML is of Interest to UsXML is just syntax for data

Note: we have no syntax for relational dataBut XML is not relational: semistructured

This is exciting because:Can translate any data to XMLCan ship XML over the Web (HTTP)Can input XML into any applicationThus: data sharing and exchange on the Web

Page 68: CPSC 534A – Background

XML Data Sharing and Exchange

application

relational data

Transform

Integrate

Warehouse

XML Data WEB (HTTP)

application

application

legacy data

object-relational

Think of all the metadata problems!

Page 69: CPSC 534A – Background

From HTML to XML

HTML describes the presentation

Page 70: CPSC 534A – Background

HTML

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteoul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteoul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

Page 71: CPSC 534A – Background

XML<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML describes the content

Page 72: CPSC 534A – Background

XML Document

<data>

<person id=“o555” >

<name> Mary </name>

<address>

<street> Maple </street>

<no> 345 </no>

<city> Seattle </city>

</address>

</person>

<person>

<name> John </name>

<address> Thailand </address>

<phone> 23456 </phone>

<married/>

</person>

</data>

<data>

<person id=“o555” >

<name> Mary </name>

<address>

<street> Maple </street>

<no> 345 </no>

<city> Seattle </city>

</address>

</person>

<person>

<name> John </name>

<address> Thailand </address>

<phone> 23456 </phone>

<married/>

</person>

</data>

person elements

name elements

attributes

Page 73: CPSC 534A – Background

XML TerminologyElements

enclosed within tags: <person> … </person>

nested within other elements: <person> <address> … </address> </person>

can be empty <married></married> abbreviated as <married/>

can have Attributes<person id=“0005”> … </person>

XML document has as single ROOT element

Page 74: CPSC 534A – Background

BuzzwordsWhat is XML?

W3C data exchange format

Hierarchical data model

Self-describing

Semi-structured

Page 75: CPSC 534A – Background

XML as a Tree !!

<data>

<person id=“o555” >

<name> Mary </name>

<address>

<street> Maple </street>

<no> 345 </no>

<city> Seattle </city>

</address>

</person>

<person>

<name> John </name>

<address> Thailand </address>

<phone> 23456 </phone>

</person>

</data>

<data>

<person id=“o555” >

<name> Mary </name>

<address>

<street> Maple </street>

<no> 345 </no>

<city> Seattle </city>

</address>

</person>

<person>

<name> John </name>

<address> Thailand </address>

<phone> 23456 </phone>

</person>

</data>

data

personperson

Mary

name address

street no city

Maple 345 Seattle

nameaddress

John Thai

phone

23456

id

o555

Elementnode

Textnode

Attributenode

Minor Detail: Order matters !!!

Page 76: CPSC 534A – Background

XML is self-describingSchema elements become part of the data

In XML <persons>, <name>, <phone> are part of the data, and are repeated many times

Relational schema: persons(name,phone) defined separately for the data and is fixed

Consequence: XML is much more flexible

Page 77: CPSC 534A – Background

Relational Data as XML

<persons><person> <name>John</name> <phone> 3634</phone> </person> <person> <name>Sue</name> <phone> 6343</phone> </person> <person> <name>Dick</name> <phone> 6363</phone> </person>

</persons>

<persons><person> <name>John</name> <phone> 3634</phone> </person> <person> <name>Sue</name> <phone> 6343</phone> </person> <person> <name>Dick</name> <phone> 6363</phone> </person>

</persons>

n a m e p h o n e

J o h n 3 6 3 4

S u e 6 3 4 3

D i c k 6 3 6 3

personperson person person

name name namephone phone phone

“John” 3634 “Sue” “Dick”6343 6363

personsXML:

Page 78: CPSC 534A – Background

XML is semi-structured

Missing elements:

Could represent in a table with nulls

<person> <name> John</name> <phone>1234</phone> </person>

<person> <name>Joe</name></person>

<person> <name> John</name> <phone>1234</phone> </person>

<person> <name>Joe</name></person> no phone !

name phone

John 1234

Joe -

Page 79: CPSC 534A – Background

XML is semi-structuredRepeated elements

Impossible in tables:

<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>

<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>

two phones !

name phone

Mary 2345 3456 ???

Page 80: CPSC 534A – Background

XML is semi-structuredElements with different types in different objects

Heterogeneous collections:<persons> can contain both <person>s and <customer>s

<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>

<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>

structured name !

Page 81: CPSC 534A – Background

Summarizing XMLXML has first class elements and second class attributes

XML is semi-structured

XML is nested

XML is a tree

XML is a huge buzzword

Will XML replace relational databases?

Page 82: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 83: CPSC 534A – Background

Other data formatsMakefiles

Forms

Application code

Page 84: CPSC 534A – Background

Other Metadata ApplicationsMessage Mapping

Map messages from one format to anotherScientific data management

Merge schemas from related experimentsManage transformations of experimental dataTrack evolution of schemas and transformations

DB Application developmentMap SQL schema to default formMap business rule to SQL constraints and form validation codeManage dependencies between code and schemas and forms

Page 85: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 86: CPSC 534A – Background

How SQL Gets Executed: Query Execution PlansSelect Pname, Price

From Product, Company

Where Manufacturer = Cname AND Price <= 200 AND Country = ‘Japan’

Product Company

⋈Manufacturer = Cname

σPrice < 200 ^ Country = ‘Japan’

πPname, Price

Query optimization also specifies the algorithms for each operator; then queries can be executed

Page 87: CPSC 534A – Background

Overview of Query Optimization

Plan: Tree of ordered Relational Algebra operators and choice of algorithm for each operator

Two main issues:For a given query, what plans are considered?

Algorithm to search plan space for cheapest (estimated) plan.

How is the cost of a plan estimated?

Ideally: Want to find best plan. Practically: Avoid worst plans.

Some tacticsDo selections early

Use materialized views

Page 88: CPSC 534A – Background

Query ExecutionNow that we have the plan, what do we do with it?

How do deal with paging in data, etc.

New research covers new paradigms where interleaved with optimization

Page 89: CPSC 534A – Background

TransactionsAddress two issues:

Access by multiple usersRemember the “client-server” architecture: one server with many clients

Protection against crashes

Page 90: CPSC 534A – Background

TransactionsTransaction = group of statements that must be executed atomically

Transaction properties: ACIDATOMICITY = all or nothingCONSISTENCY = leave database in consistent stateISOLATION = as if it were the only transaction in the systemDURABILITY = store on disk !

Page 91: CPSC 534A – Background

Transactions in SQLIn “ad-hoc” SQL:

Default: each statement = one transaction

In “embedded” SQL:BEGIN TRANSACTION

[SQL statements]

COMMIT or ROLLBACK (=ABORT)

Page 92: CPSC 534A – Background

Transactions: SerializabilitySerializability = the technical term for

isolationAn execution is serial if it is completely before or completely after any other function’s executionAn execution is serializable if it equivalent to one that is serialDBMS can offer serializability guarantees

Page 93: CPSC 534A – Background

SerializabilityEnforced with locks, like in Operating Systems !

But this is not enough:

LOCK A[write A=1]UNLOCK A. . .. . .. . .. . .LOCK B[write B=2]UNLOCK B

LOCK A[write A=1]UNLOCK A. . .. . .. . .. . .LOCK B[write B=2]UNLOCK B

LOCK A[write A=3]UNLOCK ALOCK B[write B=4]UNLOCK B

LOCK A[write A=3]UNLOCK ALOCK B[write B=4]UNLOCK B

User 1 User 2

What is wrong ?

time

Page 94: CPSC 534A – Background

OutlineRelational databases

Entity Relationship (ER) diagrams

Object Oriented Databases (OODBs)

XML

Other data types

Database internals (Briefly)

An extremely brief introduction to category theory

Page 95: CPSC 534A – Background

Category Theory“Category theory is a mathematical theory

that deals in an abstract way with mathematical structures and relationships between them. It is half-jokingly known as ‘generalized abstract nonsense’.” [wikipedia]

There is a lot of scary category theory out there. You only need to know a few terms.

Page 96: CPSC 534A – Background

Category TheoryStarted in 1945

General mathematical theory of structures and systems of structures.

Reveals how structures of different kinds are related to one another, as well as the universal components of a family of structures of a given kind.

It is considered by many as being an alternative to set theory as a foundation for mathematics.

It is very, very, very abstract

Page 97: CPSC 534A – Background

Category Theory Definitions

C is a graph.There are two classes: the objects or nodes obj(C) and the morphisms or edges or arrows mor(C).Any morphism f mor(C) has a source and target object f : a b.For any composable pair of morphisms f : a b and g : b c, there is a composition morphism (g • f) : a c.

A functor translates objects and morphisms from one category to anotherDiagrams commute if one can follow any path through the diagram and obtain the same result by composition

f

g

a b

cg • f

a b

c d

Page 98: CPSC 534A – Background

Why you need a bit of category theory

Lots of people like to use the term “morphism”

It’s motivation behind a number of views – understanding this can make reading papers easier

If you’re theoretically minded, it can give you a good way to think about the problem

Page 99: CPSC 534A – Background

Overall background recapThere are many different data models. We covered:

Relational DatabasesEntity-Relationship DiagramsObject Oriented DatabasesXML

Changing around schemas within data models creates metadata problems. So does changing schemas between data modelsDatabases have some (largely hidden) internal processes; some of these will be related to in other papers we’ve readTheory can be handy to ground your reading.

Page 100: CPSC 534A – Background

Now what?Time to read papers

Prepare paper responses – it’ll help you focus on the paper, and allow for the discussion leader to prepare better discussion

You all have different backgrounds, interests, and insights. Bring them into class!