Exam II Syllabus

Storage & Buffer Management
Indexing: B-trees & Hash
Multi-dimensional Indexing
Query processing (relational ops)
Query optimization

XML: Semi-structured Data Model

Document Type Definitions
XML Schema

Semistructured Data

A data model based on trees, as opposed to the relational model (based on tables).

Motivation: flexible representation of data.

Motivation: sharing of documents among systems and databases.

Graphs of Semistructured Data

Nodes = objects.
Labels on arcs (like attribute names).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: there is no restriction on
  the labels out of a node;
  the number of successors with a given label.
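The graph model above can be made concrete with a minimal Python sketch. The `Node` class, its `add`/`follow` methods, and the wiring below are hypothetical and only loosely modeled on this section's beers-and-bars data. The sketch shows three properties of the model: arcs carry labels, leaves hold atomic values, and nothing limits which labels leave a node or how many successors share one label.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object in a semistructured data graph (hypothetical sketch).

    arcs maps each arc label to the list of successors reached by that label;
    a successor is either another Node or an atomic value (a leaf).
    """
    arcs: dict = field(default_factory=dict)

    def add(self, label, target):
        # No schema: any label may leave any node, any number of times.
        self.arcs.setdefault(label, []).append(target)

    def follow(self, label):
        # All successors along arcs with this label (empty if there are none).
        return self.arcs.get(label, [])


# Illustrative wiring, loosely based on the beers-and-bars example.
root, bud, mlob, joes = Node(), Node(), Node(), Node()
root.add("beer", bud)
root.add("beer", mlob)        # two successors with the same label
root.add("bar", joes)
bud.add("name", "Bud")        # atomic values sit at the leaves
bud.add("manf", "A.B.")
mlob.add("name", "M'lob")
joes.add("name", "Joe's")
joes.add("addr", "Maple")
bud.add("servedAt", joes)     # joes is now reachable along two paths

beer_names = [beer.follow("name")[0] for beer in root.follow("beer")]
print(beer_names)             # ['Bud', "M'lob"]
```

The `servedAt` arc gives the bar object a second incoming path, so the structure is a graph rather than a tree.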

Example: Data Graph

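Figure: example data graph. The graph has a root node. Its arcs are labeled beer, bar, name, manf, servedAt, addr, prize, year, and award. Its atomic leaf values are Bud, M’lob, A.B., Joe’s, Maple, 1995, and Gold.
Callouts: “The bar object for Joe’s Bar”; “The beer object for Bud”; “Notice a new kind of data.”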

XML: Semi-structured Data

XML = Extensible Markup Language.
It captures the same information as the semi-structured data graph.
While HTML uses tags for formatting (e.g., “italic”), XML uses tags for semantics (e.g., “this is an address”).

Key idea: create tag sets for a domain (e.g., genomics), and translate all data into properly tagged XML documents.
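Here is a minimal sketch of the “key idea”: it translates in-memory records into tagged XML with Python's standard `xml.etree.ElementTree`. The record layout and the `to_xml` helper are assumptions made for illustration. Only the tag set (BARS, BAR, NAME, BEER, PRICE) comes from the examples that follow.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory records; only the tag names come from the examples.
records = [
    {"name": "Joe's Bar", "beers": [("Bud", "2.50"), ("Miller", "3.00")]},
]


def to_xml(bars):
    """Translate bar records into a tagged XML string (no XML declaration)."""
    root = ET.Element("BARS")
    for bar in bars:
        bar_el = ET.SubElement(root, "BAR")
        ET.SubElement(bar_el, "NAME").text = bar["name"]
        for beer_name, price in bar["beers"]:
            beer_el = ET.SubElement(bar_el, "BEER")
            ET.SubElement(beer_el, "NAME").text = beer_name
            ET.SubElement(beer_el, "PRICE").text = price
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(records)
print(xml_text)
```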

Well-Formed and Valid XML

Well-Formed XML allows you to invent your own tags.

Valid XML also conforms to a certain Document Type Definition (DTD), which acts like a schema.

Well-Formed XML

Start the document with a declaration, surrounded by <?xml … ?>.

The normal declaration is: <?xml version = "1.0" standalone = "yes" ?>
Here standalone = "yes" means, roughly, “no DTD provided.”

The balance of the document is a root tag surrounding nested tags.

Tags

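Tags are normally matched pairs, as <FOO> … </FOO>.

Unmatched tags are also allowed, as <FOO/>. Such an empty-element tag opens and closes the element at once.

Tags may be nested arbitrarily, but they must be properly nested: an inner tag is closed before its enclosing tag. XML tags are case-sensitive.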
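The following sketch shows these well-formedness rules in action with Python's standard `xml.etree.ElementTree`, whose parser rejects text that is not well-formed. The helper name `is_well_formed` is our own. It checks well-formedness only and says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the parser accepts text; this says nothing about validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<FOO><BAR/></FOO>"))  # True: a matched pair containing an empty-element tag
print(is_well_formed("<FOO></foo>"))        # False: tags are case-sensitive
print(is_well_formed("<A><B></A></B>"))     # False: tags overlap instead of nesting
print(is_well_formed("<FOO>"))              # False: unclosed tag
```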

Example: Well-Formed XML

<?xml version = "1.0" standalone = "yes" ?>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> … </BAR> …
</BARS>

Callouts: the root tag (BARS); a NAME subelement; a BEER subelement; the tags surrounding a BEER element.
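Here is a minimal sketch of reading this document with Python's `xml.etree.ElementTree`. Two editorial simplifications are assumed so the text parses: the “…” elisions are dropped, and Joe's is written with a straight apostrophe. The traversal collects (bar, beer, price) triples.

```python
import xml.etree.ElementTree as ET

# The slide's document, with the "…" elisions removed so that it parses.
DOC = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)          # the root element
print(root.tag)                    # BARS

menu = []
for bar in root.findall("BAR"):
    bar_name = bar.findtext("NAME")            # the bar's own NAME subelement
    for beer in bar.findall("BEER"):           # each BEER subelement
        menu.append((bar_name, beer.findtext("NAME"), float(beer.findtext("PRICE"))))
print(menu)  # [("Joe's Bar", 'Bud', 2.5), ("Joe's Bar", 'Miller', 3.0)]
```

`findall("BAR")` and `findtext("NAME")` only look at direct children. That is why the bar's NAME is not confused with the NAME elements nested inside each BEER.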

DTD’s (Document Type Definitions)

A grammatical notation for describing allowed use of tags: like a schema.

Definition form:
<!DOCTYPE <root tag> [
  <!ELEMENT <name> (<components>)>
  . . . more elements . . .
]>

DTD Elements

The description of an element consists of its name (tag) and a parenthesized description of any nested tags, including the order of subtags and their multiplicity.

Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.

Example: DTD

<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>

A BARS object has zero or more BAR’s nested within.
A BAR has one NAME and one or more BEER subobjects.
A BEER has a NAME and a PRICE.
NAME and PRICE are text.
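As a hedged sketch, the DTD above can be approximated by hand-translating each content model into a regular expression over the sequence of child tags, then checking every element recursively. The `CONTENT_MODELS` table and the `valid` function are hypothetical. Python's standard library has no DTD validator, and real validators build automata from the DTD rather than using `re`. Here #PCDATA is approximated as "no subelements", and the text itself is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the BARS DTD: each content model becomes a regex over
# the child tag names, each name followed by one space.
CONTENT_MODELS = {
    "BARS": r"(?:BAR )*",          # (BAR*)
    "BAR": r"NAME (?:BEER )+",     # (NAME, BEER+)
    "BEER": r"NAME PRICE ",        # (NAME, PRICE)
    "NAME": r"",                   # (#PCDATA): text only, no subelements
    "PRICE": r"",                  # (#PCDATA)
}


def valid(elem: ET.Element) -> bool:
    """Check elem and all of its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:                       # tag not declared in the DTD
        return False
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(valid(child) for child in elem)


good = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME>"
                     "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>")
bad = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")  # BEER+ needs a BEER
print(valid(good), valid(bad))  # True False
```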

Element Descriptions

Subtags must appear in the order shown.

A tag may be followed by a symbol to indicate its multiplicity:
  * = zero or more.
  + = one or more.
  ? = zero or one.

The symbol | can connect alternative sequences of tags.

Example: Element Description

A name is an optional title (e.g., “Prof.”), a first name, and a last name, in that order, or it is an IP address:

<!ELEMENT NAME (
  (TITLE?, FIRST, LAST) | IPADDR
)>
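The ordering and multiplicity rules (`,` `*` `+` `?` `|`) map directly onto regular-expression operators. The sketch below uses a hypothetical `content_model_to_regex` helper to translate a content model into a Python regex over child tag names, then tests it on the NAME example. #PCDATA, mixed content, EMPTY and ANY are deliberately not handled.

```python
import re


def content_model_to_regex(model: str) -> str:
    """Translate a DTD content model such as "(NAME, BEER+)" into a regex over
    child tag names, each followed by one space. Sketch only: #PCDATA, mixed
    content, EMPTY and ANY are not handled."""
    out = []
    for tok in re.findall(r"[A-Za-z_][\w.\-]*|[(),|*+?]", model):
        if tok == "(":
            out.append("(?:")
        elif tok == ")":
            out.append(")")
        elif tok == ",":
            continue                       # a sequence is plain concatenation
        elif tok in ("|", "*", "+", "?"):
            out.append(tok)                # alternation and multiplicity carry over
        else:
            out.append("(?:" + re.escape(tok) + " )")   # one child tag name
    return "".join(out)


def matches(model: str, child_tags: list) -> bool:
    pattern = content_model_to_regex(model)
    return re.fullmatch(pattern, "".join(tag + " " for tag in child_tags)) is not None


NAME_MODEL = "((TITLE?, FIRST, LAST) | IPADDR)"
print(matches(NAME_MODEL, ["TITLE", "FIRST", "LAST"]))  # True
print(matches(NAME_MODEL, ["FIRST", "LAST"]))           # True: TITLE is optional
print(matches(NAME_MODEL, ["IPADDR"]))                  # True: the other alternative
print(matches(NAME_MODEL, ["LAST", "FIRST"]))           # False: order matters
```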

Use of DTD’s

1. Set standalone = "no".
2. Either:
  a) include the DTD as a preamble of the XML document, or
  b) follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

Example: (a)

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME, PRICE)>
  <!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callouts: the part from <!DOCTYPE to ]> is the DTD; the part from <BARS> to </BARS> is the document.

Example: (b)

Assume the BARS DTD is in file bar.dtd.

<?xml version = "1.0" standalone = "no" ?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe’s Bar</NAME>
    <BEER><NAME>Bud</NAME>
      <PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME>
      <PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR> …
</BARS>

Callout: SYSTEM "bar.dtd" gets the DTD from the file bar.dtd.
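Here is a small sketch of option (b) in Python. As far as we know, `xml.dom.minidom` records the DOCTYPE's root name and SYSTEM identifier but neither fetches the external file nor validates against it, since the standard library has no DTD validator. The document is parsed with the elisions removed, and only one beer is kept for brevity.

```python
from xml.dom import minidom

# Option (b), with the elisions removed. minidom records the DOCTYPE
# declaration; it is not used here to load or validate against bar.dtd.
DOC = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS><BAR><NAME>Joe's Bar</NAME><BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>"""

dom = minidom.parseString(DOC)
print(dom.doctype.name)             # BARS  (the <root tag> after DOCTYPE)
print(dom.doctype.systemId)         # bar.dtd  (the path after SYSTEM)
print(dom.documentElement.tagName)  # BARS
```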

Attributes

Like HTML, the opening tag in XML can have attribute = value pairs.

Attributes also allow linking among elements (discussed later).

Attributes

Opening tags in XML can have attributes.

In a DTD, <!ATTLIST E . . . > declares attributes for element E, along with their datatypes.

Example: Attributes

Bars can have an attribute kind, a character string describing the bar.

<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR kind CDATA #IMPLIED>

CDATA: character string type; no tags.
#IMPLIED: the attribute is optional; the opposite is #REQUIRED.

Example: Attribute Use

In a document that allows BAR tags, we might see:

<BAR kind = "sushi">
  <NAME>Homma’s</NAME>
  <BEER><NAME>Sapporo</NAME>
    <PRICE>5.00</PRICE></BEER>
  ...
</BAR>
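Because kind is declared #IMPLIED, an application must cope with its absence. In Python's `xml.etree.ElementTree`, `Element.get` returns None for a missing attribute. The second, kind-less BAR below is a hypothetical addition for contrast; it has no BEER, which BEER* allows.

```python
import xml.etree.ElementTree as ET

# Homma's is the slide's example; the Joe's Bar element without kind is hypothetical.
BARS = ET.fromstring("""<BARS>
  <BAR kind="sushi"><NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>5.00</PRICE></BEER>
  </BAR>
  <BAR><NAME>Joe's Bar</NAME></BAR>
</BARS>""")

# kind is #IMPLIED, so Element.get returns None when it is absent.
kinds = {bar.findtext("NAME"): bar.get("kind") for bar in BARS.findall("BAR")}
print(kinds)  # {"Homma's": 'sushi', "Joe's Bar": None}
```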

Bars, Using Attributes

<?xml version = "1.0" encoding = "utf-8" ?>
<BARS>
  <BAR name = "Joe’s Bar">
    <BEER name = "Bud" price = "2.50" />
    <BEER name = "Miller" price = "3.00" />
  </BAR>
  <BAR> …
</BARS>

Notice that BEER elements have only opening tags, with attributes.
name and price are attributes. Their values must be quoted, even when they are numbers.
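In this attribute-based version, every value reaches the application as a string, and the quotes are mandatory: an unquoted value such as price = 2.50 is not well-formed. Here is a short sketch with Python's `xml.etree.ElementTree`. Converting price to a float is our own choice, and the XML declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DOC = """<BARS>
  <BAR name="Joe's Bar">
    <BEER name="Bud" price="2.50" />
    <BEER name="Miller" price="3.00" />
  </BAR>
</BARS>"""

root = ET.fromstring(DOC)
prices = {}
for bar in root.findall("BAR"):
    for beer in bar.findall("BEER"):
        # Attribute values always arrive as strings; converting is the application's job.
        prices[(bar.get("name"), beer.get("name"))] = float(beer.get("price"))
print(prices)  # {("Joe's Bar", 'Bud'): 2.5, ("Joe's Bar", 'Miller'): 3.0}

try:
    ET.fromstring('<BEER name="Bud" price=2.50 />')   # unquoted attribute value
except ET.ParseError as err:
    print("not well-formed:", err)
```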

Example: Attributes

<!ELEMENT BEER EMPTY>
<!ATTLIST BEER name CDATA #REQUIRED
               price CDATA #IMPLIED>

EMPTY: no closing tag or subelements.
CDATA: character string.
#REQUIRED = “must occur”; #IMPLIED = “optional”.

Example use: <BEER name = "Bud" />
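A DTD-aware parser would enforce EMPTY, #REQUIRED and #IMPLIED automatically. As a sketch of what that validation means here, the hypothetical `check_beer` function below hard-codes the same rules for BEER, plus a check for undeclared attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for
#   <!ELEMENT BEER EMPTY>
#   <!ATTLIST BEER name CDATA #REQUIRED price CDATA #IMPLIED>
BEER_ATTRS = {"name": "#REQUIRED", "price": "#IMPLIED"}


def check_beer(elem):
    """Return a list of problems with a BEER element (empty list = OK)."""
    problems = []
    if len(elem) or (elem.text or "").strip():
        problems.append("EMPTY element has content")
    for attr, default in BEER_ATTRS.items():
        if default == "#REQUIRED" and attr not in elem.attrib:
            problems.append(f"missing required attribute {attr}")
    for attr in elem.attrib:
        if attr not in BEER_ATTRS:
            problems.append(f"undeclared attribute {attr}")
    return problems


print(check_beer(ET.fromstring('<BEER name="Bud" />')))    # []
print(check_beer(ET.fromstring('<BEER price="2.50" />')))  # ['missing required attribute name']
```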

ID’s and IDREF’s

Attributes can be pointers from one object to another. Compare to HTML’s NAME = "foo" and HREF = "#foo".

This allows the structure of an XML document to be a general graph, rather than just a tree.

Creating ID’s

Give an element E an attribute A of type ID.

When using tag <E> in an XML document, give its attribute A a unique value.

Example: <E A = "xyz">

Creating IDREF’s

To allow elements of type F to refer to another element with an ID attribute, give F an attribute of type IDREF.

Or, let the attribute have type IDREFS, so the F-element can refer to any number of other elements.

Example: ID’s and IDREF’s

A new BARS DTD includes both BAR and BEER subelements.

BAR and BEER elements have an ID attribute name. BAR elements have SELLS subelements, each consisting of a number (the price of one beer) and an IDREF attribute theBeer leading to that beer.

BEER elements have an attribute soldBy, which is an IDREFS leading to all the bars that sell it.

The DTD
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*, BEER*)>
  <!ELEMENT BAR (SELLS+)>
  <!ATTLIST BAR name ID #REQUIRED>
  <!ELEMENT SELLS (#PCDATA)>
  <!ATTLIST SELLS theBeer IDREF #REQUIRED>
  <!ELEMENT BEER EMPTY>
  <!ATTLIST BEER name ID #REQUIRED>
  <!ATTLIST BEER soldBy IDREFS #IMPLIED>
]>

Bar elements have name as an ID attribute and have one or more SELLS subelements.
SELLS elements have a number (the price) and one reference to a beer.
Beer elements have an ID attribute called name, and a soldBy attribute that is a set of bar names.
BEER elements are EMPTY: no subelements.

Example: A Document

<BARS>
  <BAR name = "JoesBar">
    <SELLS theBeer = "Bud">2.50</SELLS>
    <SELLS theBeer = "Miller">3.00</SELLS>
  </BAR> …
  <BEER name = "Bud" soldBy = "JoesBar SuesBar …" /> …
</BARS>
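Following ID/IDREF links by hand makes the graph structure concrete. Python's `xml.etree.ElementTree` does not read the DTD, so this sketch builds its own index from the attributes the DTD declares as type ID (BAR/@name and BEER/@name). The complete document, including SuesBar and its price, is a hypothetical filling-in of the slide's “…”.

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document for the ID/IDREF DTD.
DOC = """<BARS>
  <BAR name="JoesBar">
    <SELLS theBeer="Bud">2.50</SELLS>
    <SELLS theBeer="Miller">3.00</SELLS>
  </BAR>
  <BAR name="SuesBar">
    <SELLS theBeer="Bud">2.75</SELLS>
  </BAR>
  <BEER name="Bud" soldBy="JoesBar SuesBar" />
  <BEER name="Miller" soldBy="JoesBar" />
</BARS>"""

root = ET.fromstring(DOC)

# ElementTree ignores DTDs, so index the attributes the DTD declares as type ID.
ids = {}
for elem in root.iter():
    if elem.tag in ("BAR", "BEER"):
        key = elem.get("name")
        if key in ids:
            raise ValueError(f"duplicate ID {key!r}")  # IDs are unique document-wide
        ids[key] = elem


def deref(idref: str) -> ET.Element:
    """Follow one IDREF; a KeyError means a dangling reference."""
    return ids[idref]


# IDREF: each SELLS element of JoesBar points at one BEER element.
for sells in ids["JoesBar"].findall("SELLS"):
    target = deref(sells.get("theBeer"))
    print(target.tag, target.get("name"), float(sells.text))

# IDREFS: soldBy lists the bars that sell Bud; follow each one to its price.
bud_prices = {}
for ref in ids["Bud"].get("soldBy").split():
    for sells in deref(ref).findall("SELLS"):
        if sells.get("theBeer") == "Bud":
            bud_prices[ref] = float(sells.text)
print(bud_prices)  # {'JoesBar': 2.5, 'SuesBar': 2.75}
```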

Empty Elements

We can do all the work of an element in its attributes, like BEER in the previous example.

Another example: SELLS elements could have attribute price rather than a value that is a price.

Example: Empty Element

In the DTD, declare:
<!ELEMENT SELLS EMPTY>
<!ATTLIST SELLS theBeer IDREF #REQUIRED>
<!ATTLIST SELLS price CDATA #REQUIRED>

Example use: <SELLS theBeer = "Bud" price = "2.50" />

Note the exception to the “matching tags” rule.
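Python's `xml.etree.ElementTree` illustrates the empty-element form. By default (`short_empty_elements=True`), `ET.tostring` writes an element with no text and no children as a single `<SELLS … />` tag. Parsing that tag back yields an element with attributes and no content. A minimal sketch:

```python
import xml.etree.ElementTree as ET

# All the information of SELLS lives in attributes; there is no content.
sells = ET.Element("SELLS", theBeer="Bud", price="2.50")
text = ET.tostring(sells, encoding="unicode")  # short_empty_elements=True by default
print(text)  # <SELLS theBeer="Bud" price="2.50" /> on Python 3.8+

back = ET.fromstring(text)
print(back.attrib, len(back), back.text)  # attributes only, no children, no text
```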

XML Schema

A more powerful way than DTDs to describe the structure of XML documents.

XML-Schema declarations are themselves XML documents. They describe “elements,” and the things doing the describing are also “elements.”

Structure of an XML-Schema Document

<?xml version = … ?>
<xs:schema xmlns:xs =
  "http://www.w3.org/2001/XMLSchema">
  . . .
</xs:schema>

This defines "xs" to be the namespace described in the URL shown. Any string in place of "xs" is OK.
So uses of "xs" within the schema element refer to tags from this namespace.
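Because XML-Schema declarations are ordinary XML, a namespace-aware parser can read them. Python's `xml.etree.ElementTree` replaces each prefix with its namespace URI, which shows why any string can stand in for xs. The two tiny schemas below are hypothetical and differ only in their prefix.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The same one-element schema written twice, with two different prefixes.
schema_xs = f'<xs:schema xmlns:xs="{XS}"><xs:element name="NAME" type="xs:string"/></xs:schema>'
schema_foo = f'<foo:schema xmlns:foo="{XS}"><foo:element name="NAME" type="foo:string"/></foo:schema>'

results = []
for text in (schema_xs, schema_foo):
    root = ET.fromstring(text)
    # ElementTree expands a prefix to "{namespace URI}local-name", so the
    # prefix chosen by the schema author disappears after parsing.
    names = [e.get("name") for e in root.findall("xs:element", {"xs": XS})]
    results.append((root.tag, names))

print(results[0] == results[1])  # True
print(results[0])                # ('{http://www.w3.org/2001/XMLSchema}schema', ['NAME'])
```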

The xs:element Element

Has attributes:
1. name = the tag-name of the element being defined.
2. type = the type of the element.
  Could be an XML-Schema type, e.g., xs:string.
  Or the name of a type defined in the document itself.

Example: xs:element

<xs:element name = "NAME"
  type = "xs:string" />

Describes elements such as <NAME>Joe’s Bar</NAME>.

Complex Types

To describe elements that consist of subelements, we use xs:complexType. Attribute name gives a name to the type.

A typical subelement of a complex type is xs:sequence, which itself has a sequence of xs:element subelements. Use the minOccurs and maxOccurs attributes to control the number of occurrences of an xs:element; both default to 1.

Example: a Type for Beers

<xs:complexType name = "beerType">
  <xs:sequence>
    <xs:element name = "NAME"
      type = "xs:string" minOccurs = "1" maxOccurs = "1" />
    <xs:element name = "PRICE" type = "xs:float"
      minOccurs = "0" maxOccurs = "1" />
  </xs:sequence>
</xs:complexType>

NAME: exactly one occurrence.
PRICE: like ? in a DTD.
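The minOccurs/maxOccurs counting rule for an xs:sequence can be sketched in Python. The `(name, minOccurs, maxOccurs)` tuples and the `matches_sequence` function are assumptions of this sketch, not a schema API. `math.inf` stands in for maxOccurs = "unbounded", which barType uses. The greedy matching here is adequate for simple sequences like beerType and barType; it is not a general schema-matching algorithm.

```python
import math

# Hypothetical encoding of an xs:sequence: (element name, minOccurs, maxOccurs),
# with math.inf standing in for maxOccurs="unbounded".
BEER_TYPE = [("NAME", 1, 1), ("PRICE", 0, 1)]          # beerType
BAR_TYPE = [("NAME", 1, 1), ("BEER", 0, math.inf)]     # barType: BEER is like * in a DTD


def matches_sequence(seq, child_tags) -> bool:
    """Check child_tags against seq in order, counting repeats of each particle.
    Greedy matching: adequate for simple sequences like these, not in general."""
    i = 0
    for name, lo, hi in seq:
        count = 0
        while i < len(child_tags) and child_tags[i] == name and count < hi:
            i += 1
            count += 1
        if count < lo:           # fewer than minOccurs occurrences
            return False
    return i == len(child_tags)  # leftover children are not allowed by the sequence


print(matches_sequence(BEER_TYPE, ["NAME", "PRICE"]))               # True
print(matches_sequence(BEER_TYPE, ["PRICE"]))                       # False: NAME has minOccurs 1
print(matches_sequence(BAR_TYPE, ["NAME", "BEER", "BEER", "BEER"])) # True
```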

An Element of Type beerType

<xxx>
  <NAME>Bud</NAME>
  <PRICE>2.50</PRICE>
</xxx>

We don’t know the name of the element of this type.

Example: Create Element of beerType

<xs:element name = "Beers"
  type = "beerType" />

<Beers>
  <NAME>Bud</NAME>
  <PRICE>2.50</PRICE>
</Beers>

Now we know the name.

Example: a Type for Bars

<xs:complexType name = "barType">
  <xs:sequence>
    <xs:element name = "NAME" type = "xs:string"
      minOccurs = "1" maxOccurs = "1" />
    <xs:element name = "BEER" type = "beerType"
      minOccurs = "0" maxOccurs = "unbounded" />
  </xs:sequence>
</xs:complexType>

BEER: like * in a DTD.

xs:attribute

xs:attribute elements can be used within a complex type to indicate attributes of elements of that type.

Attributes of xs:attribute: name and type, as for xs:element; use = "required" or "optional".

Example: xs:attribute

<xs:complexType name = "beerType">
  <xs:attribute name = "name"
    type = "xs:string" use = "required" />
  <xs:attribute name = "price"
    type = "xs:float" use = "optional" />
</xs:complexType>

An Element of This New Type beerType

<xxx name = "Bud"
  price = "2.50" />

We still don’t know the element name.
The element is empty, since there are no declared subelements.

Restricted Simple Types

xs:simpleType can describe enumerations and range-restricted base types.

name is an attribute; xs:restriction is a subelement.

Restrictions

Attribute base gives the simple type to be restricted, e.g., xs:integer.

xs:{min, max}{Inclusive, Exclusive} are four facets that can give a lower or upper bound on a numerical range. Each is written as a subelement of xs:restriction whose value attribute holds the bound.

xs:enumeration is a subelement with attribute value that allows enumerated types.

Example: license Attribute for BAR

<xs:simpleType name = "license">
  <xs:restriction base = "xs:string">
    <xs:enumeration value = "Full" />
    <xs:enumeration value = "Beer only" />
    <xs:enumeration value = "Sushi" />
  </xs:restriction>
</xs:simpleType>

Example: Prices in Range [1,5)

<xs:simpleType name = "price">
  <xs:restriction base = "xs:float">
    <xs:minInclusive value = "1.00" />
    <xs:maxExclusive value = "5.00" />
  </xs:restriction>
</xs:simpleType>
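The checks these two restricted types impose can be sketched with hypothetical Python stand-ins. A schema processor would derive them from the xs:restriction facets. Here they are hard-coded, and Python's `float` only approximates xs:float parsing.

```python
# Hypothetical stand-ins for the license and price simple types.
LICENSE_VALUES = {"Full", "Beer only", "Sushi"}   # the xs:enumeration values


def valid_license(value: str) -> bool:
    return value in LICENSE_VALUES                # exact, case-sensitive match


def valid_price(value: str) -> bool:
    try:
        x = float(value)                          # base type xs:float (approximated)
    except ValueError:
        return False
    return 1.00 <= x < 5.00                       # minInclusive 1.00, maxExclusive 5.00


print(valid_license("Sushi"), valid_license("sushi"))              # True False
print([valid_price(p) for p in ["0.99", "1.00", "4.99", "5.00"]])  # [False, True, True, False]
```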

Keys in XML Schema

An xs:element can have an xs:key subelement.

Meaning: within this element, all subelements reached by a certain selector path will have unique values for a certain combination of fields.

Example: within one BAR element, the name attribute of a BEER element is unique.

Example: Key

<xs:element name = "BAR" … >
  . . .
  <xs:key name = "barKey">
    <xs:selector xpath = "BEER" />
    <xs:field xpath = "@name" />
  </xs:key>
  . . .
</xs:element>

XPath is a query language for XML. All we need to know here is that a path is a sequence of tags separated by /, and that @ indicates an attribute rather than a tag.
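Here is a hedged sketch of what barKey requires, using Python's `xml.etree.ElementTree`. The `check_key` function is hypothetical and supports only a simple child-path selector and a single attribute field. Every selected BEER must have a name, and the names must be unique within the BAR. The sample BAR is invented to trigger both errors.

```python
import xml.etree.ElementTree as ET


def check_key(scope: ET.Element, selector: str, field_attr: str) -> list:
    """Sketch of xs:key for a simple child-path selector and one attribute field:
    every selected element needs the field, and the values must be unique."""
    problems, seen = [], set()
    for target in scope.findall(selector):      # like xs:selector xpath="BEER"
        value = target.get(field_attr)          # like xs:field xpath="@name"
        if value is None:
            problems.append(f"{target.tag} has no @{field_attr}")
        elif value in seen:
            problems.append(f"duplicate key value {value!r}")
        else:
            seen.add(value)
    return problems


bar = ET.fromstring('<BAR><BEER name="Bud"/><BEER name="Miller"/><BEER name="Bud"/><BEER/></BAR>')
print(check_key(bar, "BEER", "name"))  # ["duplicate key value 'Bud'", 'BEER has no @name']
```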

Foreign Keys

An xs:keyref subelement within an xs:element says that within this element, certain values (defined by selector and field(s), as for keys) must appear as values of a certain key.

Example: Foreign Key

Suppose that we have declared that subelement NAME of BAR is a key for BARS. The name of the key is barKey.

We wish to declare DRINKER elements that have FREQ subelements. An attribute bar of FREQ is a foreign key, referring to the NAME of a BAR.

Example: Foreign Key in XML Schema

<xs:element name = "DRINKERS" . . .
  <xs:keyref name = "barRef" refer = "barKey">
    <xs:selector xpath = "DRINKER/FREQ" />
    <xs:field xpath = "@bar" />
  </xs:keyref>
</xs:element>
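Here is a minimal sketch of the foreign-key check in Python. The data and the `dangling_refs` helper are hypothetical. For brevity, the key values (BAR/NAME, i.e. barKey) and the references (DRINKER/FREQ/@bar, i.e. barRef) live in two separate trees. Elements that lack the field are skipped, since a keyref does not require it.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: bar names form the key; FREQ/@bar values must refer to them.
bars = ET.fromstring("<BARS><BAR><NAME>JoesBar</NAME></BAR>"
                     "<BAR><NAME>SuesBar</NAME></BAR></BARS>")
drinkers = ET.fromstring('<DRINKERS><DRINKER><FREQ bar="JoesBar"/>'
                         '<FREQ bar="MoesBar"/></DRINKER></DRINKERS>')

bar_key = {name.text for name in bars.findall("BAR/NAME")}   # values of barKey


def dangling_refs(scope, selector, field_attr, key_values):
    """Sketch of xs:keyref: each selected field value must be a key value.
    Elements lacking the field are skipped."""
    refs = (elem.get(field_attr) for elem in scope.findall(selector))
    return [v for v in refs if v is not None and v not in key_values]


print(dangling_refs(drinkers, "DRINKER/FREQ", "bar", bar_key))  # ['MoesBar']
```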