CSE 636 Data Integration XML Query Languages XQuery.

Date posted: 20-Dec-2015

CSE 636 Data Integration

XML Query Languages

XQuery

XQuery

• http://www.w3.org/TR/xquery/ (11/05)
• Functional Programming Language
• Operates on XML Sources
• Returns XML

XQuery Components

• XQuery is composed of
  – Path expressions
  – Element constructors
  – FLWOR expressions
  – … and more …

Path Expressions

doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER

[Figure: tree of the sample document "co". The root CUSTOMER_ORDERS holds CUSTOMER elements (Sue, Tom, Ann); the ORDER elements 1897, 1878 and 1861 each carry a NO, a CARRIER (UPS or FEDEX) and ITEM children with SKU and QTY.]

Evaluate the expression by collecting all elements which satisfy the path.
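To try the path expression without a stored document, here is a minimal self-contained sketch. The inline `$co` variable is a stand-in for `doc("co")`; its contents are assumed from the figure (in particular, which customer owns which order is a guess), and the `document { ... }` wrapper is there only so that the same path applies unchanged.

```xquery
(: Assumed stand-in for doc("co"); the nesting of orders under customers is a guess. :)
let $co := document {
  <CUSTOMER_ORDERS>
    <CUSTOMER>
      <NAME>Sue</NAME>
      <ORDER>
        <NO>1897</NO>
        <ITEM><SKU>C5</SKU><QTY>2</QTY></ITEM>
        <ITEM><SKU>P5</SKU><QTY>1</QTY></ITEM>
        <CARRIER>UPS</CARRIER>
      </ORDER>
    </CUSTOMER>
    <CUSTOMER>
      <NAME>Tom</NAME>
      <ORDER>
        <NO>1878</NO>
        <ITEM><SKU>B7</SKU><QTY>2</QTY></ITEM>
        <CARRIER>UPS</CARRIER>
      </ORDER>
      <ORDER>
        <NO>1861</NO>
        <ITEM><SKU>C5</SKU><QTY>1</QTY></ITEM>
        <CARRIER>FEDEX</CARRIER>
      </ORDER>
    </CUSTOMER>
    <CUSTOMER><NAME>Ann</NAME></CUSTOMER>
  </CUSTOMER_ORDERS>
}
(: Collect every ORDER reachable along the path, in document order. :)
return $co/CUSTOMER_ORDERS/CUSTOMER/ORDER
```

With this stand-in data the query returns the three ORDER elements (1897, 1878, 1861); Ann contributes nothing because she has no ORDER child.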

Element Construction

<ORDERS> {
  doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
} </ORDERS>

A complete, executable query returning the ORDERS tree:

1. Evaluate expression inside { ... }
2. Connect into tree

[Figure: the resulting ORDERS element with the ORDER elements 1861, 1878 and 1897 as children.]

Introduction to for Expression

Our path query …

<ORDERS> {
  doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
} </ORDERS>

… can be rewritten using a for expression:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  return $order
} </ORDERS>

[Figure: the same ORDERS result tree, with ORDER children 1861, 1878 and 1897.]

Topics

• For-Let-Where-Order by-Return Expressions
• Type Conversions
• Variable Bindings
• Joins
• Nested Queries
• Boolean Expressions
• Conditionals
• Aggregations
• Missing Data in Joins and Nested Queries
• Advanced Example
• Sequences
• Query Prolog

Example with where

We take our previous query and add a where clause:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return $order
} </ORDERS>

The output is the same as in the previous example, except that orders with non-UPS carriers are removed.

[Figure: the ORDERS result tree; ORDER 1861 (carrier FEDEX) is dropped, leaving ORDER 1878 and ORDER 1897.]

FLWOR Expressions: The for Clause

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return $order
} </ORDERS>

The for variable ranges over the result of the in expression.

[Figure: the sample document tree from the Path Expressions slide.]

FLWOR Expressions: The where Clause

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return $order
} </ORDERS>

Selects only orders with UPS as the carrier.

[Figure: the sample document tree from the Path Expressions slide.]

FLWOR Expressions: The return Clause

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return $order
} </ORDERS>

Every $order that qualified is added to the return list:

[Figure: the return list, containing ORDER 1897 and ORDER 1878.]

FLWOR Expressions: Final Result

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return $order
} </ORDERS>

The list coming from the FLWOR expression … is constructed into the ORDERS element to complete the example.

[Figure: the ORDERS element with ORDER 1897 and ORDER 1878 as children.]

Example with Element Construction

Here, the return statement constructs elements from values:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { data($order/NO) } </ID>
} </ORDERS>

• The "data" function returns the value of an element
• The return statement also contains tags
• The next slide illustrates how the following result is created:

[Figure: result tree ORDERS with children ID 1897 and ID 1878.]

Return – Element Construction

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { data($order/NO) } </ID>
} </ORDERS>

1. Bring in selected items as before
2. Path selection
3. New element construction
4. Connect into tree

[Figure: the selected ORDER elements 1878 and 1897 feed the result tree ORDERS with children ID 1897 and ID 1878.]

FLWOR Expressions: The let Clause

Our previous example …

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { data($order/NO) } </ID>
} </ORDERS>

… can be rewritten using extra variable bindings to improve clarity:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $carrier := $order/CARRIER
  let $id := data($order/NO)
  where $carrier = "UPS"
  return <ID> { $id } </ID>
} </ORDERS>

[Figure: result tree ORDERS with children ID 1897 and ID 1878.]

FLWOR Expressions: The order by Clause

For this example, we prepare a list of customers sorted by customer name:

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  let $name := $customer/NAME
  order by $name ascending
  return <CUSTOMER> {$name} </CUSTOMER>
} </CUSTOMERS>

[Figure: the resulting CUSTOMERS element, with one CUSTOMER per NAME (Ann, Sue, Tom).]


Type Conversions

• In the context of functions and operators, values are automatically extracted from elements:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { concat("ORDER-", $order/NO) } </ID>
} </ORDERS>

Type Conversions

• $order/NO binds to an element
• concat(…) requires a string
• Value of the element is automatically extracted
• Same happens to lists containing a single element or value

Type Conversions

All other cases result in errors:

<ORDERS> {
  <ID> { concat("ORDER-", doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/NO) } </ID>
} </ORDERS>

• The path expression above binds to a list of several elements
• A value cannot be extracted from a list of many items!

Type Conversions

• The data() function can be used to explicitly extract the value:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { concat("ORDER-", data($order/NO)) } </ID>
} </ORDERS>

Type Conversions

• Automatic extraction of values does not occur in element construction
• In that case, the data() function is required:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  where $order/CARRIER = "UPS"
  return <ID> { data($order/NO) } </ID>
} </ORDERS>
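The conversion rules in this section can be seen side by side in a small sketch. The inline ORDER element and the DEMO, A, B and C element names are made up for illustration; only concat() and data() come from the slides.

```xquery
(: Hypothetical one-order sample; DEMO/A/B/C are illustrative names only. :)
let $order := <ORDER><NO>1897</NO><CARRIER>UPS</CARRIER></ORDER>
return
  <DEMO>
    <!-- function argument: the value of NO is extracted automatically -->
    <A>{ concat("ORDER-", $order/NO) }</A>
    <!-- element construction: the NO element itself is copied -->
    <B>{ $order/NO }</B>
    <!-- element construction with data(): only the value is copied -->
    <C>{ data($order/NO) }</C>
  </DEMO>
(: By contrast, concat("ORDER-", (<NO>1</NO>, <NO>2</NO>)) raises a type error,
   because a value cannot be extracted from a list of two items. :)
```

A contains the text ORDER-1897, B contains a nested NO element, and C contains just the text 1897.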


For-Let-Where-Order By-Return (FLWOR)

Let's take a more in-depth look at the variable bindings in the query developed previously:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $carrier := $order/CARRIER
  let $id := data($order/NO)
  where $carrier = "UPS"
  return <ID> { $id } </ID>
} </ORDERS>

for $var1 in expr
let $var2 := expr
where expr
order by expr
return expr

• The for and let clauses generate a list of tuples of variable bindings, preserving input order
• The where clause applies a predicate, eliminating some of the tuples
• The order by clause imposes an order on the remaining tuples
• The return clause is executed for each remaining tuple, generating a list of trees
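The following sketch exercises all five clauses in one query. The inline `$co` document is an assumed stand-in for `doc("co")` (the assignment of orders to customers is a guess), and the cast to xs:integer is a choice made here so that order by sorts numerically.

```xquery
(: Assumed stand-in for doc("co"); the nesting of orders under customers is a guess. :)
let $co := document {
  <CUSTOMER_ORDERS>
    <CUSTOMER>
      <NAME>Sue</NAME>
      <ORDER><NO>1897</NO><CARRIER>UPS</CARRIER></ORDER>
    </CUSTOMER>
    <CUSTOMER>
      <NAME>Tom</NAME>
      <ORDER><NO>1878</NO><CARRIER>UPS</CARRIER></ORDER>
      <ORDER><NO>1861</NO><CARRIER>FEDEX</CARRIER></ORDER>
    </CUSTOMER>
  </CUSTOMER_ORDERS>
}
return
  <ORDERS> {
    (: for: one tuple per ORDER, in document order :)
    for $order in $co/CUSTOMER_ORDERS/CUSTOMER/ORDER
    (: let: extend each tuple with the numeric order number :)
    let $id := xs:integer($order/NO)
    (: where: drop the tuples whose carrier is not UPS :)
    where $order/CARRIER = "UPS"
    (: order by: sort the surviving tuples :)
    order by $id
    (: return: build one ID element per tuple :)
    return <ID>{ $id }</ID>
  } </ORDERS>
```

The FEDEX order is eliminated by where, and the two remaining tuples are emitted as ID 1878 followed by ID 1897.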

FLWOR Variable Bindings

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $carrier := $order/CARRIER,
      $id := data($order/NO)
  where $carrier = "UPS"
  return <ID> { $id } </ID>
} </ORDERS>

for/let                                          where   return
$order          $carrier          $id
ORDER 1897      CARRIER UPS       1897           YES     ID 1897
ORDER 1878      CARRIER UPS       1878           YES     ID 1878
ORDER 1861      CARRIER FEDEX     1861           NO

result: ORDERS with children ID 1897 and ID 1878

for vs. let

for
• Binds node variables: iteration
• for $x in expr
  – binds $x to each element in the list expr

let
• Binds collection variables: one value
• let $x := expr
  – binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations
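A quick way to see the difference is to count what the variable holds in each case. This is only an illustrative sketch; the ITEM values and the DEMO, FOR, LET and N element names are made up.

```xquery
(: Three hypothetical items to bind. :)
let $items := (<ITEM>a</ITEM>, <ITEM>b</ITEM>, <ITEM>c</ITEM>)
return
  <DEMO>
    <!-- for: $x is bound to one item at a time, so the body runs three times -->
    <FOR>{ for $x in $items return <N>{ count($x) }</N> }</FOR>
    <!-- let: $x is bound to the whole list, so the body runs once -->
    <LET>{ let $x := $items return <N>{ count($x) }</N> }</LET>
  </DEMO>
```

FOR ends up with three N elements each containing 1, while LET ends up with a single N element containing 3.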

for vs. let

for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
return <result> { $order } </result>

Returns:
<result> <ORDER>…</ORDER> </result>
<result> <ORDER>…</ORDER> </result>
<result> <ORDER>…</ORDER> </result>
…

let $order := doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
return <result> { $order } </result>

Returns:
<result> <ORDER>…</ORDER> <ORDER>…</ORDER> <ORDER>…</ORDER> … </result>

for vs. let

<POPULAR_ITEMS> {
  for $sku in distinct-values(doc("co")//ITEM/SKU)
  let $items := doc("co")//ORDER/ITEM[SKU = $sku]
  let $qtyTotal := sum($items/QTY)
  where $qtyTotal > 1
  return <ITEM> { $sku } </ITEM>
} </POPULAR_ITEMS>

• distinct-values
  – a function that eliminates duplicate values
  – can be applied to simple elements and atomic values
• sum
  – an aggregate function that returns the sum of a list of numbers
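Here is a self-contained sketch of the same grouping pattern. The inline `$co` document is an assumed stand-in for `doc("co")`, and the order by clause is an addition made here because the order in which distinct-values returns its results is implementation-dependent.

```xquery
(: Assumed stand-in for doc("co"); only the parts the query touches are included. :)
let $co := document {
  <CUSTOMER_ORDERS>
    <CUSTOMER>
      <ORDER>
        <ITEM><SKU>C5</SKU><QTY>2</QTY></ITEM>
        <ITEM><SKU>P5</SKU><QTY>1</QTY></ITEM>
      </ORDER>
    </CUSTOMER>
    <CUSTOMER>
      <ORDER><ITEM><SKU>B7</SKU><QTY>2</QTY></ITEM></ORDER>
      <ORDER><ITEM><SKU>C5</SKU><QTY>1</QTY></ITEM></ORDER>
    </CUSTOMER>
  </CUSTOMER_ORDERS>
}
return
  <POPULAR_ITEMS> {
    (: for iterates over each distinct SKU value :)
    for $sku in distinct-values($co//ITEM/SKU)
    (: let binds the whole list of matching items, ready for aggregation :)
    let $items := $co//ORDER/ITEM[SKU = $sku]
    let $qtyTotal := sum($items/QTY)
    where $qtyTotal > 1
    (: added here to make the output order deterministic :)
    order by $sku
    return <ITEM>{ $sku }</ITEM>
  } </POPULAR_ITEMS>
```

The totals are 3 for C5, 2 for B7 and 1 for P5, so the result lists B7 and C5 and omits P5.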

for vs. let

<POPULAR_ITEMS> {
  for $sku in distinct-values(doc("co")//ITEM/SKU)
  let $items := doc("co")//ORDER/ITEM[SKU = $sku]
  let $qtyTotal := sum($items/QTY)
  where $qtyTotal > 1
  return <ITEM> { $sku } </ITEM>
} </POPULAR_ITEMS>

for/let                                                         where   return
$sku   $items                                     $qtyTotal
P5     ITEM (SKU P5, QTY 1)                       1             NO
B7     ITEM (SKU B7, QTY 2)                       2             YES     ITEM B7
C5     ITEM (SKU C5, QTY 1), ITEM (SKU C5, QTY 2) 3             YES     ITEM C5

result: POPULAR_ITEMS with children ITEM B7 and ITEM C5

for vs. let

Find items whose quantity is larger than average:

let $avgQty := avg(doc("co")//ITEM/QTY)
for $item in doc("co")//ITEM
where $item/QTY > $avgQty
return $item

let        for                        where   return
$avgQty    $item
1.5        ITEM (SKU B7, QTY 2)       YES     ITEM (SKU B7, QTY 2)
1.5        ITEM (SKU P5, QTY 1)       NO
1.5        ITEM (SKU C5, QTY 1)       NO
1.5        ITEM (SKU C5, QTY 2)       YES     ITEM (SKU C5, QTY 2)


Joins

• Joins are expressed using a FLWOR with two loop variables
  – two for clauses
• A where condition specifies how the loop variables relate

Join Example

Combine orders … with shipper info … to produce order deadlines.

[Figure: the ORDER elements 1861 (FEDEX), 1878 (UPS) and 1897 (UPS) are combined with the SHIPPER elements (UPS with PICKUP 5PM, FEDEX with PICKUP 2PM) to produce an ORDERS tree of ORDER elements with ID and DEADLINE: 1878 and 1897 with DEADLINE 5PM, 1861 with DEADLINE 2PM.]

Join Example Query

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  for $shipper in doc("s")/SHIPPERS/SHIPPER
  let $id := data($order/NO)
  let $time := data($shipper/PICKUP)
  where $order/CARRIER = $shipper/NAME
  return <ORDER> <ID>{$id}</ID> <DEADLINE>{$time}</DEADLINE> </ORDER>
} </ORDERS>

• Uses multiple for statements to generate the Cartesian product of tuples
• Uses the where statement to filter the Cartesian product

Join Conditions

for/let                                                      where   return
$order               $shipper               $id    $time
ORDER 1861 (FEDEX)   SHIPPER FEDEX (2PM)    1861   2PM       YES     ORDER (ID 1861, DEADLINE 2PM)
ORDER 1878 (UPS)     SHIPPER FEDEX (2PM)    1878   2PM       NO
ORDER 1897 (UPS)     SHIPPER FEDEX (2PM)    1897   2PM       NO
ORDER 1861 (FEDEX)   SHIPPER UPS (5PM)      1861   5PM       NO
ORDER 1878 (UPS)     SHIPPER UPS (5PM)      1878   5PM       YES     ORDER (ID 1878, DEADLINE 5PM)
ORDER 1897 (UPS)     SHIPPER UPS (5PM)      1897   5PM       YES     ORDER (ID 1897, DEADLINE 5PM)

Condensed Join Table

In future examples, non-joined rows are removed, as are join where conditions:

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER,
      $shipper in doc("s")/SHIPPERS/SHIPPER
  let $id := data($order/NO),
      $time := data($shipper/PICKUP)
  where $order/CARRIER = $shipper/NAME
  return <ORDER> <ID>{$id}</ID> <DEADLINE>{$time}</DEADLINE> </ORDER>
} </ORDERS>

for/let                                                      return
$order               $shipper               $id    $time
ORDER 1861 (FEDEX)   SHIPPER FEDEX (2PM)    1861   2PM       ORDER (ID 1861, DEADLINE 2PM)
ORDER 1878 (UPS)     SHIPPER UPS (5PM)      1878   5PM       ORDER (ID 1878, DEADLINE 5PM)
ORDER 1897 (UPS)     SHIPPER UPS (5PM)      1897   5PM       ORDER (ID 1897, DEADLINE 5PM)


Nested Queries

• Nested queries produce hierarchical results
• An outer FLWOR loop contains an inner FLWOR loop
• Typically, a where condition in the inner FLWOR specifies how the loops relate

Nested Query Example

Combine shippers … with orders … to produce orders for each shipper.

[Figure: the SHIPPER elements (UPS with PICKUP 5PM, FEDEX with PICKUP 2PM) are combined with the ORDER elements 1861 (FEDEX), 1878 (UPS) and 1897 (UPS) to produce a SHIPPER_ORDERS tree: SHIPPER UPS with ORDER 1878 and ORDER 1897, SHIPPER FEDEX with ORDER 1861.]

Nested Query

<SHIPPER_ORDERS> {
  for $shipper in doc("s")/SHIPPERS/SHIPPER
  let $name := $shipper/NAME
  return
    <SHIPPER> {$name} {
      for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
      let $id := data($order/NO)
      where $name = $order/CARRIER
      return <ORDER> { $id } </ORDER>
    } </SHIPPER>
} </SHIPPER_ORDERS>

• Outer loop binds $shipper and $name variables
• For each $shipper, $name pair, inner loop binds $order and $id variables
• Inner where clause removes $order, $id pairs that don't match outer element
• Inner loop constructs elements from inner variables
• Outer loop constructs elements from outer variables and from elements constructed in inner loop

Join Conditions

OUTER LOOP: $shipper = SHIPPER (NAME FEDEX, PICKUP 2PM), $name = NAME FEDEX
  INNER LOOP
  $order               $id    where   return
  ORDER 1861 (FEDEX)   1861   YES     ORDER 1861
  ORDER 1878 (UPS)     1878   NO
  ORDER 1897 (UPS)     1897   NO
  OUTER LOOP return: SHIPPER (NAME FEDEX, ORDER 1861)

OUTER LOOP: $shipper = SHIPPER (NAME UPS, PICKUP 5PM), $name = NAME UPS
  INNER LOOP
  $order               $id    where   return
  ORDER 1861 (FEDEX)   1861   NO
  ORDER 1878 (UPS)     1878   YES     ORDER 1878
  ORDER 1897 (UPS)     1897   YES     ORDER 1897
  OUTER LOOP return: SHIPPER (NAME UPS, ORDER 1878, ORDER 1897)

Condensed Nested Query Table

In future examples, non-matched inner rows are removed, as are where conditions:

<SHIPPER_ORDERS> {
  for $shipper in doc("s")/SHIPPERS/SHIPPER
  let $name := $shipper/NAME
  return
    <SHIPPER> {$name} {
      for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
      let $id := data($order/NO)
      where $name = $order/CARRIER
      return <ORDER> { $id } </ORDER>
    } </SHIPPER>
} </SHIPPER_ORDERS>

OUTER LOOP: $shipper = SHIPPER (NAME FEDEX, PICKUP 2PM), $name = NAME FEDEX
  INNER LOOP
  $order               $id    return
  ORDER 1861 (FEDEX)   1861   ORDER 1861
  OUTER LOOP return: SHIPPER (NAME FEDEX, ORDER 1861)

OUTER LOOP: $shipper = SHIPPER (NAME UPS, PICKUP 5PM), $name = NAME UPS
  INNER LOOP
  $order               $id    return
  ORDER 1878 (UPS)     1878   ORDER 1878
  ORDER 1897 (UPS)     1897   ORDER 1897
  OUTER LOOP return: SHIPPER (NAME UPS, ORDER 1878, ORDER 1897)


Boolean Expressions

• In this section we examine various types of Boolean expressions that may appear in where clauses

Functions in Boolean Expressions

<ORDER_IDS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO),
      $lc := count($order/ITEM)
  where $lc > 1
  return <ID> { $id } </ID>
} </ORDER_IDS>

for/let                          where   return
$order        $id    $lc
ORDER 1861    1861   1           NO
ORDER 1878    1878   1           NO
ORDER 1897    1897   2           YES     ID 1897

result: ORDER_IDS with child ID 1897

Disjunctions

<ORDER_IDS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO),
      $lc := count($order/ITEM)
  where $lc > 1 or $order/CARRIER = "FEDEX"
  return <ID> { $id } </ID>
} </ORDER_IDS>

for/let                                  where   return
$order                $id    $lc
ORDER 1861 (FEDEX)    1861   1           YES     ID 1861
ORDER 1878 (UPS)      1878   1           NO
ORDER 1897 (UPS)      1897   2           YES     ID 1897

result: ORDER_IDS with children ID 1861 and ID 1897

Existential Quantification

<ORDER_IDS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO)
  where some $sku in $order/ITEM/SKU satisfies $sku = "C5"
  return <ID> { $id } </ID>
} </ORDER_IDS>

for/let               $sku               where   return
$order        $id
ORDER 1861    1861    SKU C5             YES     ID 1861
ORDER 1878    1878    SKU B7             NO
ORDER 1897    1897    SKU C5, SKU P5     YES     ID 1897

result: ORDER_IDS with children ID 1861 and ID 1897

Universal Quantification

<ORDER_IDS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO)
  where every $sku in $order/ITEM/SKU satisfies $sku = "C5"
  return <ID> { $id } </ID>
} </ORDER_IDS>

for/let               $sku               where   return
$order        $id
ORDER 1861    1861    SKU C5             YES     ID 1861
ORDER 1878    1878    SKU B7             NO
ORDER 1897    1897    SKU C5, SKU P5     NO

result: ORDER_IDS with child ID 1861


Conditionals Example Tree

Combine customers … with member info … to add MEMBER tag to customer data.

[Figure: the CUSTOMER elements (Tom, Ann, Sue) are combined with the MEMBER elements (Sue, Tom, Bob, each with a STATUS of GOLD or SILVER) to produce a CUSTOMERS tree in which each CUSTOMER has a NAME and a MEMBER tag: YES for Tom and Sue, NO for Ann.]

Conditionals Example Query

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  let $name := $customer/NAME
  return
    <CUSTOMER> {$name} {
      if (some $member in doc("m")/MEMBERS/MEMBER
          satisfies $member/NAME = $name)
      then <MEMBER>YES</MEMBER>
      else <MEMBER>NO</MEMBER>
    } </CUSTOMER>
} </CUSTOMERS>

• For each customer, the existential quantification statement checks for the existence of a matching member
• If a matching member is found, the MEMBER YES tags are output; otherwise, the MEMBER NO tags are output

Conditionals Table

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  let $name := $customer/NAME
  return
    <CUSTOMER> {$name} {
      if (some $member in doc("m")/MEMBERS/MEMBER
          satisfies $member/NAME = $name)
      then <MEMBER>YES</MEMBER>
      else <MEMBER>NO</MEMBER>
    } </CUSTOMER>
} </CUSTOMERS>

$customer         $name       some $member     if/then/else   return
CUSTOMER (Sue)    NAME Sue    MEMBER (Sue)     MEMBER YES     CUSTOMER (NAME Sue, MEMBER YES)
CUSTOMER (Ann)    NAME Ann    (none)           MEMBER NO      CUSTOMER (NAME Ann, MEMBER NO)
CUSTOMER (Tom)    NAME Tom    MEMBER (Tom)     MEMBER YES     CUSTOMER (NAME Tom, MEMBER YES)


Simple Aggregation

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO)
  let $ic := count($order/ITEM)
  return <ORDER> <ID> {$id} </ID> <IC> {$ic} </IC> </ORDER>
} </ORDERS>

for/let                          return
$order        $id    $ic
ORDER 1861    1861   1           ORDER (ID 1861, IC 1)
ORDER 1878    1878   1           ORDER (ID 1878, IC 1)
ORDER 1897    1897   2           ORDER (ID 1897, IC 2)

result: ORDERS containing the three ORDER elements above

Conditional Aggregation

<ORDERS> {
  for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
  let $id := data($order/NO)
  let $items := for $i in $order/ITEM
                where $i/SKU = "C5"
                return $i
  let $ic := count($items)
  return <ORDER> <ID> {$id} </ID> <IC> {$ic} </IC> </ORDER>
} </ORDERS>

$order        $id    $i               where   $items           $ic   return
ORDER 1861    1861   ITEM (SKU C5)    YES     ITEM (SKU C5)    1     ORDER (ID 1861, IC 1)
ORDER 1878    1878   ITEM (SKU B7)    NO      (empty)          0     ORDER (ID 1878, IC 0)
ORDER 1897    1897   ITEM (SKU C5)    YES     ITEM (SKU C5)    1     ORDER (ID 1897, IC 1)
                     ITEM (SKU P5)    NO



Missing Data Join Example

• We will link CUSTOMER_ORDERS with MEMBERS

• There are customers that are not members

Missing Data Join Trees

Combine customers … with member info … to produce prioritized customers.

[Figure: the CUSTOMER elements (Tom, Ann, Sue) are combined with the MEMBER elements (Sue, Tom, Bob, each with a STATUS of GOLD or SILVER) to produce a CUSTOMERS tree in which CUSTOMER elements carry a NAME and a PRIORITY taken from the member STATUS.]

Missing Data Join Query

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  for $member in doc("m")/MEMBERS/MEMBER
  let $name := $customer/NAME
  let $status := data($member/STATUS)
  where $name = $member/NAME
  return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER>
} </CUSTOMERS>

Missing Data Join Table

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  for $member in doc("m")/MEMBERS/MEMBER
  let $name := $customer/NAME
  let $status := data($member/STATUS)
  where $name = $member/NAME
  return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER>
} </CUSTOMERS>

for/let/join                                                           return
$customer         $name       $member                       $status
CUSTOMER (Sue)    NAME Sue    MEMBER (Sue, STATUS SILVER)   SILVER     CUSTOMER (NAME Sue, PRIORITY SILVER)
CUSTOMER (Tom)    NAME Tom    MEMBER (Tom, STATUS GOLD)     GOLD       CUSTOMER (NAME Tom, PRIORITY GOLD)
CUSTOMER (Ann)    NAME Ann    (no matching member)

Result for Ann is missing!

Missing Data Join Problem

Got:
[Figure: CUSTOMERS with CUSTOMER Tom (PRIORITY GOLD) and CUSTOMER Sue (PRIORITY SILVER).]

Wanted:
[Figure: CUSTOMERS with CUSTOMER Tom (PRIORITY GOLD), CUSTOMER Sue (PRIORITY SILVER) and CUSTOMER Ann.]

The result we want is analogous to an SQL outer join.

Missing Data Join Solution Query

Our join query …

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  for $member in doc("m")/MEMBERS/MEMBER
  let $name := $customer/NAME
  let $status := data($member/STATUS)
  where $name = $member/NAME
  return <CUSTOMER> {$name} <PRIORITY>{$status}</PRIORITY> </CUSTOMER>
} </CUSTOMERS>

… can be restructured into a nested query:

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  let $name := $customer/NAME
  return
    <CUSTOMER> {$name} {
      for $member in doc("m")/MEMBERS/MEMBER
      let $status := data($member/STATUS)
      where $name = $member/NAME
      return <PRIORITY>{$status}</PRIORITY>
    } </CUSTOMER>
} </CUSTOMERS>

Missing Data Join Solution Table

<CUSTOMERS> {
  for $customer in doc("co")/CUSTOMER_ORDERS/CUSTOMER
  let $name := $customer/NAME
  return
    <CUSTOMER> {$name} {
      for $member in doc("m")/MEMBERS/MEMBER
      let $status := data($member/STATUS)
      where $name = $member/NAME
      return <PRIORITY>{$status}</PRIORITY>
    } </CUSTOMER>
} </CUSTOMERS>

OUTER LOOP                    INNER LOOP                                                   OUTER LOOP
$customer         $name       $member                       $status   return               return
CUSTOMER (Sue)    NAME Sue    MEMBER (Sue, STATUS SILVER)   SILVER    PRIORITY SILVER      CUSTOMER (NAME Sue, PRIORITY SILVER)
CUSTOMER (Tom)    NAME Tom    MEMBER (Tom, STATUS GOLD)     GOLD      PRIORITY GOLD        CUSTOMER (NAME Tom, PRIORITY GOLD)
CUSTOMER (Ann)    NAME Ann    (no matching member)                                         CUSTOMER (NAME Ann)

Missing Data Joins vs. Nested Queries

• In joins, tuples with any missing data are eliminated
  – equivalent to an SQL natural or inner join
• In nested queries, tuples are output in spite of missing data
  – equivalent to an SQL outer join
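The contrast can be reproduced with a two-customer sketch. The inline data, the GOLD status and the DEMO, JOIN and NESTED wrapper elements are made up for illustration; the two query shapes follow the join and nested queries shown earlier.

```xquery
(: Hypothetical data: Sue is a member, Ann is not. :)
let $customers := (<CUSTOMER><NAME>Sue</NAME></CUSTOMER>,
                   <CUSTOMER><NAME>Ann</NAME></CUSTOMER>)
let $members := <MEMBER><NAME>Sue</NAME><STATUS>GOLD</STATUS></MEMBER>
return
  <DEMO>
    <!-- join: a customer with no matching member produces no tuple at all -->
    <JOIN> {
      for $c in $customers, $m in $members
      where $c/NAME = $m/NAME
      return <CUSTOMER>{ $c/NAME }<PRIORITY>{ data($m/STATUS) }</PRIORITY></CUSTOMER>
    } </JOIN>
    <!-- nested query: every customer is output; the inner loop may be empty -->
    <NESTED> {
      for $c in $customers
      return
        <CUSTOMER>
          { $c/NAME }
          { for $m in $members
            where $c/NAME = $m/NAME
            return <PRIORITY>{ data($m/STATUS) }</PRIORITY> }
        </CUSTOMER>
    } </NESTED>
  </DEMO>
```

JOIN contains only Sue, while NESTED contains Sue with a PRIORITY and Ann without one.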


Nested Query Problem

• How to remove tuples that have some missing data

• How to force inner join functionality in a nested query


Missing Data Nested Query Example

• Suppose we want a list, by product, of all items on order– perhaps for pulling the items from stock

• For each product, we want bundles, separate quantities for each order

• We don’t want to list products with no items on order

Missing Data Nested Query Trees

Combine products … with order items … to produce items on order.

[Figure: the PRODUCT elements (C4 Case, C5 Cable, B7 Battery, P5 Phone) are combined with the order ITEM elements (C5 QTY 2, C5 QTY 1, P5 QTY 1, B7 QTY 2) to produce an ITEMS_ON_ORDER tree: PRODUCT C5 Cable with bundles of 1 and 2, PRODUCT P5 Phone with a bundle of 1, PRODUCT B7 Battery with a bundle of 2.]

Missing Data Nested Query

<ITEMS_ON_ORDER> {
  for $p in doc("p")/PRODUCTS/PRODUCT
  let $sku := $p/SKU
  let $name := $p/NAME
  return
    <PRODUCT> {$sku} {$name} {
      for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM
      let $qty := data($i/QTY)
      where $sku = $i/SKU
      return <BUNDLE> {$qty} </BUNDLE>
    } </PRODUCT>
} </ITEMS_ON_ORDER>

Missing Data Nested Query Table

OUTER LOOP                                     INNER LOOP                                   OUTER LOOP
$p                  $sku      $name            $i                      $qty   return        return
PRODUCT (P5)        SKU P5    NAME Phone       ITEM (SKU P5, QTY 1)    1      BUNDLE 1      PRODUCT (SKU P5, NAME Phone, BUNDLE 1)
PRODUCT (C4)        SKU C4    NAME Case        (no matching item)                           PRODUCT (SKU C4, NAME Case)
PRODUCT (B7)        SKU B7    NAME Battery     ITEM (SKU B7, QTY 2)    2      BUNDLE 2      PRODUCT (SKU B7, NAME Battery, BUNDLE 2)
PRODUCT (C5)        SKU C5    NAME Cable       ITEM (SKU C5, QTY 2)    2      BUNDLE 2      PRODUCT (SKU C5, NAME Cable, BUNDLE 2, BUNDLE 1)
                                               ITEM (SKU C5, QTY 1)    1      BUNDLE 1

Missing Data Nested Query Problem

Wanted:
[Figure: ITEMS_ON_ORDER with PRODUCT C5 Cable (bundles of 1 and 2), PRODUCT P5 Phone (bundle of 1) and PRODUCT B7 Battery (bundle of 2).]

Got:
[Figure: the same tree plus PRODUCT C4 Case, which has no BUNDLE.]

The result we want is analogous to an SQL inner (natural) join.

Missing Data Nested Query Solution

Our nested query …

<ITEMS_ON_ORDER> {
  for $p in doc("p")/PRODUCTS/PRODUCT
  let $sku := $p/SKU
  let $name := $p/NAME
  return
    <PRODUCT> {$sku} {$name} {
      for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM
      let $qty := data($i/QTY)
      where $sku = $i/SKU
      return <BUNDLE> {$qty} </BUNDLE>
    } </PRODUCT>
} </ITEMS_ON_ORDER>

… can be restructured with the inner for loop moved to a variable in the outer loop …

… and a where clause can be added to remove outer elements with no inner elements:

<ITEMS_ON_ORDER> {
  for $p in doc("p")/PRODUCTS/PRODUCT
  let $sku := $p/SKU,
      $name := $p/NAME
  let $bundle := for $i in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER/ITEM
                 let $qty := data($i/QTY)
                 where $sku = $i/SKU
                 return <BUNDLE> {$qty} </BUNDLE>
  where not(empty($bundle))
  return <PRODUCT> {$sku} {$name} {$bundle} </PRODUCT>
} </ITEMS_ON_ORDER>

Missing Data Nested Query Solution Table

$p              $sku      $name           $i                      $qty   $bundle               where   return
PRODUCT (P5)    SKU P5    NAME Phone      ITEM (SKU P5, QTY 1)    1      BUNDLE 1              YES     PRODUCT (SKU P5, NAME Phone, BUNDLE 1)
PRODUCT (C4)    SKU C4    NAME Case       (no matching item)             (empty)               NO
PRODUCT (B7)    SKU B7    NAME Battery    ITEM (SKU B7, QTY 2)    2      BUNDLE 2              YES     PRODUCT (SKU B7, NAME Battery, BUNDLE 2)
PRODUCT (C5)    SKU C5    NAME Cable      ITEM (SKU C5, QTY 2)    2      BUNDLE 2, BUNDLE 1    YES     PRODUCT (SKU C5, NAME Cable, BUNDLE 2, BUNDLE 1)
                                          ITEM (SKU C5, QTY 1)    1


Advanced Example Trees

Combine products … with orders … to produce orders for each product.

[Figure: the PRODUCT elements (C4 Case, C5 Cable, B7 Battery, P5 Phone) are combined with the ORDER elements 1861 (item C5), 1878 (item B7) and 1897 (items C5 and P5) to produce a PRODUCT_ORDERS tree: PRODUCT C4 with no orders, PRODUCT C5 with ORDER 1861 and ORDER 1897, PRODUCT P5 with ORDER 1897, PRODUCT B7 with ORDER 1878.]

Advanced Example Query

(: By the way, this is a comment :)
<PRODUCT_ORDERS> {
  for $product in doc("p")/PRODUCTS/PRODUCT
  return
    <PRODUCT> {$product/SKU} {
      for $order in doc("co")/CUSTOMER_ORDERS/CUSTOMER/ORDER
      let $id := data($order/NO)
      where some $sku in $order/ITEM/SKU
            satisfies $sku = $product/SKU
      return <ORDER> { $id } </ORDER>
    } </PRODUCT>
} </PRODUCT_ORDERS>

• For each product (outer for loop), loop through all orders (inner for loop)
• The where statement filters out orders which don't contain the product under consideration

Advanced Example Exercises

• Preparation of the query table (table of variable bindings) is left as an exercise
• How can the query be rewritten to
  – eliminate products with no orders?
  – add a <no_orders/> tag to products with no orders?
  – sort by SKU?
  – add a total quantity ordered count under each product?
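One possible answer to three of these exercises (eliminating products with no orders, sorting by SKU and adding a total quantity) is sketched below; it is not the only way to do it. The inline `$p` and `$co` documents are assumed stand-ins for `doc("p")` and `doc("co")`, and TOTAL_QTY is a hypothetical element name.

```xquery
(: Assumed stand-ins for doc("p") and doc("co"). :)
let $p := document {
  <PRODUCTS>
    <PRODUCT><SKU>C4</SKU><NAME>Case</NAME></PRODUCT>
    <PRODUCT><SKU>C5</SKU><NAME>Cable</NAME></PRODUCT>
    <PRODUCT><SKU>B7</SKU><NAME>Battery</NAME></PRODUCT>
    <PRODUCT><SKU>P5</SKU><NAME>Phone</NAME></PRODUCT>
  </PRODUCTS>
}
let $co := document {
  <CUSTOMER_ORDERS>
    <CUSTOMER>
      <ORDER>
        <NO>1897</NO>
        <ITEM><SKU>C5</SKU><QTY>2</QTY></ITEM>
        <ITEM><SKU>P5</SKU><QTY>1</QTY></ITEM>
      </ORDER>
    </CUSTOMER>
    <CUSTOMER>
      <ORDER><NO>1878</NO><ITEM><SKU>B7</SKU><QTY>2</QTY></ITEM></ORDER>
      <ORDER><NO>1861</NO><ITEM><SKU>C5</SKU><QTY>1</QTY></ITEM></ORDER>
    </CUSTOMER>
  </CUSTOMER_ORDERS>
}
return
  <PRODUCT_ORDERS> {
    for $product in $p/PRODUCTS/PRODUCT
    (: the inner loop becomes a let, so it can be tested and aggregated :)
    let $orders := $co/CUSTOMER_ORDERS/CUSTOMER/ORDER[ITEM/SKU = $product/SKU]
    let $total := sum($co//ITEM[SKU = $product/SKU]/QTY)
    (: eliminate products with no orders :)
    where exists($orders)
    (: sort by SKU :)
    order by $product/SKU
    return
      <PRODUCT>
        { $product/SKU }
        <TOTAL_QTY>{ $total }</TOTAL_QTY>
        { for $order in $orders return <ORDER>{ data($order/NO) }</ORDER> }
      </PRODUCT>
  } </PRODUCT_ORDERS>
```

For the <no_orders/> variant, the where clause would be dropped and the inner part of the constructor replaced by a conditional such as `if (empty($orders)) then <no_orders/> else …`.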


Sequences

• Ordered lists of items: nodes (element, attribute or text nodes), atomic values, or a combination thereof
• Can be constructed in for/let clauses
    for $product in doc("p")/PRODUCTS/PRODUCT
• Or manually in the return clause
    for $product in doc("p")/PRODUCTS/PRODUCT
    return ( <SKU>{data($product/SKU)}</SKU>,
             <NAME>{data($product/NAME)}</NAME> )
• Not needed if a parent element constructor is present
    for $product in doc("p")/PRODUCTS/PRODUCT
    return <PRODUCT>
             <SKU>{data($product/SKU)}</SKU>
             <NAME>{data($product/NAME)}</NAME>
           </PRODUCT>

Sequences

• Concatenation
    ($seq1, $seq2)
• Union
    $seq1 union $seq2
    $seq1 | $seq2
  – Example:
    for $product in doc("p")/PRODUCTS/PRODUCT union doc("co")//ITEM
    return $product
• Intersection
    $seq1 intersect $seq2
• Difference
    $seq1 except $seq2
• Union, Intersection and Difference apply to sequences of nodes; they remove duplicate nodes and return the result in document order
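The difference between concatenation and the three set operators shows up when the two sequences share a node. In this sketch the ITEMS data and the DEMO wrapper elements are made up; counting the results keeps the output easy to check.

```xquery
(: Hypothetical data: $seq1 and $seq2 share the second ITEM node. :)
let $doc := <ITEMS><ITEM>a</ITEM><ITEM>b</ITEM><ITEM>c</ITEM></ITEMS>
let $seq1 := ($doc/ITEM[1], $doc/ITEM[2])
let $seq2 := ($doc/ITEM[2], $doc/ITEM[3])
return
  <DEMO>
    <!-- concatenation keeps the shared node twice -->
    <CONCAT>{ count(($seq1, $seq2)) }</CONCAT>
    <!-- union removes the duplicate node -->
    <UNION>{ count($seq1 union $seq2) }</UNION>
    <!-- intersection keeps only the shared node -->
    <INTERSECT>{ count($seq1 intersect $seq2) }</INTERSECT>
    <!-- difference keeps the nodes of $seq1 that are not in $seq2 -->
    <EXCEPT>{ count($seq1 except $seq2) }</EXCEPT>
  </DEMO>
```

The counts are 4 for the concatenation, 3 for the union, and 1 each for the intersection and the difference. Duplicates are detected by node identity, not by value.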


Query Prolog

User-Defined Functions

• Useful for recursion

declare function local:depth($e as element()) as xs:integer {
  if (empty($e/*))
  then 1
  else max( for $child in $e/*
            return local:depth($child) + 1 )
};

for $a in doc("co")/CUSTOMER_ORDERS
return local:depth($a)

• The "local" prefix is reserved for user-defined functions
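The recursion can be checked by hand on a tiny tree. The sketch below reuses the depth function from the slide and applies it to a made-up element instead of `doc("co")`.

```xquery
(: Depth of an element: 1 for a leaf, otherwise 1 + the deepest child. :)
declare function local:depth($e as element()) as xs:integer {
  if (empty($e/*))
  then 1
  else max( for $child in $e/*
            return local:depth($child) + 1 )
};

(: Hypothetical input: a contains b (which contains c) and d. :)
local:depth(<a><b><c/></b><d/></a>)
```

Here c and d are leaves of depth 1, b has depth 2, and a takes the larger of 2 + 1 and 1 + 1, so the call returns 3.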

Global Variables

• Also declared in the query prolog

declare variable $threshold := 2;

for $order in doc("co")//ORDER
let $totalQty := sum($order//QTY)
where $totalQty > $threshold
return $order

• Can be used to parameterize your queries

XQuery and XML Schemas

• XML Schemas can be used within XQuery to validate:
  – Input documents
  – Query Result

import schema namespace in="http://www.cse.buffalo.edu/in" at "in.xsd";
import schema namespace out="http://www.cse.buffalo.edu/out" at "out.xsd";

validate {
  <out:CUSTOMER_ORDERS> {
    for $custs in doc("co")/in:CUSTOMER_ORDERS/*
    return $custs
  } </out:CUSTOMER_ORDERS>
}

References

• XQuery Tutorial
  – Yannis Papakonstantinou
  – http://www.db.ucsd.edu/people/yannis/XQueryTutorial.htm
• W3C's XQuery homepage
  – http://www.w3.org/XML/Query/
• XML School
  – http://www.w3schools.com