
XQuery and Hierarchical Naming

Zachary G. Ives, University of Pennsylvania

CIS 455 / 555 – Internet and Web Systems

February 7, 2008


Today

Reminder: Homework 1 due 2/12 @ 11:59PM

XQuery and joins

Addressing vs. naming

Hierarchical names


XQuery’s Basic Form

The model: bind nodes (or node sets) to variables; operate over each legal combination of bindings; produce a set of nodes

“FLWOR” statement pattern:
for {iterators that bind variables}
let {collections}
where {conditions}
order by {order-conditions}
return {output constructor}
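To make the binding model concrete, here is a minimal Python sketch (not XQuery itself) of how one FLWOR block could be evaluated by hand. `for` produces one binding tuple per legal combination, `let` binds a whole collection, `where` filters tuples, `order by` sorts them, and `return` builds one output item per surviving tuple. The `papers` data and its field names are hypothetical stand-ins for XML nodes.

```python
# Hypothetical stand-in for XML nodes: each "paper" is a plain dict.
papers = [
    {"title": "B paper", "year": 1997, "authors": ["Ann", "Bob"]},
    {"title": "A paper", "year": 1992, "authors": ["Cy"]},
    {"title": "C paper", "year": 1997, "authors": []},
]


def flwor(papers):
    """Evaluate, roughly:
    for $p in papers, $au in $p/authors
    let $all := $p/authors
    where $p/year = 1997
    order by $au
    return ...
    """
    bindings = []
    # for: one binding tuple per legal (paper, author) combination;
    # a paper with no authors contributes no tuples at all
    for p in papers:
        for au in p["authors"]:
            # let: bind the whole collection to one variable, no iteration
            all_authors = p["authors"]
            # where: keep only tuples that satisfy the condition
            if p["year"] == 1997:
                bindings.append((p, au, all_authors))
    # order by: sort the surviving binding tuples
    bindings.sort(key=lambda b: b[1])
    # return: construct one output item per binding tuple
    return [f"{au} wrote '{p['title']}' with {len(all_authors)} author(s)"
            for p, au, all_authors in bindings]


print(flwor(papers))
```

Note that "C paper" produces nothing even though it satisfies the `where` condition: iterating over its empty author sequence yields no bindings.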


Example XML Data

[Figure: XML tree for dblp.xml]
Root
  ?xml
  dblp
    mastersthesis (@mdate = 2002…, @key = ms/Brown92)
      author: Kurt Brown
      title: PRPL…
      year: 1992
      school: wisc
    inproceedings (@mdate = 2002.., @key = conf/sigm../)
      author: Paul R.
      title: On…
      year: 1997
      crossref: sigmod-97
      ee: www…
    university (@key = wisc)
      name: Wisconsin
      country: USA


XQuery and Joins

for $i in doc("dblp.xml")/dblp/inproceedings,
    $r in $i/crossref/text(),
    $c in doc("dblp.xml")/dblp/conf,
    $n in $c/@name
where $n = $r
return <result>{ $i, $c }</result>
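For comparison, the same value-based join can be sketched in Python with `xml.etree.ElementTree` as a nested-loop join: every paper's crossref text is compared against every conference's `name` attribute. The sample document, including the `<conf name="...">` elements (which the example tree above does not show), is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DBLP-style document; the <conf name="..."> elements are assumed.
DBLP_XML = """<dblp>
  <inproceedings key="p1"><title>Paper A</title><crossref>sigmod-97</crossref></inproceedings>
  <inproceedings key="p2"><title>Paper B</title><crossref>vldb-98</crossref></inproceedings>
  <conf name="sigmod-97"><title>SIGMOD 1997</title></conf>
</dblp>"""


def join_on_crossref(root):
    results = []
    for i in root.findall("inproceedings"):            # $i in /dblp/inproceedings
        for r in i.findall("crossref"):                 # $r in $i/crossref/text()
            for c in root.findall("conf"):              # $c in /dblp/conf
                n = c.get("name")                       # $n in $c/@name
                # join predicate: the conference name equals the crossref text
                if n is not None and n == (r.text or "").strip():
                    result = ET.Element("result")       # <result>{ $i, $c }</result>
                    result.append(i)
                    result.append(c)
                    results.append(result)
    return results


root = ET.fromstring(DBLP_XML)
for res in join_on_crossref(root):
    print([child.get("key") or child.get("name") for child in res])
```

A nested-loop join costs |papers| × |conferences| comparisons; a query processor would more likely build a hash table of conferences keyed by name and probe it once per paper.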


Some Uses for Join in XML

Translation between values: SSN to PennID

Joining or combining information: Amazon invoice info + UPS tracking info

Restructuring information: <author><book>…</book><book>..</book></author> becomes <book><author>…</author><author>…</author></book>

Here, we separate authors from books, then join them back in “upside-down” fashion
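The "upside-down" restructuring can be sketched in a few lines of Python, with dicts standing in for the XML elements: flatten the author-centric nesting into (author, book) pairs, then group the pairs back by book. The names and titles are hypothetical.

```python
from collections import defaultdict

# Hypothetical author-centric input: <author name="..."><book>...</book>...</author>
by_author = {
    "Ann": ["Databases", "XML"],
    "Bob": ["XML"],
}


def invert_nesting(by_author):
    # Separate authors from books as (author, book) pairs,
    # then join them back, grouped under each book.
    by_book = defaultdict(list)
    for author, books in by_author.items():
        for book in books:
            by_book[book].append(author)
    return dict(by_book)


print(invert_nesting(by_author))
```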


Changing Nesting of XML Content

Re-nesting XML trees is a common operation
Simply nest the query blocks and correlate them, similar to a join

for $u in doc("dblp.xml")/dblp/university,
    $n in $u/name/text(),
    $k in $u/@key
where $u/country = "USA"
return <ms-theses-92-by-univ>
  { $n }
  { for $mt in $u/../mastersthesis,
        $inst in $mt/school/text()
    where $mt/year/text() = "1992" and _______________
    return $mt/title }
</ms-theses-92-by-univ>


Collections & Aggregation in XQuery

Given a collection, we can compute an average, count, etc. of its members:

<article-authors>{
  for $paper in doc("dblp.xml")/dblp/inproceedings
  let $pauth := $paper/author
  return <paper> { $paper/title }
           <count> { fn:count($pauth) } </count>
         </paper>
} </article-authors>

Here the let clause binds $pauth to a collection: all of the paper's author elements.
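A rough Python analogue of the article-authors query, using `xml.etree.ElementTree`: `findall("author")` plays the role of the `let`-bound collection, and `len()` plays the role of `fn:count`. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two papers.
DBLP_XML = """<dblp>
  <inproceedings><title>T1</title><author>A</author><author>B</author></inproceedings>
  <inproceedings><title>T2</title><author>C</author></inproceedings>
</dblp>"""


def article_authors(root):
    out = ET.Element("article-authors")
    for paper in root.findall("inproceedings"):        # for $paper in /dblp/inproceedings
        pauth = paper.findall("author")                 # let $pauth := $paper/author
        p = ET.SubElement(out, "paper")
        title = paper.find("title")
        if title is not None:
            p.append(title)                             # { $paper/title }
        ET.SubElement(p, "count").text = str(len(pauth))  # fn:count($pauth)
    return out


result = article_authors(ET.fromstring(DBLP_XML))
print(ET.tostring(result, encoding="unicode"))
```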


Sorting in XQuery

We can order the sequence of “result tuples” output by the return clause:

for $x in doc("dblp.xml")/dblp/proceedings
order by $x/title/text()
return $x
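In Python the same ordering step amounts to sorting the bound nodes by a key extracted from each one. The three `<proceedings>` entries below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three <proceedings> entries.
DBLP_XML = """<dblp>
  <proceedings><title>VLDB</title></proceedings>
  <proceedings><title>ICDE</title></proceedings>
  <proceedings><title>SIGMOD</title></proceedings>
</dblp>"""


def proceedings_by_title(root):
    # for $x in /dblp/proceedings order by $x/title/text() return $x
    # findtext() returns the default ("") when an entry has no <title>.
    return sorted(root.findall("proceedings"),
                  key=lambda x: x.findtext("title", default=""))


root = ET.fromstring(DBLP_XML)
print([x.findtext("title") for x in proceedings_by_title(root)])
```

Python's `sorted()` is stable, so entries with equal titles keep document order; in XQuery that guarantee requires writing `stable order by`.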


Querying & Defining Tags

Can get a node’s name by querying node-name():
for $x in doc("dblp.xml")/dblp/*
return node-name($x)

Can construct elements and attributes using computed names:

for $x in doc("dblp.xml")/dblp/*,
    $year in $x/year,
    $title in $x/title/text()
return element { node-name($x) } {
  attribute { concat("year-", $year) } { $title }
}
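The computed-name idea has a direct ElementTree counterpart: an element's tag is just a string (`x.tag`), so a new element or attribute name can be computed at run time. The sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two DBLP-style entries with different element names.
DBLP_XML = """<dblp>
  <mastersthesis><title>T1</title><year>1992</year></mastersthesis>
  <inproceedings><title>T2</title><year>1997</year></inproceedings>
</dblp>"""


def node_names(root):
    # for $x in /dblp/* return node-name($x)
    return [x.tag for x in root]


def computed_elements(root):
    out = []
    for x in root:                                # $x in /dblp/*
        for year in x.findall("year"):            # $year in $x/year
            for title in x.findall("title"):      # $title in $x/title/text()
                e = ET.Element(x.tag)             # element { node-name($x) } { ... }
                # attribute { concat("year-", $year) } { $title }
                e.set("year-" + (year.text or ""), title.text or "")
                out.append(e)
    return out


root = ET.fromstring(DBLP_XML)
print(node_names(root))
for e in computed_elements(root):
    print(ET.tostring(e, encoding="unicode"))
```

One difference: ElementTree does not check that a computed name is a legal XML name, whereas an XQuery processor raises an error for an invalid computed QName.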


XQuery Summary

Very flexible and powerful language for XML

Focus is on database-style operations like joins
Performs tasks that can’t be done with XPath or XSLT and that are tedious to program in Java:
Integrating information from multiple sources
Joins, based on correspondences of values
Computing count, average, etc.

Today, XQuery is available:
In RDBMSs (SQL Server, Oracle, DB2) and XML DBMSs (MarkLogic)
As the basis of research prototypes for “XQuery full text”
As the basis of “XQueryP”: a Web Services/AJAX programming language based on XQuery but with programming language features

http://2006.xmlconference.org/programme/presentations/38.html

We will discuss data integration and middleware later in the course


Hierarchical Naming Schemes

Thus far, we’ve seen XPath as a hierarchical naming scheme
“Content-based naming”: describe the structure and values of a tree
Assumption: the XML tree resides in (or is being sent to) one place

But hierarchy is often used for naming and location


How Do We Find Things on the Internet?

Generally, using one of three means:

Addresses or locations: specify where something is, assuming that we understand how to navigate
Just like a physical address, we may still need a map!
In the Internet, addresses are typically IP addresses; the routers know the map

Names: are mapped into addresses via lookup services
Best-known example on the Internet: DNS names
Cell phone numbers, email addresses, etc. are becoming names

Content-based addressing/naming
The actual data value is somehow used to find its location
The basis of publish-subscribe systems and peer-to-peer architectures
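One simple way for "the data value to be used to find its location" is to hash the content and map the hash onto a set of nodes. The Python sketch below assumes SHA-1 as the hash and plain modulo placement; the node names are hypothetical, and real peer-to-peer systems typically use consistent hashing instead of modulo.

```python
import hashlib


def content_key(data: bytes) -> str:
    # The name is computed from the data itself: here, its SHA-1 digest in hex.
    return hashlib.sha1(data).hexdigest()


def locate(data: bytes, nodes: list) -> str:
    # Map the content-derived key onto one of the nodes.
    # Simplification: plain modulo placement; schemes such as consistent
    # hashing avoid moving most keys when a node joins or leaves.
    return nodes[int(content_key(data), 16) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
print(content_key(b"abc"), locate(b"abc", nodes))
```

Anyone holding the same bytes computes the same key, so no directory is needed to agree on where the item lives.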


The Simplest Way of Going from Names or Content to Locations

Directory-based lookup protocols are very common

Examples:
Napster 1.0: peer-to-peer storage with a central directory
Inverted index: used to look up keywords in information retrieval
DNS: distributed hierarchical directory
LDAP: hierarchical Directory Information Tree


Napster 1.0, ca. 1999-2001

Hybrid of peer-to-peer storage with a central directory showing what’s currently available
What are the trade-offs implicit in this model? Why did it fail?

[Figure: the Napster.com directory lists the files (e.g., jjackson-lame.mp3, bspears-oops.mp3) currently shared by Peer1, Peer2, and Peer3; the files themselves stay on the peers]
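A Napster-style central directory can be sketched as a map from file name to the set of peers currently sharing it. This is a hypothetical minimal model, not Napster's actual protocol: peers register on login and are removed on logout, and clients then fetch files directly from a returned peer.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical Napster-style directory: file name -> peers currently sharing it.

    Only the index is central; the files stay on, and are fetched from, the peers.
    """

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, peer, files):
        # A peer logs in and reports what it shares.
        for name in files:
            self.index[name].add(peer)

    def unregister(self, peer):
        # A peer logs off: drop it so the directory shows only what's currently available.
        for name in list(self.index):
            self.index[name].discard(peer)
            if not self.index[name]:
                del self.index[name]

    def lookup(self, name):
        # Clients ask the directory, then contact one of the returned peers directly.
        return sorted(self.index.get(name, ()))


d = CentralDirectory()
d.register("Peer1", ["jjackson-lame.mp3"])   # hypothetical assignment of files to peers
d.register("Peer2", ["bspears-oops.mp3"])
d.register("Peer3", ["jjackson-lame.mp3"])
print(d.lookup("jjackson-lame.mp3"))
```

The sketch also makes one trade-off visible: every lookup goes through the single `CentralDirectory`, which is simple and consistent but is one point of failure (and of control).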

Other Services with Similar Directory + Peer Architectures

FolderShare: now owned by Microsoft
Google Desktop Search with multiple machines

BitTorrent trackers are quite similar (we’ll discuss BitTorrent more later)


Inverted Indices

A “forward index”: documents to words
The “inverted index”: words to word occurrences

The basis of most information retrieval engines, Google, etc.
Can handle positional predicates …
But how can we reconstruct previews?
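A positional inverted index maps each word to the documents and positions where it occurs, which is what lets it answer positional predicates such as phrase queries. The minimal Python sketch below assumes whitespace tokenization and lower-casing; the documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    # word -> {doc_id: [positions]}  (the inverted direction of doc -> words)
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Documents where the words of `phrase` appear at consecutive positions."""
    words = phrase.lower().split()
    if not words:
        return set()
    postings = [index.get(w, {}) for w in words]
    # Only documents containing every word can match.
    docs = set(postings[0])
    for p in postings[1:]:
        docs &= set(p)
    hits = set()
    for d in docs:
        positions = [set(p[d]) for p in postings]
        # Positional predicate: word k must occur at (start + k).
        if any(all(start + k in positions[k] for k in range(len(words)))
               for start in postings[0][d]):
            hits.add(d)
    return hits


docs = {1: "the quick brown fox", 2: "brown quick fox", 3: "quick brown"}
idx = build_index(docs)
print(sorted(phrase_query(idx, "quick brown")))
```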


Naming People and Devices: LDAP

Lightweight Directory Access Protocol
Hierarchical naming system that can be partitioned and replicated


LDAP’s Schema

LDAP information has an XML-like schema:
A unique name in LDAP is called a Distinguished Name (“dn”) and consists of a sequence of attributes representing a hierarchy, from most-specific to least-specific (as in DNS names):
o = organization; dc = domain component; ou = organizational unit; uid = user ID; cn = common name
c = country; st = state; l = locality
Can also have objectClass: the type of entity
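Because a DN lists its components from most-specific to least-specific, moving up the hierarchy is just dropping the leftmost component, much like stripping the leftmost label of a DNS name. The sketch assumes the comma-separated string form of DNs and ignores escaped commas and multi-valued components, which the full syntax (RFC 4514) allows; the example DN is hypothetical.

```python
def parse_dn(dn):
    """Split a DN such as "uid=zives,ou=CIS,o=upenn,c=us" into
    (attribute, value) pairs, most-specific first.

    Simplification: assumes no escaped commas and no multi-valued RDNs ("a=1+b=2").
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def ancestors(dn):
    """All DNs above this one: drop the most-specific component, one at a time."""
    rdns = [r.strip() for r in dn.split(",")]
    return [",".join(rdns[i:]) for i in range(1, len(rdns))]


dn = "uid=zives,ou=CIS,o=upenn,c=us"   # hypothetical DN
print(parse_dn(dn))
print(ancestors(dn))
```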


LDAP Hierarchy

[Figure: example LDAP hierarchy, from Brad Marshall, LDAP Tutorial, quark.humbug.au/publications/ldap_tut.html]


Querying LDAP

LDAP queries are mostly attribute-value predicates:
uid=zives; o=upenn; c=usa
(|(cn=Susan Davidson)(cn=Zachary Ives)(cn=Val Tannen))
(objectclass=posixAccount)
(!(cn=Val Tannen))

How does this differ from XPath? How might we process these queries?
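One way to process such queries: parse the parenthesized prefix-notation filter into a tree, then test each entry's attribute set against it. The Python sketch handles only a small subset of the standard filter syntax (RFC 4515): `&`, `|`, `!`, and `attr=value` equality, with no escapes or wildcards. It compares values case-insensitively as a simplification. The directory entries are hypothetical.

```python
def parse_filter(s):
    """Parse a subset of LDAP filter syntax into nested tuples.

    Supported: (&...), (|...), (!...), (attr=value). Not supported: escapes, *, >=, <=, ~=.
    """
    pos = 0

    def expect(ch):
        nonlocal pos
        if pos >= len(s) or s[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse():
        nonlocal pos
        expect("(")
        if pos >= len(s):
            raise ValueError("unexpected end of filter")
        op = s[pos]
        if op in "&|":
            pos += 1
            children = []
            while pos < len(s) and s[pos] == "(":
                children.append(parse())
            node = (op, children)
        elif op == "!":
            pos += 1
            node = ("!", parse())
        else:
            end = s.find(")", pos)
            if end == -1:
                raise ValueError("unterminated equality item")
            attr, eq, value = s[pos:end].partition("=")
            if not eq:
                raise ValueError(f"missing '=' in {s[pos:end]!r}")
            node = ("=", attr.strip().lower(), value.strip())
            pos = end
        expect(")")
        return node

    tree = parse()
    if pos != len(s):
        raise ValueError("trailing characters after filter")
    return tree


def matches(node, entry):
    """entry: dict from lower-case attribute name to a list of string values."""
    kind = node[0]
    if kind == "&":
        return all(matches(c, entry) for c in node[1])
    if kind == "|":
        return any(matches(c, entry) for c in node[1])
    if kind == "!":
        return not matches(node[1], entry)
    _, attr, value = node
    # Simplification: case-insensitive comparison for every attribute.
    return any(v.lower() == value.lower() for v in entry.get(attr, []))


people = [  # hypothetical directory entries
    {"cn": ["Susan Davidson"], "objectclass": ["posixAccount"]},
    {"cn": ["Zachary Ives"], "uid": ["zives"], "objectclass": ["posixAccount"]},
    {"cn": ["Val Tannen"], "objectclass": ["person"]},
]


def search(filter_str, entries):
    tree = parse_filter(filter_str)
    return [e["cn"][0] for e in entries if matches(tree, e)]


print(search("(!(cn=Val Tannen))", people))
```

Unlike XPath, nothing here navigates a path: each entry is a flat bag of attribute values. A server could therefore speed up evaluation with per-attribute indexes, much like an inverted index.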


The Backbone of Internet Naming: Domain Name System (DNS)

A simple, hierarchical name system with a distributed database – each domain controls its own names

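[Figure: a fragment of the DNS tree. Top-level domains such as edu and com sit below the root; edu contains columbia, upenn, and berkeley, with subdomains and hosts such as www, cis, and sas; com contains amazon, with host www]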


Top-Level Domains (TLDs)

Mostly controlled by Network Solutions, Inc. today
.com: commercial
.edu: educational institution
.gov: US government
.mil: US military
.net: networks and ISPs (now also a number of other things)
.org: other organizations
244 two-letter country suffixes, e.g., .us, .uk, .cz, .tv, …
and a bunch of new suffixes that are not very common, e.g., .biz, .name, .pro, …


Finding the Root

13 “root servers” store entries for all top-level domains (TLDs)

DNS servers have a hard-coded mapping to root servers so they can “get started”


Excerpt from DNS Root Server Entries

;       This file is made available by InterNIC registration services
;       under anonymous FTP as
;           file                /domain/named.root
;
; formerly NS.INTERNIC.NET
;
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
;
; formerly NS1.ISI.EDU
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
;
; formerly C.PSI.NET
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12

(13 servers in total, A through M)
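Such a root-hints file is what lets a DNS server "get started". Loading it is a small parsing exercise: strip `;` comments, read `owner TTL [class] type data` records, and pair each root `NS` record with its `A` (address) record. The Python sketch handles only this excerpt's format. The addresses are the historical ones from the excerpt, and some root servers have since been renumbered.

```python
# The excerpt above, embedded as text (historical addresses).
ROOT_HINTS = """\
; formerly NS.INTERNIC.NET
.                        3600000  IN  NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
; formerly NS1.ISI.EDU
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     128.9.0.107
; formerly C.PSI.NET
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
"""


def parse_root_hints(text):
    """Return {root server name: IPv4 address} from a named.root-style excerpt."""
    ns_names, addresses = [], {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()     # drop comments
        if not line:
            continue
        fields = line.split()
        # Layout: owner TTL [class] type data; the class field ("IN") is optional.
        if len(fields) == 5:
            owner, _ttl, _cls, rtype, data = fields
        else:
            owner, _ttl, rtype, data = fields
        if rtype == "NS" and owner == ".":
            ns_names.append(data)                # a name server for the root zone "."
        elif rtype == "A":
            addresses[owner] = data              # that server's address ("glue")
    return {ns: addresses.get(ns) for ns in ns_names}


print(parse_root_hints(ROOT_HINTS))
```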


Supposing We Were to Build DNS

How would we start? How is a lookup performed?

(Hint: what do you need to specify when you add a client to a network that doesn’t do DHCP?)
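As one possible answer sketch: an iterative lookup starts at a root server and follows referrals down the hierarchy, with each domain's server answering only for the names it controls. The in-memory `SERVERS` table, its server names, and the 192.0.2.x addresses (a range reserved for documentation) are all hypothetical. Real DNS replies carry NS and glue records, and resolvers cache what they learn.

```python
# Hypothetical in-memory stand-in for DNS servers: each server holds either
# final answers (name -> address) or referrals (subzone -> server handling it).
SERVERS = {
    "root": {"referrals": {"edu.": "edu-server", "com.": "com-server"}},
    "edu-server": {"referrals": {"upenn.edu.": "upenn-server"}},
    "com-server": {"referrals": {"amazon.com.": "amazon-server"}},
    "upenn-server": {"answers": {"www.upenn.edu.": "192.0.2.10",
                                 "www.cis.upenn.edu.": "192.0.2.20"}},
    "amazon-server": {"answers": {"www.amazon.com.": "192.0.2.30"}},
}


def query(server, name):
    """Ask one server: ('answer', ip), ('referral', next_server), or ('nxdomain', None)."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return "answer", data["answers"][name]
    # Refer the client to the most specific delegated zone that contains the name.
    best = None
    for zone, child in data.get("referrals", {}).items():
        if name == zone or name.endswith("." + zone):
            if best is None or len(zone) > len(best[0]):
                best = (zone, child)
    if best:
        return "referral", best[1]
    return "nxdomain", None


def resolve(name, start="root", max_steps=10):
    """Iterative resolution: begin at the root and follow referrals downward."""
    if not name.endswith("."):
        name += "."                     # fully qualified form
    server, path = start, [start]
    for _ in range(max_steps):
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind == "nxdomain":
            return None, path
        server = value
        path.append(server)
    raise RuntimeError("too many referrals")


print(resolve("www.upenn.edu"))
```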


Issues in DNS

We know that everyone wants to be “my-domain”.com
How does this mesh with the assumptions inherent in our hierarchical naming system?

What happens if things move frequently?
What happens if we want to provide different behavior to different requestors (e.g., Akamai)?


Next Time…

We’ll look at alternative mechanisms for finding things:
Publish-subscribe models
Gossip protocols, such as in routers
Flooding
… and soon, peer-to-peer or content-based routing