Post on 27-Dec-2015
Structured-Document Processing Languages
Spring 2011
Course Review
Repetitio mater studiorum est! (Repetition is the mother of learning!)
SDPL 2011 Course Review 2
Goals of the Course
Learn about central models and languages for
– manipulating,
– transforming, and
– querying
structured documents (or XML)
"Generic XML processing technology"
XML?
Extensible Markup Language is not a markup language!
– does not fix a tag set nor its semantics (as markup languages like HTML do)
XML is
– a way to use markup to represent information
– a metalanguage
» supports definition of specific markup languages through XML DTDs or Schemas
» e.g. XHTML, a reformulation of HTML using XML
XML Encoding of Structure: Example
<S> <W A="1">Hi</W> <E b='2' /> <W>there!</W> </S>
[Figure: the corresponding tree. The root S has the children W (attribute A=1, text "Hi"), E (attribute b=2) and W (text "there!").]
Basics of XML DTDs
A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents
Syntax (in the prolog of document instance):
<!DOCTYPE rootElemType SYSTEM "ex.dtd"
  <!-- "external subset" in file ex.dtd -->
  [ <!-- "internal subset" may come here -->
  ]>
DTD = union of the external and internal subset
Example DTD
<!ELEMENT invoice (client, item+)>
<!ATTLIST invoice num NMTOKEN #REQUIRED>
<!ELEMENT client (name, email?)>
<!ATTLIST client num NMTOKEN #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item
  price NMTOKEN #REQUIRED
  unit (FIM | EUR) "EUR" >
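To make the DTD concrete, here is a minimal sketch that validates small invoice documents against it with the JDK's built-in validating parser. The class name InvoiceDtdDemo, the method firstItemUnit, and the sample documents are illustrative assumptions, not part of the course material; the DTD is embedded as an internal subset only so that the sketch needs no external file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class InvoiceDtdDemo {
    // The example DTD, placed in the internal subset of the document prolog.
    static final String DTD =
        "<!DOCTYPE invoice [\n"
      + "<!ELEMENT invoice (client, item+)>\n"
      + "<!ATTLIST invoice num NMTOKEN #REQUIRED>\n"
      + "<!ELEMENT client (name, email?)>\n"
      + "<!ATTLIST client num NMTOKEN #REQUIRED>\n"
      + "<!ELEMENT name (#PCDATA)>\n"
      + "<!ELEMENT email (#PCDATA)>\n"
      + "<!ELEMENT item (#PCDATA)>\n"
      + "<!ATTLIST item price NMTOKEN #REQUIRED\n"
      + "               unit (FIM | EUR) \"EUR\">\n"
      + "]>\n";

    /**
     * Parses DTD + body with validation switched on.
     * Returns the unit attribute of the first item (including a DTD default),
     * or null if the parser reported a validity error.
     */
    static String firstItemUnit(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final boolean[] valid = { true };
        builder.setErrorHandler(new DefaultHandler() {
            @Override public void error(SAXParseException e) {
                valid[0] = false; // validity errors are recoverable; just record them
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(DTD + body)));
        if (!valid[0]) {
            return null;
        }
        Element item = (Element) doc.getElementsByTagName("item").item(0);
        return item.getAttribute("unit");
    }
}
```

For a document whose item carries only a price attribute, the parser is expected to supply the default unit "EUR" declared in the ATTLIST; a document without a client element violates the content model of invoice and is reported as invalid.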
Review
Lang/Model | Purpose | Structure of (i) docs, (ii) model | Processing model
XML Namespaces
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/xhtml1/strict">
  <!-- XHTML is the 'default namespace' -->
  <xsl:template match="doc/title">
    <h1>
      <xsl:apply-templates />
    </h1>
  </xsl:template>
</xsl:stylesheet>
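The effect of the two namespace declarations can be checked with a namespace-aware parser. The following is a small sketch, assuming the stylesheet from the slide is held in a string; the class name NamespaceDemo and the helper expandedName are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // The stylesheet from the slide, as a single string.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns=\"http://www.w3.org/TR/xhtml1/strict\">"
      + "<xsl:template match=\"doc/title\">"
      + "<h1><xsl:apply-templates /></h1>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns "{namespace URI}local-name" for the first element with the given qualified name. */
    static String expandedName(String qualifiedName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(STYLESHEET)));
        Element e = (Element) doc.getElementsByTagName(qualifiedName).item(0);
        return "{" + e.getNamespaceURI() + "}" + e.getLocalName();
    }
}
```

The unprefixed h1 element ends up in the default (XHTML) namespace, while xsl:template belongs to the XSLT namespace bound to the xsl prefix.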
3. XML Processor APIs
How can applications manipulate structured (XML) documents?
– An overview of XML processor interfaces
3.1 SAX: an event-based interface
3.2 DOM: an object-based interface
3.3 JAXP: Java API for XML Processing
3.4 StAX: Streaming API for XML
A SAX-based application
[Figure: the application main routine calls parse(). While reading the document <?xml version='1.0'?><A i="1">Hi!</A>, the parser invokes the application's callback routines: startDocument(), startElement() with "A",[i="1"], characters() with "Hi!", and endElement() with "A".]
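The callback flow in the figure can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name SaxDemo and the idea of recording the callbacks in a list are not from the course material. The parser is obtained through JAXP (SAXParserFactory, SAXParser, XMLReader).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxDemo {
    /** Parses the XML text and returns a log of the callbacks the parser made. */
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() {
                log.add("startDocument");
            }
            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("startElement " + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                log.add(sb.toString());
            }
            @Override public void characters(char[] ch, int start, int length) {
                String text = new String(ch, start, length);
                int last = log.size() - 1;
                // A parser may deliver one text run in several chunks; merge them.
                if (last >= 0 && log.get(last).startsWith("characters ")) {
                    log.set(last, log.get(last) + text);
                } else {
                    log.add("characters " + text);
                }
            }
            @Override public void endElement(String uri, String local, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void endDocument() {
                log.add("endDocument");
            }
        };
        // The application main routine: obtain a reader, register callbacks, call parse().
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }
}
```

Note that control stays inside parse() until the whole document has been read; the application only sees the document through its callbacks ("push" style).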
3.4 Streaming API for XML (StAX)
Event-driven streaming API, like SAX
"Pull API"
– lets the application ask for individual events
Two sets of APIs:
– "cursor" (XMLStreamReader), and "iterator" (XMLEventReader)
Bidirectional:
– XMLStreamWriter or XMLEventWriter support "marshaling" data into XML
A Pull-Parsing Application
[Figure: the application repeatedly calls EventReader.nextEvent() on the parser API. For the document <?xml version='1.0'?><A i="1">Hi!</A> the parser returns, in turn, the events StartDocument, StartElement "A",[i="1"], Characters "Hi!", and EndElement "A".]
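A pull-parsing loop of this kind might look as follows with the StAX "iterator" API. The class name StaxDemo and the event log format are assumptions made for this sketch; the lookup of the attribute i is specific to the sample document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxDemo {
    /** Pulls events one by one and returns a log of what the parser delivered. */
    static List<String> pull(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask for adjacent character data to be reported as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent(); // the application asks for the next event
            if (event.isStartDocument()) {
                log.add("StartDocument");
            } else if (event.isStartElement()) {
                StartElement se = event.asStartElement();
                Attribute i = se.getAttributeByName(new QName("i"));
                log.add("StartElement " + se.getName().getLocalPart()
                        + (i != null ? " i=" + i.getValue() : ""));
            } else if (event.isCharacters()) {
                log.add("Characters " + event.asCharacters().getData());
            } else if (event.isEndElement()) {
                log.add("EndElement " + event.asEndElement().getName().getLocalPart());
            } else if (event.isEndDocument()) {
                log.add("EndDocument");
            }
        }
        reader.close();
        return log;
    }
}
```

In contrast to SAX, the loop belongs to the application, so it can stop early or hand the reader to another method whenever that is convenient.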
DOM: What is it?
Object-based, language-neutral API for XML and HTML documents
– Allows programs/scripts to
» build
» navigate and
» modify documents
"Directly Obtainable in Memory" vs "Serial Access XML"
DOM structure model
<invoice form="00" type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1</streetaddress>
      <postoffice>70460 KUOPIO</postoffice>
    </address>
  </addressdata>
  ...
[Figure: the corresponding DOM tree. A Document node contains the Element invoice, whose attributes form="00" and type="estimated" are held in a NamedNodeMap. invoice contains addressdata, which contains name (Text "John Doe") and address; address contains streetaddress (Text "Pyynpolku 1") and postoffice (Text "70460 KUOPIO").]
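As a rough illustration of navigating this structure, the sketch below parses the invoice fragment and reads values through the Document, Element, NamedNodeMap and Text nodes. The class and method names are hypothetical, and the document is closed with </invoice> directly after addressdata because the remainder is elided on the slide.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomDemo {
    // The visible part of the slide's invoice; the elided rest is left out.
    static final String XML =
        "<invoice form=\"00\" type=\"estimated\">"
      + "<addressdata>"
      + "<name>John Doe</name>"
      + "<address>"
      + "<streetaddress>Pyynpolku 1</streetaddress>"
      + "<postoffice>70460 KUOPIO</postoffice>"
      + "</address>"
      + "</addressdata>"
      + "</invoice>";

    /** Builds the in-memory tree for the sample document. */
    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
    }

    /** Reads an attribute of the root element through its NamedNodeMap. */
    static String rootAttribute(Document doc, String name) {
        NamedNodeMap atts = doc.getDocumentElement().getAttributes();
        Node att = atts.getNamedItem(name);
        return att == null ? null : att.getNodeValue();
    }

    /** Walks invoice -> addressdata -> name -> Text by child links. */
    static String customerName(Document doc) {
        Element invoice = doc.getDocumentElement();
        Node addressdata = invoice.getFirstChild();
        Node name = addressdata.getFirstChild();
        return name.getFirstChild().getNodeValue(); // value of the Text node
    }
}
```

The child-link navigation in customerName relies on the sample string containing no whitespace between tags; with indented input, whitespace-only Text nodes would appear among the children.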
XSLT Transformations
[Figure: the Source Document is read into a Source Tree. The Transformation Process, controlled by the StyleSheet, produces a Result Tree, and the Output Process serializes the Result Tree as XML, Text or HTML.]
JAXP (Java API for XML Processing)
Interface for "plugging-in" and using XML processors in standard Java applications
– org.xml.sax: SAX 2.0
– javax.xml.stream: StAX
– org.w3c.dom: DOM Level 2 (+ Level 3 Core)
– javax.xml.parsers: initialization and use of parsers
– javax.xml.transform: initialization and use of XSLT transformers
JAXP: Using a SAX parser (1)
[Figure: call chain .newSAXParser(), .getXMLReader(), .parse("f.xml"), applied to the XML file f.xml]
JAXP: Using a DOM parser (1)
[Figure: .newDocumentBuilder(), followed by either .parse("f.xml") to read the file f.xml or .newDocument() to create an empty document]
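The .newDocument() branch can be sketched as follows: instead of parsing a file, the application builds a tree from scratch through the DOM factory methods. The class name DomBuildDemo is hypothetical, and the element A with attribute i and text "Hi!" simply mirrors the small sample document used on the SAX and StAX slides.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomBuildDemo {
    /** Builds the tree for <A i="1">Hi!</A> without parsing any input. */
    static Document build() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();      // empty document
        Element a = doc.createElement("A");        // nodes are created by the Document...
        a.setAttribute("i", "1");
        a.appendChild(doc.createTextNode("Hi!"));
        doc.appendChild(a);                        // ...and then attached to the tree
        return doc;
    }
}
```

The same DocumentBuilder could instead be given a file or stream via parse(); either way the application ends up with a Document object to navigate and modify.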
JAXP: Using Transformers (1)
[Figure: .newTransformer(…) creates a transformer from an XSLT stylesheet; .transform(.,.) applies it to a source and writes a result]
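A minimal sketch of these two calls, assuming a simplified variant of the stylesheet from the namespaces slide (without the XHTML default namespace, and with an xsl:output element added to suppress the XML declaration). The class name TransformDemo and the input document are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformDemo {
    // Simplified stylesheet: turns doc/title into an h1 element.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"doc/title\">"
      + "<h1><xsl:apply-templates/></h1>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML text and returns the serialized result. */
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

For the input <doc><title>Hello</title></doc>, the built-in template rules descend to the title element, where the explicit template wraps its text in h1.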
W3C XQuery
Functional expression language
– A query is a side-effect-free expression
Operates on sequences of items
– XML nodes or atomic values
Strongly-typed: (XML Schema) types may be assigned to expressions statically, and results can be validated
Extends XPath 2.0 (but not all axes required)
– common for XQuery 1.0 and XPath 2.0:
» Functions and Operators, W3C Rec. 01/2007
Roughly: XQuery ≈ XPath 2.0 + XSLT' + SQL'
FLWOR ("flower") Expressions
for, let, where, order by and return clauses (~SQL select-from-where)
Form:
(ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr
binds variables to values, and uses these bindings to construct a result (an ordered sequence of items)
XQuery Example
for $pn in distinct-values(doc("sp.xml")//pno)
let $sp := doc("sp.xml")//sp_tuple[pno=$pn]
where count($sp) >= 3
order by $pn
return
  <well_supplied_item>
    <pno>{$pn}</pno>
    <avgprice>{avg($sp/price)}</avgprice>
  </well_supplied_item>
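To show what the query computes, here is a rough Java equivalent built on DOM and XPath 1.0 from the JDK (which has no built-in XQuery engine). It is a sketch under assumptions, not the course's method: the class name WellSupplied is hypothetical, every sp_tuple is assumed to have exactly one pno and one numeric price child, and pno elements are assumed to occur only inside sp_tuple.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WellSupplied {
    /** Returns pno -> average price for parts with at least 3 sp_tuple entries, ordered by pno. */
    static Map<String, Double> wellSupplied(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList tuples = (NodeList) xpath.evaluate("//sp_tuple", doc, XPathConstants.NODESET);

        // "for $pn in distinct-values(...)" + "let $sp := ...": group prices by pno.
        // A TreeMap keeps the keys sorted, which plays the role of "order by $pn".
        Map<String, List<Double>> pricesByPno = new TreeMap<>();
        for (int i = 0; i < tuples.getLength(); i++) {
            Node tuple = tuples.item(i);
            String pno = xpath.evaluate("pno", tuple);
            double price = Double.parseDouble(xpath.evaluate("price", tuple));
            pricesByPno.computeIfAbsent(pno, k -> new ArrayList<>()).add(price);
        }

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : pricesByPno.entrySet()) {
            List<Double> prices = entry.getValue();
            if (prices.size() >= 3) {                       // where count($sp) >= 3
                double sum = 0;
                for (double p : prices) {
                    sum += p;
                }
                result.put(entry.getKey(), sum / prices.size()); // avg($sp/price)
            }
        }
        return result;
    }
}
```

The comparison mainly illustrates how much grouping, filtering and ordering logic the five FLWOR clauses express declaratively.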
Course Main Message
XML is a universal way to represent information as tree-like data structures
There are specialized and powerful technologies for processing it
– hype has settled
– R&D still going on