XMLDB M2 Stanford

download XMLDB M2 Stanford

of 49

Transcript of XMLDB M2 Stanford

  • 7/25/2019 XMLDB M2 Stanford

    1/49

    1

    Module 2Module 2

    XML BasicsXML Basics(XML, Namespaces,(XML, Namespaces,Usage scenarios, DTDs)Usage scenarios, DTDs)

  • 7/25/2019 XMLDB M2 Stanford

    2/49

    2

    History: SGML vs. HTML vs.History: SGML vs. HTML vs.

    XMLXML

    SGML (1960)

    XML(1996)

    HTML(1990) XHTML(2000)

    http://www.w3.org/TR/2006/REC-xml-20060816/

  • 7/25/2019 XMLDB M2 Stanford

    3/49

    3

    Why XML ?Why XML ?

    HTML is to be interpreted by browsersHTML is to be interpreted by browsers Shown on the screen to a humanShown on the screen to a human

    Desire to separate the content fromDesire to separate the content frompresentationpresentation Presentation has to please the human eyePresentation has to please the human eye

    Content can be interpreted by machines, forContent can be interpreted by machines, formachines presentation is a handicapmachines presentation is a handicap

    Semantic markup of the dataSemantic markup of the data

  • 7/25/2019 XMLDB M2 Stanford

    4/49

    4

    Information about a book inInformation about a book in

    HTMLHTMLPolitics of experience by Ronald Laing,Politics of experience by Ronald Laing,

    published in 1967published in 1967 Item Item

    number:320070381076cellspacing="0" width="100%">

  • 7/25/2019 XMLDB M2 Stanford

    5/49

    5

    The same information in XMLThe same information in XML

    Elements

    Information is (1) decoupled from presentation, then (2)chopped into smaller pieces, and then (3) marked with

    semantic meaningIt can be processed by machinesLike H!L, only synta", not logical abstract data model

  • 7/25/2019 XMLDB M2 Stanford

    6/49

    6

    XML key conceptsXML key concepts

    DocumentsDocuments

    ElementsElements

    AttributesAttributes Namespace declarationsNamespace declarations

    TextText

    CommentsComments Processing InstructionsProcessing Instructions

    All inherited from SGML, then HTMLAll inherited from SGML, then HTML

  • 7/25/2019 XMLDB M2 Stanford

    7/49

    7

    The key concepts of XMLThe key concepts of XML

    Elements

    #ocuments$lements%ttributes

    e"t

    &ested structure'onceptual treerder is importantnly characters*, not integers, etc

  • 7/25/2019 XMLDB M2 Stanford

    8/49

    8

    ElementsElements

    Enclosed in TagsEnclosed in Tags Begin Tag: e.g.,Begin Tag: e.g., End Tag: e.g.,End Tag: e.g., Element without content: e.g.,Element without content: e.g.,is ais a

    shorthand forshorthand for

    Elements can be nestedElements can be nested Wilde Wutz Wilde Wutz

    Subelements can implement multisetsSubelements can implement multisets ... ... ... ... Order is important !Order is important ! Documents must be well-formedDocuments must be well-formedis forbidden!is forbidden!

    is forbidden!is forbidden!

  • 7/25/2019 XMLDB M2 Stanford

    9/49

    9

    AttributesAttributes Attribute are associated to ElementsAttribute are associated to Elements ... ... ... ...

    Elements can have only attributesElements can have only attributes Attribute names must be unique! (No Multisets)Attribute names must be unique! (No Multisets)is illegal!is illegal!

    What is the difference between a nested elementWhat is the difference between a nested elementand an attribute? Are attributes useful?and an attribute? Are attributes useful? Modeling decision: should name be an attributeModeling decision: should name be an attributeor a subelement of a person ? What about age ?or a subelement of a person ? What about age ?

  • 7/25/2019 XMLDB M2 Stanford

    10/49

    10

    Text and Mixed ContentText and Mixed Content Text appears in element contentText appears in element content

    The politics of experienceThe politics of experience

    Can be mixed with other subelementsCan be mixed with other subelements The politics of experienceThe politics of experience

    Mixed ContentMixed Content For documents data -- very usefulFor documents data -- very useful The need does not arise in data processing, only entitiesThe need does not arise in data processing, only entitiesand relationshipsand relationships

    People speak in sentences, not entities and relationships.People speak in sentences, not entities and relationships.XML allows to preserve the structure of natural language,XML allows to preserve the structure of natural language,while adding semantic markup that can be interpreted bywhile adding semantic markup that can be interpreted bymachines.machines.

  • 7/25/2019 XMLDB M2 Stanford

    11/49

    11

    Continuous spectrum betweenContinuous spectrum between

    natural language, semi-structurednatural language, semi-structured

    data, and structured datadata, and structured data1.1. Dana said that the book entitledDana said that the book entitledThe politics ofThe politics ofexperience is really excellent !experience is really excellent !

    2.2. The book entitledThe book entitledThe politicsThe politicsexperience is really excellent !experience is really excellent !

    3.3. The book entitledThe book entitledTheThepolitics of experiencepolitics of experienceis really excellent !is really excellent !

  • 7/25/2019 XMLDB M2 Stanford

    12/49

    12

    CDATA sectionsCDATA sections

    Sometimes we would like to preserve theSometimes we would like to preserve theoriginal characters, and not interpret them asoriginal characters, and not interpret them asmarkupmarkup

    CDATA sectionsCDATA sections Not parsed as XMLNot parsed as XML

    Hello,world!Hello,world!

    Hello,Hello,

    world!]]>world!]]>

  • 7/25/2019 XMLDB M2 Stanford

    13/49

    13

    Comments, PIs, PrologComments, PIs, Prolog Comment: Syntax as in HTMLComment: Syntax as in HTML

    Processing InstructionsProcessing Instructions Contain no data - interpretation by processorContain no data - interpretation by processor

    Syntax:Syntax: Pause isPause isTarget;Target;10secs10secsis Contentis Content XMLXMLis a reserved target for prologis a reserved target for prolog

    PrologProlog Standalone defines whether there is a DTDStandalone defines whether there is a DTD Encoding is usually Unicode.Encoding is usually Unicode.

  • 7/25/2019 XMLDB M2 Stanford

    14/49

    14

    Whitespaces declarationWhitespaces declaration

    Whitespace = Continuous sequence ofWhitespace = Continuous sequence ofSpaceSpace,,TabTabandandReturnReturncharactercharacter Special AttributeSpecial Attributexml:spacexml:spaceto control useto control use Human-readible XML (with Whitespace)Human-readible XML (with Whitespace)

    The politics of experienceThe politics of experienceRonald laingRonald laing

    (Efficient) machine-readible XML (no WS)(Efficient) machine-readible XML (no WS)The politics ofexperienceRonaldexperienceRonaldLaingLaing

    Performance improvement: ca. Factor 2.Performance improvement: ca. Factor 2.

  • 7/25/2019 XMLDB M2 Stanford

    15/49

    15

    Language declarationLanguage declaration

    The quickThe quick

    brown fox jumps over the lazybrown fox jumps over the lazy

    dog.

    dog.

    What colourWhat colour

    is it?

    is it?

    What colorWhat color

    is it?

    is it?
  • 7/25/2019 XMLDB M2 Stanford

    16/49

    16

    Universal Resource IdentifiersUniversal Resource Identifiers

    on the Webon the Web

    URLs, URIs, IRIsURLs, URIs, IRIs URL (Universal Resource Locators):URL (Universal Resource Locators):deferenceable identifier on thedeferenceable identifier on theWebWeb The target of an URL pointer is an HTML file (virtual or materialized)The target of an URL pointer is an HTML file (virtual or materialized)

    URIs (Unique Resource Identifier):URIs (Unique Resource Identifier):general purpose key to resourcesgeneral purpose key to resourceson the Webon the Web Uniquely identifies a resourceUniquely identifies a resource Target is not an HTML file, can be anything (schema, table, file, entity, object,Target is not an HTML file, can be anything (schema, table, file, entity, object,tuple, person, physical item, etc)tuple, person, physical item, etc)

    Lifetime and scope of this key is user dependentLifetime and scope of this key is user dependent

    IRI (Internationalized Resource Identifiers)IRI (Internationalized Resource Identifiers)

    Allow non Latin characters (Chinese, Arabic, Japanese, etc)Allow non Latin characters (Chinese, Arabic, Japanese, etc) URL, URI, IRIsURL, URI, IRIs

    All stringsAll strings Very LONG stringsVery LONG strings

  • 7/25/2019 XMLDB M2 Stanford

    17/49

    17

    NamespacesNamespaces Integration of Data from diverse data sourcesIntegration of Data from diverse data sources

    Integration of different XML Vocabularies (aka Namespaces)Integration of different XML Vocabularies (aka Namespaces) Each vocabulary has a unique key, identified by a URI/IRIEach vocabulary has a unique key, identified by a URI/IRI Same local name, from different vocabularies can haveSame local name, from different vocabularies can have

    Different meaningDifferent meaning Different structure associated with itDifferent structure associated with it

    Qualified Names (Qname) to attach a name to its vocabularyQualified Names (Qname) to attach a name to its vocabulary for all nodes in an XML document that has names (Attributes, Elements,for all nodes in an XML document that has names (Attributes, Elements,PisPis

    QNameQName::= triple ( URI::= triple ( URI[ prefix: ][ prefix: ]localname )localname )

    Binding (prefix, URI) is introduced in elements start tagBinding (prefix, URI) is introduced in elements start tag Later only the prefix is used, not the long URIsLater only the prefix is used, not the long URIs Prefix is optional, default namespacesPrefix is optional, default namespaces Prefix and localname a separated by :Prefix and localname a separated by :

    http://w3.org/TR/1999/REC-xml-nameshttp://w3.org/TR/1999/REC-xml-names

  • 7/25/2019 XMLDB M2 Stanford

    18/49

    18

    Namespaces (cont)Namespaces (cont)

    Namespace definitions look like AttributesNamespace definitions look like Attributes Identified by xmlns:prefix or xmlns (default)Identified by xmlns:prefix or xmlns (default) Bind the Prefix to the URIBind the Prefix to the URI

    Scope is the entire element where theScope is the entire element where thenamespace is declarednamespace is declared Includes the element itslef, its attributes and istIncludes the element itslef, its attributes and istsubtreessubtrees

    ExampleExample

  • 7/25/2019 XMLDB M2 Stanford

    19/49

    19

    Default namespacesDefault namespaces

    Default namespaces, no prefixDefault namespaces, no prefix

    Only applies to subelements, not attributesOnly applies to subelements, not attributes

  • 7/25/2019 XMLDB M2 Stanford

    20/49

    20

    Example: NamespacesExample: Namespaces

    DQ1 definesDQ1 definesdishdishforforchinachina Diameter, Volume, Decor, ...Diameter, Volume, Decor, ...

    DQ2 definesDQ2 definesdishdishforforsatellitessatellites

    Diameter, FrequencyDiameter, Frequency How many dishes are there?How many dishes are there?

    Better ask for:Better ask for: How manyHow manydishesdishesare there?are there? or or

    How manyHow manydishesdishesare thereare there??

  • 7/25/2019 XMLDB M2 Stanford

    21/49

    21

    Example: NamespacesExample: Namespaces

    2020

    55

    MeissnerMeissner

    200200

    20-2000MHz20-2000MHz

  • 7/25/2019 XMLDB M2 Stanford

    22/49

    22

    Mixing Several NamespacesMixing Several Namespaces

    2020

    55

    MeissnerMeissner

    This is an unqualified element nameThis is an unqualified element name

  • 7/25/2019 XMLDB M2 Stanford

    23/49

    23

    Example XML dataExample XML data

    XHTML (browser/presentation)XHTML (browser/presentation) RSS (blogs)RSS (blogs) UBL (Universal Business Language)UBL (Universal Business Language) HealthCare Level 7 (medical data)HealthCare Level 7 (medical data)

    XBRL (financial data)XBRL (financial data) Digital photography metadata (XMP)Digital photography metadata (XMP) XMI (metadata)XMI (metadata) XQueryX (programs)XQueryX (programs)

    XForms (forms)XForms (forms) SOAP (message envelopes)SOAP (message envelopes) Microsoft Office -- Powerpoint in XML (documents)Microsoft Office -- Powerpoint in XML (documents)

  • 7/25/2019 XMLDB M2 Stanford

    24/49

    24

    XHTMLXHTML

    QuickTime and aTIFF (Uncompressed) decompressor

    are needed to see this picture.

  • 7/25/2019 XMLDB M2 Stanford

    25/49

    25

    RSS, blogsRSS, blogs XML.com http://xml.com/pub XML.com features a rich mix ofinformation and services for the XML community.

    XML.com http://www.xml.com

    http://xml.com/universal/images/xml_tiny.gif

  • 7/25/2019 XMLDB M2 Stanford

    26/49

    26

    UBL (Universal BusinessUBL (Universal Business

    Language)Language) Vocabularies definitions for:Vocabularies definitions for:

    ApplicationResponseAttachedDocumentBillOfLadingApplicationResponseAttachedDocumentBillOfLading

    CatalogueCatalogueDeletionCatalogueItemSpecificCatalogueCatalogueDeletionCatalogueItemSpecific

    ationUpdateCataloguePricingUpdateCatalogueRequationUpdateCataloguePricingUpdateCatalogueRequ

    estCertificateOfOriginCreditNoteDebitNoteDespatcestCertificateOfOriginCreditNoteDebitNoteDespatc

    hAdviceForwardingInstructionsFreightInvoiceInvoichAdviceForwardingInstructionsFreightInvoiceInvoic

    eOrderOrderCancellationOrderChangeOrderRespeOrderOrderCancellationOrderChangeOrderResp

    onseOrderResponseSimplePackingListQuotationRonseOrderResponseSimplePackingListQuotationR

    eceiptAdviceReminderRemittanceAdviceRequestFoeceiptAdviceReminderRemittanceAdviceRequestForQuotationSelfBilledCreditNoteSelfBilledInvoiceStatrQuotationSelfBilledCreditNoteSelfBilledInvoiceStat

    ementTransportationStatusWaybillementTransportationStatusWaybill

  • 7/25/2019 XMLDB M2 Stanford

    27/49

    27

    HealthCareLevel 7HealthCareLevel 7

    Medical information that is being exchangedMedical information that is being exchangedbetween hospitals, patients, doctors,between hospitals, patients, doctors,pharmacies and insurance companiespharmacies and insurance companies

    http://en.wikipedia.org/wiki/HL7http://en.wikipedia.org/wiki/HL7

  • 7/25/2019 XMLDB M2 Stanford

    28/49

    28

    XBRL (Financial information)XBRL (Financial information)

    Goal: facilitate the exchange of businessGoal: facilitate the exchange of business

    and financial performance informationand financial performance information

    between companies, governments,between companies, governments,

    insurance companies, banks, etc.insurance companies, banks, etc. Mandate by law in many countriesMandate by law in many countries

    http://en.wikipedia.org/wiki/XBRLhttp://en.wikipedia.org/wiki/XBRL

  • 7/25/2019 XMLDB M2 Stanford

    29/49

    29

    Extensible Metadata PlatformExtensible Metadata Platform

    (XMP)(XMP)

    Used inUsed inPDFPDF,,photographyphotographyandandphoto editingphoto editingapplications.applications. ParticularParticularschemasschemasfor basic properties useful forfor basic properties useful for

    recording the history of a resource as it passes throughrecording the history of a resource as it passes throughmultiple processing steps, from being photographed,multiple processing steps, from being photographed,

    scannedscanned, or authored as text, through photo editing steps, or authored as text, through photo editing steps(such as(such ascroppingcroppingor color adjustment), to assembly intoor color adjustment), to assembly into

    a final image.a final image. XMP allows each software program or device along theXMP allows each software program or device along the

    way to add its own information to a digital resource, whichway to add its own information to a digital resource, whichcan then be retained in the final digital file.can then be retained in the final digital file. http://en.wikipedia.org/wiki/Extensible_Metadata_Plathttp://en.wikipedia.org/wiki/Extensible_Metadata_Platformform

    http://en.wikipedia.org/wiki/Photographyhttp://en.wikipedia.org/wiki/Portable_Document_Formathttp://en.wikipedia.org/wiki/Portable_Document_Formathttp://en.wikipedia.org/wiki/Photographyhttp://en.wikipedia.org/wiki/Photographyhttp://en.wikipedia.org/wiki/Graphics_softwarehttp://en.wikipedia.org/wiki/Graphics_softwarehttp://en.wikipedia.org/wiki/Schemahttp://en.wikipedia.org/wiki/Image_scannerhttp://en.wikipedia.org/wiki/Image_scannerhttp://en.wikipedia.org/wiki/Cropping_%28image%29http://en.wikipedia.org/wiki/Cropping_%28image%29http://en.wikipedia.org/wiki/Cropping_%28image%29http://en.wikipedia.org/wiki/Image_scannerhttp://en.wikipedia.org/wiki/Schemahttp://en.wikipedia.org/wiki/Graphics_softwarehttp://en.wikipedia.org/wiki/Photographyhttp://en.wikipedia.org/wiki/Portable_Document_Format
  • 7/25/2019 XMLDB M2 Stanford

    30/49

    30

    Microsoft Office in XMLMicrosoft Office in XML

    Office 2003 was able to import/export allOffice 2003 was able to import/export alldocuments into XMLdocuments into XML

    Office 2007 models the documents NATIVELY inOffice 2007 models the documents NATIVELY in

    XMLXML Examples of vocabularies and schemas:Examples of vocabularies and schemas:

    WordprocessingML (the XML file format forWordprocessingML (the XML file format for

    Word 2003! "preadsheetML (Excel 2003!Word 2003! "preadsheetML (Excel 2003!

    #orm$emplate XML schemas (%nfo&ath 2003#orm$emplate XML schemas (%nfo&ath 2003and 'ata'iagramingML (isio 2003and 'ata'iagramingML (isio 2003

  • 7/25/2019 XMLDB M2 Stanford

    31/49

    31

    Forms on the Web in XMLForms on the Web in XML

    XML Forms (Xforms)XML Forms (Xforms) http://www.w3.org/TR/xforms/http://www.w3.org/TR/xforms/

  • 7/25/2019 XMLDB M2 Stanford

    32/49

    32

    Programs and queries in XMLPrograms and queries in XML XQuery, the XML query language, has an XMLXQuery, the XML query language, has an XMLrepresentationrepresentation

    Programs and queries are also DATAPrograms and queries are also DATA Blurring the distinction between data, metadata, codeBlurring the distinction between data, metadata, code distinctdistinct

    documentdocument http://www.bn.comhttp://www.bn.com

    descendant-or-selfdescendant-or-self

    authorauthor

  • 7/25/2019 XMLDB M2 Stanford

    33/49

    33

    SOAP and Web ServicesSOAP and Web Services Web Services is the favorite way of exchanging informationWeb Services is the favorite way of exchanging information

    between applicationsbetween applications XML exchange over HTTP, with a specific protocol (SOAP)XML exchange over HTTP, with a specific protocol (SOAP)

    uuid:093a2da1-

    q345-739r-ba5d-pqff98fe8j7d 200q345-739r-ba5d-pqff98fe8j7d 200

    11-29T13:20:00.000-05:00 11-29T13:20:00.000-05:00

    ke Jgvan yvind

  • 7/25/2019 XMLDB M2 Stanford

    34/49

    34

    The need for XML schemasThe need for XML schemas Unlike any other data format, XML is totally flexible,Unlike any other data format, XML is totally flexible,

    elements can be nested in arbitrary wayselements can be nested in arbitrary ways We can start by writing the XML data -- no need for aWe can start by writing the XML data -- no need for apriori design of a schemapriori design of a schema Think relational databases, or Java classesThink relational databases, or Java classes

    However, schemas are necessary:However, schemas are necessary: Facilitate the writing of applications that process dataFacilitate the writing of applications that process data Constraint the data that is correct for a certain applicationConstraint the data that is correct for a certain application Have a priori agreements between parties with respect to theHave a priori agreements between parties with respect to thedata being exchangeddata being exchanged

    Schema: a model of the dataSchema: a model of the data Structural definitionsStructural definitions Type definitionsType definitions DefaultsDefaults

    HistoryandroleofXMLSchemaHistoryandroleofXMLSchema

  • 7/25/2019 XMLDB M2 Stanford

    35/49

    35

    History and role of XML SchemaHistory and role of XML Schema

    LanguagesLanguages Several standard Schema LanguagesSeveral standard Schema Languages

    DTDs, XML Schema, RelaxNGDTDs, XML Schema, RelaxNG

    Schema languages have been designed after, and inSchema languages have been designed after, and inan orthogonal fashion, to XML itselfan orthogonal fashion, to XML itself

    Schemas and data are completely decoupled in XMLSchemas and data are completely decoupled in XML Data can exist with or without schemasData can exist with or without schemas Or with multiple schemasOr with multiple schemas

    Schema evolutions rarely impose evolving the dataSchema evolutions rarely impose evolving the data

    Schemas can be designed before the data, or extracted fromSchemas can be designed before the data, or extracted fromthe data (DataGuide -- Stanford)the data (DataGuide -- Stanford)

    Makes XML the right choice for manipulating semi-Makes XML the right choice for manipulating semi-structured data, or rapidly evolving data, or highlystructured data, or rapidly evolving data, or highlycustomizable datacustomizable data

  • 7/25/2019 XMLDB M2 Stanford

    36/49

    36

    DTDsDTDs

    Inherited from SGMLInherited from SGML Part of the original XML 1.0 specificationPart of the original XML 1.0 specification Describe the grammar of the XML fileDescribe the grammar of the XML file

    Element declarations:Element declarations:how elements are allowed to nesthow elements are allowed to nest

    within each other by rules and constraintswithin each other by rules and constraints Attributes lists:Attributes lists:describe what attributes are allowed ondescribe what attributes are allowed onwhich elementwhich element

    Some constraints on the value of elements and attributesSome constraints on the value of elements and attributes Which is the root element of the XML fileWhich is the root element of the XML file

    Checking the structural constraints:Checking the structural constraints:DTD validationDTD validation(valid vs. invalid documents)(valid vs. invalid documents)

    DTD very useful for a while, not used anymore,DTD very useful for a while, not used anymore,several major limitationsseveral major limitations

  • 7/25/2019 XMLDB M2 Stanford

    37/49

    37

    Declaring the structure ofDeclaring the structure ofelementselements

    Grammar that describes the structure of the elementGrammar that describes the structure of the element Subelements, identified by Name orSubelements, identified by Name or #PCDATA#PCDATA

    Combinators :Combinators : + for at least 1+ for at least 1 * for 0 or more* for 0 or more ? for 0 or 1? for 0 or 1 , for concatenation, for concatenation | for choice| for choice

    PCDATA: only textual content allowedPCDATA: only textual content allowed EMPTY : the element must be emptyEMPTY : the element must be empty

    ANY: allows any contentANY: allows any content

  • 7/25/2019 XMLDB M2 Stanford

    38/49

    38

    Example DTD for recipesExample DTD for recipes

  • 7/25/2019 XMLDB M2 Stanford

    39/49

    39

    Defining the attribute listsDefining the attribute lists

    Structure:Structure:>

    name CDATA #REQUIRED name CDATA #REQUIRED amount CDATA #IMPLIED amount CDATA #IMPLIED unit CDATA #FIXED cupunit CDATA #FIXED cup>>

    CDATA means normal contentCDATA means normal content #REQUIRED, or #IMPLIED refer to the fact#REQUIRED, or #IMPLIED refer to the factthat the attribute is optional or notthat the attribute is optional or not

    Default value possibleDefault value possible

  • 7/25/2019 XMLDB M2 Stanford

    40/49

    40

    Attributes (cont.)Attributes (cont.)

    #REQUIRED#REQUIRED Document must specify a value for attributeDocument must specify a value for attribute

    #IMPLIED#IMPLIED Attribute is optional, there is no defaultAttribute is optional, there is no default

    valuevalue Default value, if no other value specifiedDefault value, if no other value specified

    #FIXED#FIXEDvaluevalue Default value, if no other value specifiedDefault value, if no other value specified If value specified, it must be the fixed valueIf value specified, it must be the fixed value

    M j ttibttM j ttibtt

  • 7/25/2019 XMLDB M2 Stanford

    41/49

    41

    Major attribute typesMajor attribute types

    PCDATA: normal Text contentPCDATA: normal Text content

    IDID Value is unique within documentValue is unique within document Element has at most one attribute of this typeElement has at most one attribute of this type No default values allowedNo default values allowed

    IDREF, IDREFSIDREF, IDREFS References to other elements within theReferences to other elements within thedocumentdocument

    IDREFS: Enumeration, as separatorIDREFS: Enumeration, as separator

  • 7/25/2019 XMLDB M2 Stanford

    42/49

    42

    ID and IDREF attributesID and IDREF attributes

    price CDATA #IMPLIED price CDATA #IMPLIED index IDREFS index IDREFS >>

  • 7/25/2019 XMLDB M2 Stanford

    43/49

    43

    Attributes list exampleAttributes list example

  • 7/25/2019 XMLDB M2 Stanford

    44/49

    44

    Mixed content in DTDsMixed content in DTDs

    Mixing PCDATA declarations with otherMixing PCDATA declarations with othersubelements means that the content can besubelements means that the content can bemixedmixed

    some text some emphasized

    some text some emphasized

    text blah some boldtext blah some bold

    text

    text
  • 7/25/2019 XMLDB M2 Stanford

    45/49

    45

    Declarations of DTDsDeclarations of DTDs

    No DTD (well-formed Documents)No DTD (well-formed Documents) DTD inside the Document:DTD inside the Document:

    DTD external, specified by URI:DTD external, specified by URI:

    DTD external, Name and optional URI:DTD external, Name and optional URI:

    DTD inside the document + external:DTD inside the document + external:

  • 7/25/2019 XMLDB M2 Stanford

    46/49

    46

    Correctness of XML documentsCorrectness of XML documents

    Well formedWell formeddocumentsdocuments Verify the basic XML constraints, e.g. Verify the basic XML constraints, e.g.

    Valid documentsValid documents Verify the additional DTD structural constraintsVerify the additional DTD structural constraints

    Non well formed XML documents cannot be processedNon well formed XML documents cannot be processed Non-valid documents can still be processed (queried,Non-valid documents can still be processed (queried,transformed, etc)transformed, etc)

  • 7/25/2019 XMLDB M2 Stanford

    47/49

    47

    Limitations of DTDsLimitations of DTDs

    DTDs describe only the grammar of the XMLDTDs describe only the grammar of the XMLfile, not the detailed structure and/or typesfile, not the detailed structure and/or types

    This grammatical description has some obviousThis grammatical description has some obvious

    shortcomings:shortcomings: we cannot express that a length element mustwe cannot express that a length element mustcontain a non-negative numbercontain a non-negative number(constraints on the(constraints on thetype of the value of an element or attribute)type of the value of an element or attribute)

    The unitThe unitelement should only be allowed whenelement should only be allowed when

    amountamountis presentis present(co-occurrence constraints)(co-occurrence constraints) the the commentcommentelement should be allowed toelement should be allowed toappear anywhereappear anywhere(schema flexibility)(schema flexibility)

    G d S h d i

  • 7/25/2019 XMLDB M2 Stanford

    48/49

    48

    Good Schema designprinciples

    The XML schema language shall be1. more e2pressi3e than 4ML +T+s

    5. e2pressed in 4ML

    6. se07descriin&

    4.usable by a wide variety of applications that employXML

    5.straightforwardly usable on the Internet

    6.optimized for interoperability

    7.simpleenough to be implemented with modestdesign and runtime resources

    8. coordinated 9ith ree3ant :6* specs

  • 7/25/2019 XMLDB M2 Stanford

    49/49

    RecapitulationRecapitulation

    XML as inheriting from the Web historyXML as inheriting from the Web history SGML, HTML, XHTML, XMLSGML, HTML, XHTML, XML

    XML key conceptsXML key concepts Documents, elements, attributes, textDocuments, elements, attributes, text

    Order, nested structure, textual informationOrder, nested structure, textual information NamespacesNamespaces XML usage scenariosXML usage scenarios

    Financial, medical, metadata, blogs, etcFinancial, medical, metadata, blogs, etc

    DTDs and the need for describing the structure ofDTDs and the need for describing the structure ofan XML filean XML file Next: XML SchemasNext: XML Schemas