SNUOOPSLA Lab.
XML Documents 1 : Structure
The ubiquitous XML(2)
© copyright 2001 SNU OOPSLA Lab.
XML Documents 1 : structure
Peeping into XML document
At physical view : Entity
At logical view : DTD
XML document
Peeping into XML document (1/5)
<?xml version="1.0" standalone="yes"?>
<GREETING> Hello, XML!! <!--this is greeting--></GREETING>
An XML document consists of markup (the declaration, the tags, the comment) and character data (the text " Hello, XML!! ").
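A minimal sketch in Python, using the standard-library `xml.parsers.expat` module, that replays the greeting above and tags each event the parser reports as markup (declaration, tags, comment) or as character data. The function name `markup_and_data` is illustrative, not something from the slides.

```python
import xml.parsers.expat

def markup_and_data(doc):
    """Return the parser's events, labelled as markup or character data."""
    events = []
    p = xml.parsers.expat.ParserCreate()
    # standalone is reported as 1 for "yes", 0 for "no", -1 when absent
    p.XmlDeclHandler = lambda version, encoding, standalone: events.append(
        ("xml-declaration", version, standalone))
    p.StartElementHandler = lambda name, attrs: events.append(("start-tag", name))
    p.EndElementHandler = lambda name: events.append(("end-tag", name))
    p.CommentHandler = lambda text: events.append(("comment", text))
    p.CharacterDataHandler = lambda data: events.append(("character-data", data))
    p.Parse(doc, True)
    return events

DOC = ('<?xml version="1.0" standalone="yes"?>\n'
       '<GREETING> Hello, XML!! <!--this is greeting--></GREETING>')

for event in markup_and_data(DOC):
    print(event)
```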
Peeping into XML document (2/5)
XML document : date.xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]>
<!--This is date -->
<DATE> 001224</DATE>
XML declaration: declares that this is an XML document. It begins with <? and ends with ?>.
DTD (Document Type Definition): defines the tags the user will use; here it defines the DATE tag.
Content: the DATE element and its text.
Comment: not part of the document's character data; the parser may simply discard it.
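A sketch, assuming Python's non-validating `xml.etree.ElementTree` parser, that reads date.xml and checks well-formedness. The helper names `read_date` and `is_well_formed` are hypothetical. The second check shows that a declaration written with a space after `<?` is rejected, because the XML declaration must start with exactly `<?xml`.

```python
import xml.etree.ElementTree as ET

DATE_XML = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]>
<!--This is date -->
<DATE> 001224</DATE>"""

def read_date(text):
    """Parse the document and return (root tag, trimmed text content)."""
    root = ET.fromstring(text)   # the default TreeBuilder drops comments
    return root.tag, root.text.strip()

def is_well_formed(text):
    """True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(read_date(DATE_XML))
print(is_well_formed('<? xml version="1.0"?><DATE>001224</DATE>'))
```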
Peeping into XML document (3/5)
Structure of XML document
Physical structure : allows components of the document, called entities, to be stored as separate units
Logical structure : allows a document to be divided into named units and sub-units, called elements
Peeping into XML document (4/5)
[Figure: logical structure (the document divided into units and sub-units, i.e. elements) compared with physical structure (entities, stored internally or in separate files)]
Peeping into XML document (5/5)
<person>
<name> kim </name>
<ID>771224</ID>
<office>301-453</office>
<phone>1830</phone>
<photo source="k.jpg"/>
</person>
Viewed as elements (logical view): person contains the elements name, ID, office, phone, and photo.
Viewed as entities (physical view): the image file "k.jpg" is stored outside the document, as a separate entity.
Content of Physical structure
Entity, Figures of Document Entity, Defining an entity, Grammar in Declaring Entity, Examples of Entity Declaration, URL format
Physical structure
Entity (1/3)
An entity is a unit for physically isolating and storing any part of a document (a unit of information storage). Each such unit of information is called an entity.
[Figure: physical structure of the person document: the document itself is one entity, and the photo "k.jpg" it points to is stored as a separate entity; entities can be internal or separate]
Entity (2/3)
Purpose of entities: to hold any kind of information the document uses (well-formed XML data, other text files, binary data…)
[Figure: the person document is the document entity; the photo "k.jpg" is an image entity]
Entity (3/3)
Internal entity: an entity whose content is defined completely within the document itself
External entity: an entity that gets its content from an external source identified by a URL
Figures of Document Entity
[Figure: three layouts: a document entity with no other entities; a document entity holding the main content; a document entity used as a framework file; component entities are labeled A, B, C, D]
Defining an entity
An entity must be declared before its first reference in the data stream.
Entities are declared in the DTD (Document Type Definition). Entity definition in the DTD:
<!DOCTYPE DOCUMENT [
<!ENTITY EMAIL "[email protected]">
<!ENTITY TEXT "(#PCDATA)">
]>
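A minimal sketch of what a parser does with declared internal entities, using Python's `xml.etree.ElementTree`, which expands general entities declared in the internal subset. The address `kim@example.com` and the elements `contact` and `note` are hypothetical placeholders; DemoEntity is the one from the declaration examples.

```python
import xml.etree.ElementTree as ET

# EMAIL holds a made-up placeholder address (hypothetical)
DOC = """<!DOCTYPE DOCUMENT [
  <!ENTITY EMAIL "kim@example.com">
  <!ENTITY DemoEntity 'The rule is 6" long.'>
]>
<DOCUMENT><contact>&EMAIL;</contact><note>&DemoEntity;</note></DOCUMENT>"""

root = ET.fromstring(DOC)            # references are replaced while parsing
print(root.find("contact").text)
print(root.find("note").text)
```

Referencing an entity that was never declared, for example `<DOCUMENT>&EMAIL;</DOCUMENT>` with no DTD, makes the parser raise `ET.ParseError`.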
Example : Entity Declaration (1/3)
Internal text entities
<!ENTITY XML "eXtensible Markup Language">
<!ENTITY DemoEntity 'The rule is 6" long.'>
Built-in (predefined) entities
<!ENTITY sample "Use &quot; and &apos; as delimiters.">
&lt; for < (less-than sign)
&gt; for > (greater-than sign)
&amp; for & (ampersand)
&apos; for ' (apostrophe)
&quot; for " (quotation mark)
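A short sketch, assuming Python's standard library: `ElementTree` turns the five predefined references back into characters, and `xml.sax.saxutils` escapes text in the other direction. `escape` handles only `&`, `<` and `>` by default; `quoteattr` also picks a quote character that does not occur in the value, the same trick DemoEntity uses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# parsing: each predefined reference becomes the character it stands for
root = ET.fromstring("<p>&lt;tag&gt; &amp; &apos;single&apos; &quot;double&quot;</p>")
print(root.text)

# serializing: escape &, < and > in text content
print(escape("if a < b && b > c"))
# attribute value containing " is wrapped in single quotes instead
print(quoteattr('6" long'))
```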
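Example : Entity Declaration (2/3)
External text entities
<!ENTITY myent SYSTEM "/EMTS/MYENT.XML">
<!ENTITY myent PUBLIC "-//MyCorp//ENTITY Superscript Chars//EN" ….>
Binary (unparsed) entities
<!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>
The name after NDATA is a notation name, not a quoted string, and that notation (here TIFF) must be declared with a <!NOTATION> declaration.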
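Example : Entity Declaration (3/3)
URL format: a relative URL in a SYSTEM identifier is resolved against the location of the resource that contains the declaration (here, document.xml).
<!ENTITY ent9 SYSTEM "entities/entity9.xml">
declared in /xml/document.xml, it refers to /xml/entities/entity9.xml
<!ENTITY ent9 SYSTEM "../entities/entity9.xml">
declared in /xml/docs/document.xml, it refers to /xml/entities/entity9.xml
[Figure: directory trees: xml/ holding document.xml and entities/entity9.xml; xml/ holding docs/document.xml and entities/entity9.xml]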
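The resolution rule can be sketched with Python's `urllib.parse.urljoin`, treating the paths from the slide as URL paths. `resolve_system_id` is a hypothetical helper name; absolute identifiers pass through unchanged.

```python
from urllib.parse import urljoin

def resolve_system_id(referring_location, system_id):
    """Resolve a SYSTEM identifier against the location of the resource
    that contains the entity declaration (RFC 3986 style resolution)."""
    return urljoin(referring_location, system_id)

print(resolve_system_id("/xml/document.xml", "entities/entity9.xml"))
print(resolve_system_id("/xml/docs/document.xml", "../entities/entity9.xml"))
```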
Content of Logical structure
Concepts, DTD Structure, Element Declarations, Attribute Declarations, Parameter Entities, Conditional Sections, Notation Declarations, DTD Processing Issues
logical structure
Concepts of DTD (1/3)
DTD (Document Type Definition): an optional but powerful feature of XML
It consists of a set of declarations that define the document's structure tree.
Validating XML processors read the DTD, check whether the document is valid, and use it to build the document model in memory.
It describes the user's own tag set, which is what lets XML serve as a meta-markup language.
Concepts of DTD (2/3)
A DTD describes elements, attributes, notations, and the relationships between elements.
It establishes formal rules for the document's structure.
Concepts of DTD (3/3)
Declare vs. define
Declare: "This document is a concert poster."
Define: "A concert poster must have the following features."
A DTD defines element types, attributes, and entities.
Valid vs. invalid
Valid: conforms to the DTD
Invalid: fails to conform to the DTD
[Figure: valid XML documents are a subset of well-formed XML documents]
Valid & Invalid Documents
Example: <!DOCTYPE GREETING [<!ELEMENT GREETING (#PCDATA)>]>
Valid: <GREETING>various random text but no markup</GREETING>
Invalid: anything else, including
<GREETING>
<sometag>various random text</sometag> <someEmptyTag/>
</GREETING>
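The standard-library parsers used in these sketches do not validate against a DTD, so this hypothetical helper checks only the one rule that the (#PCDATA) model imposes: text is allowed, child elements are not. ElementTree drops comments by default, so a comment inside GREETING does not count as a child.

```python
import xml.etree.ElementTree as ET

def fits_pcdata_only(xml_text):
    """Check the root element against the model (#PCDATA):
    text is fine, any child element makes it invalid."""
    root = ET.fromstring(xml_text)
    return len(root) == 0          # len() counts direct child elements

VALID = "<GREETING>various random text but no markup</GREETING>"
INVALID = ("<GREETING><sometag>various random text</sometag>"
           "<someEmptyTag/></GREETING>")
print(fits_pcdata_only(VALID), fits_pcdata_only(INVALID))
```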
DTD structure
A DTD is composed of a number of declarations:
ELEMENT (tag definition)
ATTLIST (attribute definitions)
ENTITY (entity definition)
NOTATION (data type notation definition)
A DTD can be stored in an external subset, an internal subset, or both.
Internal and External Subset (1/3)
Internal subset
Form: <!DOCTYPE … [ <!-- Internal Subset --> … ]>
Pros: easy to write XML; no need to switch between two files while editing
Cons: other documents cannot reuse the DTD without copying it
Internal and External Subset (2/3)
External subset: it is generally better to use an external DTD. Why?
It brings benefits for document management, updating, and editing:
If you use an external DTD, you can use public DTDs.
External DTDs provide for better document management.
External DTDs make it easier to validate your documents.
Internal and External Subset (3/3)
[Figure: the full parsing path reads both the internal subset and the external subset of the DTD]
Element Declarations
An element declaration introduces a new element type: it gives the element's name and its content model, i.e. the content the element may contain.
In a valid document, every element type used must be declared in an <!ELEMENT> declaration.
The content model uses a simple regular-expression-like grammar to specify precisely what is and isn't allowed in an element.
Element type declaration: elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
Content Specifications
ANY, #PCDATA, Sequences, Choices, Mixed Content, Modifiers, Empty
ANY
<!ELEMENT SEASON ANY>
A SEASON can contain any mix of child elements (of declared types) and raw text (parsed character data).
Rarely used in practice, because it places no constraint on the structure.
#PCDATA
<!ELEMENT YEAR (#PCDATA)>
Parsed character data, i.e. raw text with no child elements.
The keyword is prefixed with a hash symbol '#' so that, inside a model group, it cannot be confused with an element that happens to be named PCDATA; for example: (#PCDATA | PCDATA).
Use of #PCDATA in XML
Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine</YEAR>
Invalid (child elements are not allowed):
<YEAR>
<MONTH>January</MONTH><MONTH>February</MONTH><MONTH>March</MONTH>
<MONTH>April</MONTH><MONTH>May</MONTH><MONTH>June</MONTH>
<MONTH>July</MONTH><MONTH>August</MONTH><MONTH>September</MONTH>
<MONTH>October</MONTH><MONTH>November</MONTH><MONTH>December</MONTH>
</YEAR>
Child Elements
To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>
Sequences (1/2)
Separate multiple required child elements with commas; e.g.
<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>
One or more children: +
<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
Sequences (2/2)
Zero or more children: *
<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>
Choices: exactly one of the alternatives separated by |
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
Grouping With Parentheses
Parentheses combine several elements into a single group.
A parenthesized group can be nested inside other parentheses wherever a single element could appear.
A parenthesized group can be suffixed with a plus sign (+), an asterisk (*), or a question mark (?).
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
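Because content models are regular expressions over child element names, a rough checker can translate a model into a Python `re` pattern and match it against the sequence of child names. This is a sketch under simplifying assumptions: the helper names are hypothetical, mixed content is rejected rather than handled, and it does not check the determinism requirement that XML places on content models.

```python
import re

# element names and the punctuation used in DTD content models
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|*+?]")

def model_to_regex(model):
    """Translate e.g. "(TITLE, (P | PHOTO)*, BYLINE?)" into a regex
    over a string of "<NAME>" tokens."""
    if "#PCDATA" in model:
        raise ValueError("mixed content is out of scope for this sketch")
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue                          # a sequence is plain concatenation
        elif tok == "(":
            parts.append("(?:")               # group without capturing
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)                 # same meaning as in the DTD
        else:
            # wrap each name so a following ? * + applies to the whole name
            parts.append("(?:<" + re.escape(tok) + ">)")
    return "".join(parts)

def children_match(model, child_names):
    """True if the ordered child element names satisfy the content model."""
    sequence = "".join("<" + name + ">" for name in child_names)
    return re.fullmatch(model_to_regex(model), sequence) is not None

ARTICLE = "(TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)"
print(children_match(ARTICLE, ["TITLE", "P", "PHOTO", "P", "BYLINE"]))
print(children_match(ARTICLE, ["P", "TITLE"]))
```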
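Mixed Content
Both #PCDATA and child elements appear in a single choice group, which is then repeated with *.
#PCDATA must come first, and it cannot be used in a sequence.
The DTD limits which child element types may appear, but not their order or number.
<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>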
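Since a mixed-content model restricts only which child element types may appear, a checker needs just the set of allowed names. A minimal sketch with `xml.etree.ElementTree`; the function names and the sample TEAM text are hypothetical.

```python
import xml.etree.ElementTree as ET

def mixed_allowed_names(model):
    """Child names allowed by a mixed model such as "(#PCDATA | A | B)*"."""
    inner = model.strip().rstrip("*").strip().strip("()")
    names = {part.strip() for part in inner.split("|")}
    names.discard("#PCDATA")
    return names

def disallowed_children(xml_text, model):
    """Return direct children whose element type the mixed model does not list.
    Text, order, and repetition of allowed children are unrestricted."""
    allowed = mixed_allowed_names(model)
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if child.tag not in allowed]

TEAM = "(#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*"
print(disallowed_children(
    "<TEAM>The <TEAM_CITY>Boston</TEAM_CITY> <TEAM_NAME>Red Sox</TEAM_NAME></TEAM>", TEAM))
print(disallowed_children("<TEAM>Coach: <COACH>Kim</COACH></TEAM>", TEAM))
```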
Empty Elements
<!ELEMENT BR EMPTY>
An EMPTY element has no content; it is written as <BR/> (or <BR></BR>).
Attribute Declarations
Consider this element:
<GREETING LANGUAGE="Spanish"> Hola!</GREETING>
It is declared like this:
<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">
General form:
<!ATTLIST Element_name Attribute_name Type Default_value>
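Even a non-validating parser applies default values declared in the internal subset. This sketch uses Python's `xml.parsers.expat`, whose `specified_attributes` setting is off by default, so defaulted attributes are reported alongside the ones written in the document. The wrapper element GREETINGS and the helper `reported_attributes` are hypothetical.

```python
import xml.parsers.expat

def reported_attributes(doc):
    """Return (element name, attributes) pairs exactly as expat reports them."""
    seen = []
    p = xml.parsers.expat.ParserCreate()
    # specified_attributes stays False, so ATTLIST defaults are included
    p.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    p.Parse(doc, True)
    return seen

DOC = """<!DOCTYPE GREETINGS [
  <!ELEMENT GREETINGS (GREETING*)>
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>
<GREETINGS><GREETING LANGUAGE="Spanish">Hola!</GREETING><GREETING>Hello!</GREETING></GREETINGS>"""

for name, attrs in reported_attributes(DOC):
    print(name, attrs)
```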
Multiple Attribute Declarations
Consider this element:
<RECTANGLE LENGTH="70px" WIDTH="85px"/>
With two attribute declarations:
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">
With one attribute declaration:
<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">
Indentation is a convention, not a requirement.
Attribute Types
CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS, Enumerated
CDATA
Most general attribute type
Value can be any string of text; a literal less-than sign (<) or ampersand (&), or the quotation mark that delimits the value, must be written as an entity reference (&lt;, &amp;, &quot; or &apos;)
ID
Value must be an XML name:
may include letters, digits, underscores, hyphens, and periods, but may not begin with a digit, hyphen, or period
may not include whitespace
may contain colons only if used for namespaces
Value must be unique among all ID-type attributes in the document
Declared #REQUIRED (the usual choice) or #IMPLIED; an ID attribute cannot have a literal default value
IDREF
Value matches the ID of an element in the same document
Used for links and similar cross-references
IDREFS
A list of ID values from the same document, separated by white space
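A sketch of the ID/IDREF bookkeeping a validator performs, assuming Python's `xml.parsers.expat`: `AttlistDeclHandler` reports each declared attribute type, and the start-tag handler then collects ID values and references. The STAFF/PERSON/BOSS vocabulary and the helper `id_problems` are hypothetical.

```python
import xml.parsers.expat

def id_problems(doc):
    """Report duplicate ID values and IDREF/IDREFS values that match no ID.
    Attribute types come from ATTLIST declarations in the internal subset."""
    attr_types = {}                     # (element, attribute) -> declared type
    ids, refs, problems = set(), [], []
    p = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        attr_types[(elname, attname)] = type_

    def on_start(name, attrs):
        for att, value in attrs.items():
            kind = attr_types.get((name, att))
            if kind == "ID":
                if value in ids:
                    problems.append("duplicate ID " + repr(value))
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())   # white-space separated list

    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    # references can point forward, so check them only after the whole parse
    problems += ["dangling IDREF " + repr(r) for r in refs if r not in ids]
    return problems

DTD = """<!DOCTYPE STAFF [
  <!ELEMENT STAFF (PERSON*)>
  <!ELEMENT PERSON EMPTY>
  <!ATTLIST PERSON ID ID #REQUIRED BOSS IDREF #IMPLIED>
]>
"""

print(id_problems(DTD + '<STAFF><PERSON ID="p1"/><PERSON ID="p2" BOSS="p1"/></STAFF>'))
print(id_problems(DTD + '<STAFF><PERSON ID="p1"/><PERSON ID="p1" BOSS="p9"/></STAFF>'))
```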
ENTITY
Value is the name of an unparsed general entity declared in the DTD
ENTITIES
Value is a list of unparsed general entities declared in the DTD, separated by white space
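NOTATION
Value is the name of a notation declared in the DTD
<!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
<!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
[Figure: numbered steps 1 to 4 linking the unparsed entity file LOGO.TEX, through the notation Tex, to the helper program TEXVIEW.EXE]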
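How an application might follow the chain from an attribute value to an unparsed entity and then to the helper program named by its notation, sketched with the `NotationDeclHandler` and `UnparsedEntityDeclHandler` callbacks of Python's `xml.parsers.expat`. The DOC element, its LOGO attribute, and the helper name are hypothetical, and for brevity the sketch treats any attribute value that names an unparsed entity as a reference instead of checking the ENTITY attribute type.

```python
import xml.parsers.expat

DOC = r"""<!DOCTYPE DOC [
  <!ELEMENT DOC EMPTY>
  <!ATTLIST DOC LOGO ENTITY #REQUIRED>
  <!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
  <!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
]>
<DOC LOGO="Logo"/>"""

def helpers_for_entities(doc):
    """Map each unparsed entity named in an attribute to (file, helper program)."""
    notations, unparsed, result = {}, {}, {}
    p = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed(name, base, system_id, public_id, notation):
        unparsed[name] = (system_id, notation)

    def on_start(name, attrs):
        for value in attrs.values():
            if value in unparsed:
                path, notation = unparsed[value]
                result[value] = (path, notations.get(notation))

    p.NotationDeclHandler = on_notation
    p.UnparsedEntityDeclHandler = on_unparsed
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return result

print(helpers_for_entities(DOC))
```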
NMTOKEN
Value is a name token: built from the same characters as an XML name, but it may begin with any of them (for example, a digit)
NMTOKENS
Value is a list of name tokens separated by white space
Enumerated
Not a keyword: the type is written as a parenthesized list of possible values, one of which must be chosen
A default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
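expat does not validate, so it accepts any value for an enumerated attribute. A minimal checker, assuming Python's `xml.parsers.expat`, can read the enumeration from `AttlistDeclHandler` (expat passes it as a string such as `(TRUE|FALSE)`) and compare it with the values it sees. The helper name and the DOC wrapper element are hypothetical.

```python
import xml.parsers.expat

def enum_violations(doc):
    """List (element, attribute, value) triples outside the declared enumeration."""
    enums = {}                          # (element, attribute) -> allowed values
    problems = []
    p = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        if type_.startswith("("):       # enumerated type, e.g. "(TRUE|FALSE)"
            enums[(elname, attname)] = {v.strip() for v in type_.strip("()").split("|")}

    def on_start(name, attrs):
        for att, value in attrs.items():
            allowed = enums.get((name, att))
            if allowed is not None and value not in allowed:
                problems.append((name, att, value))

    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return problems

DOC = """<!DOCTYPE DOC [
  <!ELEMENT DOC (P*)>
  <!ELEMENT P (#PCDATA)>
  <!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
]>
<DOC><P>a</P><P VISIBLE="FALSE">b</P><P VISIBLE="MAYBE">c</P></DOC>"""

print(enum_violations(DOC))
```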
Attribute Default Values
A literal string value, or one of these three keywords:
#REQUIRED, #IMPLIED, #FIXED (followed by the fixed value)
#REQUIRED
No default value is provided in the DTD
Document authors must provide an attribute value for each element
<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>
#IMPLIED
No default value in the DTD
The author may (but does not have to) provide a value with each element
#FIXED
The value is the same for every element of that type
The fixed value must be given in the DTD
A document may omit the attribute or repeat that exact value, but may not change it
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
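expat does not enforce #REQUIRED either; it only reports the declarations. A sketch, assuming Python's `xml.parsers.expat`, that records which attributes are #REQUIRED and flags start tags that omit them. It relies on expat's convention that a #REQUIRED attribute arrives with no default while a #FIXED one arrives with its fixed value; `missing_required` is a hypothetical name.

```python
import xml.parsers.expat

def missing_required(doc):
    """List (element, attribute) pairs where a #REQUIRED attribute is absent."""
    required = {}                       # element -> names of #REQUIRED attributes
    problems = []
    p = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, is_required):
        # #REQUIRED: is_required true, default None
        # #FIXED:    is_required true, default holds the fixed value
        if is_required and default is None:
            required.setdefault(elname, set()).add(attname)

    def on_start(name, attrs):
        for att in sorted(required.get(name, ())):
            if att not in attrs:
                problems.append((name, att))

    p.AttlistDeclHandler = on_attlist
    p.StartElementHandler = on_start
    p.Parse(doc, True)
    return problems

AUTHOR_DTD = """<!DOCTYPE AUTHOR [
  <!ELEMENT AUTHOR EMPTY>
  <!ATTLIST AUTHOR NAME CDATA #REQUIRED>
  <!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
  <!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
  <!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
]>
"""

print(missing_required(AUTHOR_DTD + '<AUTHOR NAME="Kim"/>'))
```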
Example of Internal DTDs
<?xml version="1.0"?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
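Internal DTD Subsets
Entity and attribute-list declarations in the internal subset take precedence over those in the external subset: the internal subset is read first, and the first declaration of an entity or attribute is binding. Element types get no such override; an element type may be declared only once.
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>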
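The first-declaration-wins rule behind this precedence can be seen with Python's `xml.etree.ElementTree`. To stay self-contained, the sketch declares the same entity twice in one internal subset instead of loading a greeting.dtd file; the entity name `who` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE GREETING [
  <!ENTITY who "XML">
  <!ENTITY who "world">
]>
<GREETING>Hello &who;!</GREETING>"""

# the second declaration of "who" is ignored: the first one is binding
print(ET.fromstring(DOC).text)
```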