Producing XML Documents with Guaranteed “Good” Properties
David W. Embley, Brigham Young University
Wai Y. Mok, University of Alabama in Huntsville
Sponsored in part by the National Science Foundation under grant number IIS-0083127
“Good” ~ XNF
Motivation
XML is for information exchange. What constitutes a “good” XML document for information exchange?
Principles
XML document properties: a few large trees; no redundancy.
Information modeling: create a conceptual model; generate “good” XML.
XNF
Align XML trees with natural hierarchies in the data.
Base redundancy elimination on FDs, naturally occurring MVDs, and inclusion dependencies (IDs).
Example: XNF
[Figure: CM hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]
Scheme tree (node F D has the child nodes S P and H; node S P has its own child node H):
( F D ( S P ( H )* )* ( H )* )*
Instance:
Kelly, CS
    Pat, PhD: Hiking, Skiing
    Tracy, MS: Hiking, Sailing
    Chris, MS
    Hobbies: Hiking, Skiing
Lynn, Math
    Hobbies: Sailing
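The parenthesized notation can be read as a nested data structure. The following is a minimal sketch, assuming a hypothetical `SchemeNode` class (not from the slides) that holds a node's attributes and child nodes and prints the same notation the slides use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemeNode:
    """One scheme-tree node: its attributes plus any nested child nodes."""
    attributes: List[str]
    children: List["SchemeNode"] = field(default_factory=list)


def notation(node: SchemeNode) -> str:
    """Render a node as '( attrs children )*', the notation used on the slides."""
    parts = node.attributes + [notation(child) for child in node.children]
    return "( " + " ".join(parts) + " )*"


# F = FacultyMember, D = Department, S = GradStudent, P = Program, H = Hobby.
XNF_TREE = SchemeNode(
    ["F", "D"],
    [SchemeNode(["S", "P"], [SchemeNode(["H"])]), SchemeNode(["H"])],
)

print(notation(XNF_TREE))
```

The two `H` nodes are separate objects: one holds a student's hobbies and the other holds the faculty member's hobbies.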
Example: More Trees Than Necessary
[Figure: CM hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]
Scheme trees:
( S P F ( H )* )*
( D ( F ( H )* )* )*
Instance of the first tree:
Pat, PhD, Kelly: Hiking, Skiing
Tracy, MS, Kelly: Hiking, Sailing
Chris, MS, Kelly
Instance of the second tree:
CS
    Kelly: Hiking, Skiing
Math
    Lynn: Sailing
Example: Redundancy
[Figure: CM hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]
Scheme tree:
( H ( S P )* )*
Instance:
Hiking
    Pat, PhD
    Tracy, MS
Skiing
    Pat, PhD
Sailing
    Tracy, MS
Scheme tree:
( S ( H ( F )* )* )*
Instance:
Pat
    Hiking: Kelly
    Skiing: Kelly
Tracy
    Hiking: Kelly
    Sailing: Lynn
Chris
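One way to see the redundancy in the ( H ( S P )* )* tree is to count how often each (student, program) pair is stored. This is only an illustrative sketch: the dictionary layout and the function name `repeated_pairs` are assumptions, and the data is the instance shown above.

```python
from collections import Counter

# Instance of ( H ( S P )* )*: each hobby maps to its (student, program) pairs.
BY_HOBBY = {
    "Hiking": [("Pat", "PhD"), ("Tracy", "MS")],
    "Skiing": [("Pat", "PhD")],
    "Sailing": [("Tracy", "MS")],
}


def repeated_pairs(instance):
    """Return every (student, program) pair that is stored more than once."""
    counts = Counter(pair for pairs in instance.values() for pair in pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


print(repeated_pairs(BY_HOBBY))
```

Each student's program is repeated once per hobby, which is the kind of duplication the XNF tree avoids by storing the program once next to the student.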
XNF → XML
[Figure: the CM hypergraph, the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )*, and its instance, repeated from “Example: XNF”]
Naive DTD Generation
[Figure: the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )* and its instance, repeated from “Example: XNF”]
<!DOCTYPE University [
<!ELEMENT University ( ( Faculty_Member, Department, ( Graduate_Student, Program, ( Hobby_S )* )*, ( Hobby_F )* )* )>
<!ELEMENT Faculty_Member (#PCDATA)>
…
]>
<University>
  <Faculty_Member>Kelly</Faculty_Member>
  <Department>CS</Department>
  <Graduate_Student>Pat</Graduate_Student>
  <Program>PhD</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Skiing</Hobby_S>
  <Graduate_Student>Tracy</Graduate_Student>
  <Program>MS</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Sailing</Hobby_S>
  <Graduate_Student>Chris</Graduate_Student>
  <Program>MS</Program>
  <Hobby_F>Hiking</Hobby_F>
  <Hobby_F>Skiing</Hobby_F>
  <Faculty_Member>Lynn</Faculty_Member>
  <Department>Math</Department>
  <Hobby_F>Sailing</Hobby_F>
</University>
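The naive translation can be sketched as a direct walk over the scheme tree: each node becomes a starred group in the content model and each attribute becomes a #PCDATA element. This is a sketch under assumptions, not the authors' generator. The tuple encoding of the tree and the function names are hypothetical, and the element names are taken from the sample XML.

```python
# A scheme-tree node is encoded as (element names, child nodes).
TREE = (
    ["Faculty_Member", "Department"],
    [
        (["Graduate_Student", "Program"], [(["Hobby_S"], [])]),
        (["Hobby_F"], []),
    ],
)


def content_model(node):
    """Turn a scheme-tree node into a starred DTD sequence group."""
    names, children = node
    parts = list(names) + [content_model(child) for child in children]
    return "( " + ", ".join(parts) + " )*"


def element_names(node):
    """Collect all element names in the tree, parents before children."""
    names, children = node
    result = list(names)
    for child in children:
        result.extend(element_names(child))
    return result


def naive_dtd(root, tree):
    """Build a flat DTD: one content model for the root, #PCDATA everywhere else."""
    lines = [f"<!DOCTYPE {root} ["]
    lines.append(f"<!ELEMENT {root} ( {content_model(tree)} )>")
    lines.extend(f"<!ELEMENT {name} (#PCDATA)>" for name in element_names(tree))
    lines.append("]>")
    return "\n".join(lines)


print(naive_dtd("University", TREE))
```

The result is valid but flat: the nesting of the scheme tree survives only in the content model, not in the document's element structure.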
Sophisticated DTD Generation
[Figure: the scheme tree with node labels F D, S P, H, and H, annotated with the wrapper elements Faculty_Members, Grad_Students, and Hobbies]
<!DOCTYPE University [
<!ELEMENT University (Faculty_Members)>
<!ELEMENT Faculty_Members (Faculty_Member)*>
<!ELEMENT Faculty_Member (Department, Grad_Students, Hobbies)>
<!ATTLIST Faculty_Member value CDATA #REQUIRED>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Grad_Students (Grad_Student)*>
<!ELEMENT Grad_Student (Program, Hobbies)>
…
]>
<University>
  <Faculty_Members>
    <Faculty_Member value="Kelly">
      <Department>CS</Department>
      <Grad_Students>
        <Grad_Student value="Pat">
          <Program>PhD</Program>
          <Hobbies>
            <Hobby>Hiking</Hobby>
            <Hobby>Skiing</Hobby>
          </Hobbies>
        </Grad_Student>
        <Grad_Student value="Tracy">
        …
  </Faculty_Members>
</University>
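A document in this nested shape can be built with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the input dictionary layout and the helper names are assumptions, and emitting an empty Hobbies wrapper for a student without hobbies is a guess, since the slide elides that part of the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout for the instance used throughout the slides.
FACULTY = [
    {"name": "Kelly", "department": "CS", "hobbies": ["Hiking", "Skiing"],
     "students": [
         {"name": "Pat", "program": "PhD", "hobbies": ["Hiking", "Skiing"]},
         {"name": "Tracy", "program": "MS", "hobbies": ["Hiking", "Sailing"]},
         {"name": "Chris", "program": "MS", "hobbies": []},
     ]},
    {"name": "Lynn", "department": "Math", "hobbies": ["Sailing"], "students": []},
]


def add_hobbies(parent, hobbies):
    """Append a Hobbies wrapper with one Hobby child per entry (assumed empty if none)."""
    wrapper = ET.SubElement(parent, "Hobbies")
    for hobby in hobbies:
        ET.SubElement(wrapper, "Hobby").text = hobby


def build_university(faculty):
    """Build the nested University document, following the DTD's child order."""
    root = ET.Element("University")
    members = ET.SubElement(root, "Faculty_Members")
    for f in faculty:
        member = ET.SubElement(members, "Faculty_Member", value=f["name"])
        ET.SubElement(member, "Department").text = f["department"]
        students = ET.SubElement(member, "Grad_Students")
        for s in f["students"]:
            student = ET.SubElement(students, "Grad_Student", value=s["name"])
            ET.SubElement(student, "Program").text = s["program"]
            add_hobbies(student, s["hobbies"])
        add_hobbies(member, f["hobbies"])
    return root


root = build_university(FACULTY)
print(ET.tostring(root, encoding="unicode"))
```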
→ XNF
[Figure: the CM hypergraph, the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )*, and its instance, repeated from “Example: XNF”]
How do we generate XNF scheme-trees?
Alg. 1
[Figure: the CM hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department, and the scheme tree with node labels F D, S P, H, and H]
Algorithm 1
Until all vertices and edges are included:
  Find a start vertex:
    -- included in most enclosures
    -- back off by one, if possible
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add adjacent vertices:
       -- within node (for functional edges)
       -- below node (for non-functional edges)
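The outline leaves many details to the paper, so the following is only one simplified reading of it, not the authors' algorithm. It assumes that an “enclosure” is the set of vertices reachable along functional edges, that “back off by one” means stepping to a vertex that functionally determines the most-enclosed vertex, and that the hypergraph is acyclic with binary edges only. It grows a single tree and skips hierarchy cutting and optionals. All names and the edge encoding are hypothetical.

```python
# F = FacultyMember, D = Department, S = GradStudent, P = Program, H = Hobby.
VERTICES = ["S", "F", "H", "P", "D"]
EDGES = [("F", "D"), ("F", "S"), ("S", "P"), ("S", "H"), ("F", "H")]
# Functional edges as (determinant, dependent): F -> D, S -> F, S -> P.
FDS = {("F", "D"), ("S", "F"), ("S", "P")}


def enclosure(v, fds):
    """Vertices reachable from v along functional edges, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for a, b in fds:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def pick_start(vertices, fds):
    """Pick the vertex included in the most enclosures, then back off by one."""
    counts = {v: sum(v in enclosure(u, fds) for u in vertices) for v in vertices}
    best = max(vertices, key=lambda v: counts[v])
    for a, b in sorted(fds):
        # Back off to a vertex that determines `best` but is not determined by it.
        if b == best and (b, a) not in fds:
            return a, counts
    return best, counts


def grow(start, edges, fds):
    """Grow one scheme tree, returned as nested (attributes, children) pairs."""
    unused = list(edges)

    def build(seed):
        attrs, below = [seed], []
        for v in attrs:  # attrs can grow while we iterate over it
            for e in [e for e in unused if v in e]:
                unused.remove(e)
                w = e[1] if e[0] == v else e[0]
                if (v, w) in fds:
                    attrs.append(w)  # functional edge: place within the node
                else:
                    below.append(w)  # non-functional edge: place below the node
        return (attrs, [build(w) for w in below])

    return build(start)


start, counts = pick_start(VERTICES, FDS)
tree = grow(start, EDGES, FDS)
print(start, counts, tree)
```

On this input the sketch starts at F and produces the nesting F D over S P over H, with a second H directly under F D, which matches the scheme tree ( F D ( S P ( H )* )* ( H )* )*.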
Alg. 1: Start
[Figure: the CM hypergraph with its vertices annotated 1, 2, 3, 1, 2 during start-vertex selection, shown next to the scheme tree with node labels F D, S P, H, and H]
Alg. 1: Grow
[Figure: three builds of the same slide showing the scheme tree with node labels F D, S P, H, and H being grown from the start vertex, with check marks (√) added as the growth proceeds]
Algorithm 1 Yields XNF
Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996)
What is this restriction?
Can we relax this constraint?
Can we enlarge the set of dependencies?
Non-Canonical CM Hypergraphs
[Figure: two non-canonical CM hypergraphs over GradStudent, FacultyMember, Hobby, Program, and Department, each with the scheme tree that Algorithm 1 generates from it]
If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.
First scheme tree (node labels D; F, S; S P, H; H): the set of students must be the same for every department.
Second scheme tree (node labels F D; S P D, H; H): a faculty member’s department is the same as the faculty member’s students’ department.
A CM hypergraph is canonical if:
(1) No edge is redundant,
(2) No edge is losslessly decomposable, and
(3) No vertex is redundant.
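Condition (1) can be illustrated for the special case of functional binary edges, where an edge is redundant if the remaining functional edges already imply it by transitivity. This is a sketch of that special case only, not a full canonicity test. The edge list is a reading of the second example, assuming it has a direct GradStudent to Department edge (S -> D) in addition to S -> F and F -> D. The function names are hypothetical.

```python
def reachable(src, dst, fds):
    """True if dst can be reached from src by following functional edges."""
    seen, stack = {src}, [src]
    while stack:
        x = stack.pop()
        if x == dst:
            return True
        for a, b in fds:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def redundant_fd_edges(fds):
    """Functional edges that the other functional edges imply transitively."""
    return [e for e in fds if reachable(e[0], e[1], [f for f in fds if f != e])]


# S = GradStudent, P = Program, F = FacultyMember, D = Department.
FDS = [("S", "P"), ("S", "F"), ("F", "D"), ("S", "D")]
print(redundant_fd_edges(FDS))
```

Here S -> D is reported because S -> F and F -> D already determine a student's department, which is the constraint stated for the second scheme tree.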
Non-Binary CM Hypergraphs
[Figure: two versions of a CM hypergraph over Name, Address, Phone, Major, Course, Day, and Time; one version is labeled “Not canonical: decomposable”]
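Generating Scheme Trees from Non-Binary CM Hypergraphs
[Figure: the non-binary CM hypergraph over Name, Address, Phone, Major, Course, Day, and Time, with alternative scheme-tree configurations (“or”, “or”, …) over the node labels N A M, C, D, and T, and a separate tree A P]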
Alg. 2
[Figure: the non-binary CM hypergraph and scheme trees with node labels N A M, C, C, D, T, and A P]
Algorithm 2
Until all vertices and edges are included:
  Find a start edge and configure it
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add and configure edges
Alg. 2: Start
[Figure: the non-binary CM hypergraph and the scheme trees with node labels N A M, C, C, D, T, and A P at the start step]
Alg. 2: Grow
[Figure: the same hypergraph and scheme trees at the grow step, with a check mark (√)]
Alg. 2: Start Again and Grow
[Figure: the same hypergraph and scheme trees after starting again and growing, with check marks (√)]
Algorithm 2 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996)
Inclusion Dependencies (IDs)
[Figure: CM hypergraph over Hobby, FacultyMember with Hobby, GradStudent with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program; label: “optional connections”]
Inclusion Dependencies (IDs)
[Figure: the same CM hypergraph with the added vertices Grad-Student Hobbies and Faculty-Member Hobbies]
This constraint makes this vertex redundant.
Canonical CM Hypergraph with IDs
[Figure: canonical CM hypergraph over Grad-Student Hobbies, Faculty-Member Hobbies, GradStudent with Hobby, FacultyMember with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program]
Generating Scheme Trees from Canonical CM Hypergraph with IDs
[Figure: the canonical CM hypergraph with IDs and the resulting scheme tree with node labels F D, S P, HS, and HF]
Algorithm 3
Collapse G/S hierarchies
If the edges are all binary
  Execute Algorithm 1
Else
  Execute Algorithm 2
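The control flow of Algorithm 3 is a short dispatch. In the sketch below, the collapse step and the two algorithms are passed in as callables because the slides do not spell them out. The dictionary layout for the hypergraph and every name here are assumptions made for illustration.

```python
def algorithm3(hypergraph, collapse, algorithm1, algorithm2):
    """Collapse G/S hierarchies, then dispatch on whether every edge is binary."""
    collapsed = collapse(hypergraph)
    if all(len(edge) == 2 for edge in collapsed["edges"]):
        return algorithm1(collapsed)
    return algorithm2(collapsed)


# Stand-in callables, only to show which branch is taken.
identity = lambda h: h
binary = {"edges": [("F", "D"), ("S", "P")]}
ternary = {"edges": [("N", "C", "D")]}

print(algorithm3(binary, identity, lambda h: "Algorithm 1", lambda h: "Algorithm 2"))
print(algorithm3(ternary, identity, lambda h: "Algorithm 1", lambda h: "Algorithm 2"))
```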
Alg. 3: Collapse
[Figure: the CM hypergraph before and after collapsing; the collapsed hypergraph has the vertices GradStudent, Grad-Student Hobbies, FacultyMember, Faculty-Member Hobbies, Department, and Program]
Alg. 3: Execute
[Figure: the collapsed hypergraph and the scheme tree with node labels F D, S P, HS, and HF]
Algorithm 3 Yields XNF
Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H.
Proof: Based on NNF (Mok, et al., TODS, 1996)
Conclusions
XNF ~ “Good” XML
  -- No redundancy
  -- As few trees as possible
Elegant DTD generation
Algorithms to generate XNF
Proofs of correctness