
Producing XML Documents with Guaranteed “Good” Properties

David W. Embley
Brigham Young University

Wai Y. Mok
University of Alabama in Huntsville

Sponsored in part by the National Science Foundation under grant number IIS-0083127

“Good” ~ XNF

Motivation
  -- XML is for information exchange.
  -- What constitutes a “good” XML document for information exchange?

Principles
  -- XML document properties: a few large trees, no redundancy.
  -- Information modeling: create a conceptual model, then generate “good” XML.

XNF
  -- Align XML trees with natural hierarchies in the data.
  -- Base redundancy elimination on FDs, naturally occurring MVDs, and inclusion dependencies (IDs).

Example: XNF

[Figure: conceptual-model hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]

Scheme tree (F = FacultyMember, D = Department, S = GradStudent, P = Program, H = Hobby):

  F D
    S P
      H
    H

  ( F D ( S P ( H )* )* ( H )* )*

Sample data:

  F      D     S      P    H (of S)         H (of F)
  Kelly  CS    Pat    PhD  Hiking, Skiing   Hiking, Skiing
               Tracy  MS   Hiking, Sailing
               Chris  MS
  Lynn   Math                               Sailing

Example: More Trees Than Necessary

[Figure: conceptual-model hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]

Scheme trees:

  S P F          D
    H              F
                     H

  ( S P F ( H )* )*     ( D ( F ( H )* )* )*

Sample data, first tree:

  S      P    F      H
  Pat    PhD  Kelly  Hiking, Skiing
  Tracy  MS   Kelly  Hiking, Sailing
  Chris  MS   Kelly

Sample data, second tree:

  D     F      H
  CS    Kelly  Hiking, Skiing
  Math  Lynn   Sailing

Example: Redundancy

[Figure: conceptual-model hypergraph over GradStudent, FacultyMember, Hobby, Program, and Department]

Scheme tree 1:

  H
    S P

  ( H ( S P )* )*

  H        S      P
  Hiking   Pat    PhD
           Tracy  MS
  Skiing   Pat    PhD
  Sailing  Tracy  MS

Scheme tree 2:

  S
    H
      F

  ( S ( H ( F )* )* )*

  S      H        F
  Pat    Hiking   Kelly
         Skiing   Kelly
  Tracy  Hiking   Kelly
         Sailing  Lynn
  Chris
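The redundancy in the first tree on this slide can be made concrete with a small check. The sketch below is only an illustration, not the formal XNF definition of redundancy: it stores the ( H ( S P )* )* instance as a Python dictionary and reports every (student, program) pair that is written more than once. The names `by_hobby` and `repeated_facts` are made up for this example.

```python
from collections import Counter

# Instance of the scheme tree ( H ( S P )* )* from the slide:
# each hobby maps to the (student, program) pairs nested under it.
by_hobby = {
    "Hiking": [("Pat", "PhD"), ("Tracy", "MS")],
    "Skiing": [("Pat", "PhD")],
    "Sailing": [("Tracy", "MS")],
}


def repeated_facts(nested):
    """Return the (student, program) pairs stored more than once, with counts."""
    counts = Counter(fact for facts in nested.values() for fact in facts)
    return {fact: n for fact, n in counts.items() if n > 1}


print(repeated_facts(by_hobby))
```

Both pairs are stored twice here, so each student's program is recorded once per hobby. In the XNF tree ( F D ( S P ( H )* )* ( H )* )* each (student, program) pair is written once.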

XNF → XML

[Figure: the conceptual-model hypergraph, its XNF scheme tree ( F D ( S P ( H )* )* ( H )* )*, and the sample data from the “Example: XNF” slide]

Naive DTD Generation

[Figure: the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )* with the sample data from the “Example: XNF” slide]

<!DOCTYPE University [
  <!ELEMENT University ( Faculty_Member, Department,
                         ( Grad_Student, Program, ( Hobby )* )*,
                         ( Hobby )* )*>
  <!ELEMENT Faculty_Member (#PCDATA)>
  …
]>
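The naive translation can be sketched as a direct walk over the scheme tree: each node becomes one starred group, with its attributes first and its child groups after them. The code below is a minimal illustration under that assumption, not the authors' generator; `SchemeNode`, `content_model`, `element_names`, and `naive_dtd` are hypothetical names, and commas are emitted between all members of a group, as DTD sequence syntax requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemeNode:
    """One scheme-tree node: its attributes plus any nested child nodes."""
    attrs: List[str]
    children: List["SchemeNode"] = field(default_factory=list)


def content_model(node: SchemeNode) -> str:
    """Render a node as a starred DTD group such as ( A, B, ( C )* )*."""
    parts = list(node.attrs) + [content_model(c) for c in node.children]
    return "( " + ", ".join(parts) + " )*"


def element_names(node: SchemeNode) -> List[str]:
    """Collect each attribute name once, in first-seen order."""
    names: List[str] = []
    for name in node.attrs:
        if name not in names:
            names.append(name)
    for child in node.children:
        for name in element_names(child):
            if name not in names:
                names.append(name)
    return names


def naive_dtd(root: str, tree: SchemeNode) -> str:
    """Emit one content model for the root and a #PCDATA element per attribute."""
    lines = [f"<!DOCTYPE {root} [", f"  <!ELEMENT {root} {content_model(tree)}>"]
    lines += [f"  <!ELEMENT {name} (#PCDATA)>" for name in element_names(tree)]
    lines.append("]>")
    return "\n".join(lines)


# The scheme tree ( F D ( S P ( H )* )* ( H )* )* with the slide's element names.
tree = SchemeNode(
    ["Faculty_Member", "Department"],
    [
        SchemeNode(["Grad_Student", "Program"], [SchemeNode(["Hobby"])]),
        SchemeNode(["Hobby"]),
    ],
)
print(naive_dtd("University", tree))
```

The flat content model keeps the nesting only implicitly, in the order of sibling elements.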

Naive DTD Generation

[Figure: the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )*]

<!DOCTYPE University [
  <!ELEMENT University ( Faculty_Member, Department,
                         ( Graduate_Student, Program, ( Hobby )* )*,
                         ( Hobby )* )*>
  <!ELEMENT Faculty_Member (#PCDATA)>
  …
]>

<University>
  <Faculty_Member>Kelly</Faculty_Member>
  <Department>CS</Department>
  <Graduate_Student>Pat</Graduate_Student>
  <Program>PhD</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Skiing</Hobby_S>
  <Graduate_Student>Tracy</Graduate_Student>
  <Program>MS</Program>
  <Hobby_S>Hiking</Hobby_S>
  <Hobby_S>Sailing</Hobby_S>
  <Graduate_Student>Chris</Graduate_Student>
  <Program>MS</Program>
  <Hobby_F>Hiking</Hobby_F>
  <Hobby_F>Skiing</Hobby_F>
  <Faculty_Member>Lynn</Faculty_Member>
  <Hobby_F>Sailing</Hobby_F>
</University>

Sophisticated DTD Generation

[Figure: the XNF scheme tree ( F D ( S P ( H )* )* ( H )* )* with its levels labeled Faculty_Members, Grad_Students, Hobbies, and Hobbies]

<!DOCTYPE University [
  <!ELEMENT University (Faculty_Members)>
  <!ELEMENT Faculty_Members (Faculty_Member)*>
  <!ELEMENT Faculty_Member (Department, Grad_Students, Hobbies)>
  <!ATTLIST Faculty_Member value CDATA #REQUIRED>
  <!ELEMENT Department (#PCDATA)>
  <!ELEMENT Grad_Students (Grad_Student)*>
  <!ELEMENT Grad_Student (Program, Hobbies)>
  …
]>

<University>
  <Faculty_Members>
    <Faculty_Member value="Kelly">
      <Department>CS</Department>
      <Grad_Students>
        <Grad_Student value="Pat">
          <Program>PhD</Program>
          <Hobbies>
            <Hobby>Hiking</Hobby>
            <Hobby>Skiing</Hobby>
          </Hobbies>
        </Grad_Student>
        <Grad_Student value="Tracy">
          …
  </Faculty_Members>
</University>
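A document with this nested shape can be produced from the sample data using Python's standard `xml.etree.ElementTree` module. This is a sketch for illustration only: the `faculty` list and the helpers `add_hobbies` and `build_university` are hypothetical, and the code assumes that the elided part of the DTD declares `Hobbies` as `(Hobby)*` and gives `Grad_Student` a `value` attribute like `Faculty_Member`.

```python
import xml.etree.ElementTree as ET

# Sample data from the slides, arranged to mirror the scheme tree.
faculty = [
    {
        "name": "Kelly",
        "department": "CS",
        "students": [
            {"name": "Pat", "program": "PhD", "hobbies": ["Hiking", "Skiing"]},
            {"name": "Tracy", "program": "MS", "hobbies": ["Hiking", "Sailing"]},
            {"name": "Chris", "program": "MS", "hobbies": []},
        ],
        "hobbies": ["Hiking", "Skiing"],
    },
    {"name": "Lynn", "department": "Math", "students": [], "hobbies": ["Sailing"]},
]


def add_hobbies(parent, hobbies):
    """Append a <Hobbies> wrapper holding one <Hobby> per entry (assumed shape)."""
    wrapper = ET.SubElement(parent, "Hobbies")
    for hobby in hobbies:
        ET.SubElement(wrapper, "Hobby").text = hobby


def build_university(members):
    """Build the nested <University> tree, one wrapper element per tree level."""
    university = ET.Element("University")
    wrapper = ET.SubElement(university, "Faculty_Members")
    for f in members:
        fm = ET.SubElement(wrapper, "Faculty_Member", value=f["name"])
        ET.SubElement(fm, "Department").text = f["department"]
        students = ET.SubElement(fm, "Grad_Students")
        for s in f["students"]:
            gs = ET.SubElement(students, "Grad_Student", value=s["name"])
            ET.SubElement(gs, "Program").text = s["program"]
            add_hobbies(gs, s["hobbies"])
        add_hobbies(fm, f["hobbies"])
    return university


doc = build_university(faculty)
print(ET.tostring(doc, encoding="unicode"))
```

Because every repeating group has its own wrapper element, student hobbies and faculty hobbies can share the element name `Hobby` without ambiguity.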

→ XNF

[Figure: the conceptual-model hypergraph, its XNF scheme tree ( F D ( S P ( H )* )* ( H )* )*, and the sample data from the “Example: XNF” slide]

How do we generate XNF scheme-trees?

Alg. 1

[Figure: the conceptual-model hypergraph and the scheme tree ( F D ( S P ( H )* )* ( H )* )*]

How do we generate XNF scheme-trees?

Algorithm 1
Until all vertices and edges are included:
  Find a start vertex:
    -- included in most enclosures
    -- back off by one, if possible
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add adjacent vertices:
       -- within node (for functional edges)
       -- below node (for non-functional edges)
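The “grow” step of Algorithm 1 can be sketched in a few lines. This is a simplified illustration, not the authors' algorithm: it takes the start vertex as given, skips the enclosure-based start selection and the hierarchy cutting, and assumes the functional directions listed in `edges` (they are inferred from the scheme tree on the slide). The names `Node`, `grow`, and `render` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# (a, b, a_determines_b, b_determines_a): a binary edge and its functional directions.
Edge = Tuple[str, str, bool, bool]


@dataclass
class Node:
    attrs: List[str]
    children: List["Node"] = field(default_factory=list)


def grow(start: str, edges: List[Edge]) -> Node:
    """Grow one scheme tree from `start`, consuming each edge at most once."""
    used: Set[int] = set()
    root = Node([start])

    def extend(node: Node) -> None:
        i = 0
        while i < len(node.attrs):  # node.attrs may grow while it is scanned
            vertex = node.attrs[i]
            for idx, (a, b, a_to_b, b_to_a) in enumerate(edges):
                if idx in used or vertex not in (a, b):
                    continue
                other, functional = (b, a_to_b) if vertex == a else (a, b_to_a)
                used.add(idx)
                if functional:
                    node.attrs.append(other)             # within node
                else:
                    node.children.append(Node([other]))  # below node
            i += 1
        for child in node.children:
            extend(child)

    extend(root)
    return root


def render(node: Node) -> str:
    """Write a tree in the slide's notation, e.g. ( F D ( H )* )*."""
    inner = " ".join(node.attrs + [render(c) for c in node.children])
    return f"( {inner} )*"


edges = [
    ("F", "D", True, False),   # assumed: each faculty member has one department
    ("S", "F", True, False),   # assumed: each grad student has one faculty member
    ("S", "P", True, False),   # assumed: each grad student has one program
    ("S", "H", False, False),  # assumed: students and hobbies are many-to-many
    ("F", "H", False, False),  # assumed: faculty and hobbies are many-to-many
]
print(render(grow("F", edges)))  # ( F D ( S P ( H )* )* ( H )* )*
```

Starting the same sketch from S instead would pull F and D into the student node and repeat them for every student, which suggests why the choice of start vertex matters.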

Alg. 1: Start

[Figure, two animation steps: the conceptual-model hypergraph with its vertices annotated 1, 2, 3, 1, 2 while Algorithm 1 finds a start vertex; the scheme tree ( F D ( S P ( H )* )* ( H )* )* and the steps of Algorithm 1 are shown alongside]

Alg. 1: Grow

[Figure, three animation steps: Algorithm 1 grows the scheme tree ( F D ( S P ( H )* )* ( H )* )* over the conceptual-model hypergraph; the steps of Algorithm 1 are shown alongside]

Algorithm 1 Yields XNF

Theorem. Given a canonical, binary conceptual-model (CM) hypergraph H, Algorithm 1 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.

Proof: Based on NNF (Mok et al., TODS, 1996)

Questions raised by the theorem statement:
  -- What is this restriction? (canonical)
  -- Can we relax this constraint? (binary)
  -- Can we enlarge the set of dependencies? (FDs and MVDs)

Non-Canonical CM Hypergraphs

[Figure: two variants of the hypergraph over GradStudent, FacultyMember, Department, Hobby, and Program]

If the input CM hypergraph has redundancy, Algorithm 1 generates scheme trees with potential redundancy.

[Figure: scheme tree with levels D; F S; S P H; H. Callout: “The set of students must be the same for every department.”]

[Figure: scheme tree with levels F D; S P D H; H. Callout: “A faculty member’s department is the same as the faculty member’s students’ department.”]

A CM hypergraph is canonical if:
  (1) No edge is redundant,
  (2) No edge is losslessly decomposable, and
  (3) No vertex is redundant.
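For functional edges, condition (1) can be tested with the standard attribute-closure computation: an FD is redundant if it still follows from the remaining FDs. The sketch below covers only that case; it does not check lossless decomposability, redundant vertices, or non-functional edges. The FD list is an assumed reading of the second scheme tree on this slide, where a student's department is implied by the student's faculty member, and `closure` and `redundant_fds` are hypothetical names.

```python
from typing import List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (left-hand side, right-hand side)


def closure(attrs, fds: List[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def redundant_fds(fds: List[FD]) -> List[FD]:
    """List each FD that is already implied by all the other FDs."""
    found = []
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        if set(rhs) <= closure(lhs, rest):
            found.append((lhs, rhs))
    return found


# S = GradStudent, F = FacultyMember, D = Department, P = Program.
fds = [
    (("S",), ("F",)),
    (("F",), ("D",)),
    (("S",), ("D",)),  # implied by S -> F and F -> D
    (("S",), ("P",)),
]
print(redundant_fds(fds))  # [(('S',), ('D',))]
```

Dropping the reported edge removes D from the student node, which gives back the tree ( F D ( S P ( H )* )* ( H )* )*.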

Non-Binary CM Hypergraphs

[Figure: two versions of a hypergraph over Name, Address, Phone, Major, Course, Day, and Time; callout: “Not Canonical: Decomposable”]

Generating Scheme Trees from Non-Binary CM Hypergraphs

[Figure: the hypergraph over Name, Address, Phone, Major, Course, Day, and Time, with candidate scheme trees; node labels as extracted: N A M, C, C, D, T, C, D T, A P, with “or … or …” between the alternatives]

Alg. 2

[Figure: the hypergraph over Name, Address, Phone, Major, Course, Day, and Time, with scheme-tree node labels N A M, C, C, D, T, A P]

Algorithm 2
Until all vertices and edges are included:
  Find a start edge and configure it
  Grow a tree as large as possible:
    -- cut out hierarchy (watch out for optionals)
    -- add and configure edges

Alg. 2: Start

[Figure: Algorithm 2 selects and configures a start edge in the hypergraph over Name, Address, Phone, Major, Course, Day, and Time; the steps of Algorithm 2 are shown alongside]

Alg. 2: Grow

[Figure: Algorithm 2 grows the first scheme tree; node labels N A M, C, C, D, T, A P]

Alg. 2: Start Again and Grow

[Figure, two animation steps: Algorithm 2 starts again and grows the remaining scheme trees; node labels N A M, C, C, D, T, A P]

Algorithm 2 Yields XNF

Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 2 generates an XNF scheme-tree forest with respect to the FDs and MVDs of H.

Proof: Based on NNF (Mok et al., TODS, 1996)

Inclusion Dependencies (IDs)

[Figure: hypergraph over Hobby, FacultyMember with Hobby, GradStudent with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program; callout: “optional connections”]

Inclusion Dependencies (IDs)

[Figure: hypergraph over Hobby, Grad-Student Hobbies, Faculty-Member Hobbies, GradStudent with Hobby, FacultyMember with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program; callout: “This constraint makes this vertex redundant.”]
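An inclusion dependency states that every value of one vertex also appears in another. As a minimal sketch, the check below treats each vertex as a set of values; the sets are taken from the sample data on the earlier slides (Kelly is assumed to be the only advisor because all three students are nested under Kelly), and `violations` and `inclusion_holds` are hypothetical helper names.

```python
def violations(subset_values, superset_values):
    """Return the values that break the inclusion dependency, sorted."""
    return sorted(set(subset_values) - set(superset_values))


def inclusion_holds(subset_values, superset_values):
    """True when every value on the left also appears on the right."""
    return not violations(subset_values, superset_values)


grad_students = {"Pat", "Tracy", "Chris"}
grad_students_with_hobby = {"Pat", "Tracy"}
faculty_members = {"Kelly", "Lynn"}
advisors = {"Kelly"}  # assumption: inferred from the nesting in the sample data

print(inclusion_holds(grad_students_with_hobby, grad_students))  # True
print(inclusion_holds(advisors, faculty_members))                # True
print(violations(grad_students, grad_students_with_hobby))       # ['Chris']
```

The last call shows the direction that fails: Chris has no hobby, so GradStudent is not included in GradStudent with Hobby, which matches the optional connections in the figure.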

Canonical CM Hypergraph with IDs

[Figure: hypergraph over Grad-Student Hobbies, Faculty-Member Hobbies, GradStudent with Hobby, FacultyMember with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program]

Generating Scheme Trees from Canonical CM Hypergraph with IDs

[Figure: the canonical CM hypergraph with IDs and the generated scheme tree, with levels F D; S P HF; HS]

Algorithm 3
Collapse G/S hierarchies
If the edges are all binary
  Execute Algorithm 1
Else
  Execute Algorithm 2
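Algorithm 3 is a preprocessing step followed by a dispatch, which the sketch below mirrors. Several things here are assumptions: “G/S” is read as generalization/specialization, collapsing is modeled as renaming each specialization vertex to its generalization, the edge list is a guess at the figure, and `algorithm1` and `algorithm2` are passed in as placeholders rather than implemented.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, ...]  # the vertices an edge connects; length 2 means binary


def collapse(edges: List[Edge], generalization_of: Dict[str, str]) -> List[Edge]:
    """Replace every specialization vertex by its generalization (assumed model)."""
    return [tuple(generalization_of.get(v, v) for v in edge) for edge in edges]


def algorithm3(
    edges: List[Edge],
    generalization_of: Dict[str, str],
    algorithm1: Callable[[List[Edge]], object],
    algorithm2: Callable[[List[Edge]], object],
):
    """Collapse G/S hierarchies, then run Algorithm 1 or Algorithm 2."""
    collapsed = collapse(edges, generalization_of)
    if all(len(edge) == 2 for edge in collapsed):
        return algorithm1(collapsed)
    return algorithm2(collapsed)


# Hypothetical edge list and specialization map for the slide's hypergraph.
edges = [
    ("FacultyMember", "Department"),
    ("GradStudent", "Program"),
    ("GradStudent", "Advisor"),
    ("GradStudent with Hobby", "Grad-Student Hobbies"),
    ("FacultyMember with Hobby", "Faculty-Member Hobbies"),
]
generalization_of = {
    "Advisor": "FacultyMember",
    "GradStudent with Hobby": "GradStudent",
    "FacultyMember with Hobby": "FacultyMember",
}

# Stand-ins that only report which algorithm was chosen.
chosen, collapsed_edges = algorithm3(
    edges,
    generalization_of,
    algorithm1=lambda es: ("Algorithm 1", es),
    algorithm2=lambda es: ("Algorithm 2", es),
)
print(chosen)
```

After collapsing, only GradStudent, FacultyMember, Department, Program, and the two hobby vertices remain, and every edge is binary, so the dispatch selects Algorithm 1.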

Alg. 3: Collapse

[Figure, two animation steps: the hypergraph over Grad-Student Hobbies, Faculty-Member Hobbies, GradStudent with Hobby, FacultyMember with Hobby, FacultyMember, GradStudent, Advisor, Department, and Program is collapsed to one over GradStudent, Grad-Student Hobbies, FacultyMember, Faculty-Member Hobbies, Department, and Program; the steps of Algorithm 3 are shown alongside]

Alg. 3: Execute

[Figure: the collapsed hypergraph and the generated scheme tree, with levels F D; S P HF; HS]

Algorithm 3 Yields XNF

Theorem. Given a canonical conceptual-model (CM) hypergraph H, Algorithm 3 generates an XNF scheme-tree forest with respect to the FDs, MVDs, and IDs of H.

Proof: Based on NNF (Mok et al., TODS, 1996)

Conclusions

XNF ~ “Good” XML
  -- No redundancy
  -- As few trees as possible
Elegant DTD generation
Algorithms to generate XNF
Proofs of correctness
