Inexact Querying of XML. XML Data May be Irregular Relational data is regular and organized. XML may...



Inexact Querying of XML


XML Data May be Irregular

• Relational data is regular and organized. XML may be very different.
– Data is incomplete: Missing values of attributes in elements
– Data has structural variations: Relationships between elements are represented differently in different parts of the document
– Data has ontology variations: Different labels are used to describe nodes of the same type
• (Note: In some of the upcoming slides, we have labels on edges instead of on nodes.)


Incomplete Data

[Figure: Movie Database tree with Movie, Film, T.V. Series, Actor, Title, Name, and Year nodes. Values include Dune, Star Wars, Twin Peaks, Léon, Magnolia, Kyle MacLachlan, Natalie Portman, Harrison Ford, Mark Hamill, 1977, and 1984. Callouts: "The movie has a year attribute" on one movie; "The year of the movie is missing" on another.]


Variations in Structure

[Figure: the Movie Database tree. Callouts: "Movie below Actor" in one part of the tree; "Actor below Movie" in another.]


Ontology Variations

[Figure: the Movie Database tree. Callouts: "A movie label" and "A film label" mark nodes of the same type that carry different labels.]


• The description of the schema is large (e.g., a DTD of XML)
– It is difficult to use the schema when formulating queries
• Data is contributed by many users in a variety of designs
– The query should deal with different structures of data
• The structure of the database is changed frequently
– Queries should be rewritten frequently

Need to allow the user to write an “approximate query” and have the query processor deal with it


The Problem

• In many different domains, we are given the option to query some source of information
• Usually, the user only gets results if the query can be completely answered (satisfied)
• In many domains, this is not appropriate, e.g.,
– The user is not familiar with the database
– The database does not contain complete information
– There is a mismatch between the ontology of the user and that of the database


Example 1

City: Be'er Sheva; Area code: 03


The selected city does not appear in the selected area code!


Boarding: Haifa – Technion; Alighting: Eilat


There is no direct line connecting the selected points


Boarding: (left blank); Alighting: Eilat


Course details: Databases


No matching courses were found


What Do Users Need?

• Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
• These partial answers should contain maximal information
• Problem:
– It is easy to define when an answer satisfies a query
– Hard to say when an answer that does not satisfy a query is of interest
– Hard to say which incomplete answers are better than others


Modeling a Database and a Query

• It is useful to model both databases and queries as labeled directed graphs
– Clean mathematical modeling!
– Captures the essentials of XPath, XQuery
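A minimal sketch of this modeling, assuming labels sit on edges (as in some of the slides) and that leaf nodes may carry a text value. The class name `LabeledGraph`, its methods, and the node ids are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LabeledGraph:
    """Directed graph with a label on every edge; leaves may carry a text value."""
    edges: list = field(default_factory=list)    # (source, label, target) triples
    values: dict = field(default_factory=dict)   # node id -> text value

    def add_edge(self, src: int, label: str, dst: int,
                 value: Optional[str] = None) -> None:
        self.edges.append((src, label, dst))
        if value is not None:
            self.values[dst] = value

    def children(self, node: int, label: Optional[str] = None) -> list:
        """Targets of edges leaving `node`, optionally restricted to one label."""
        return [d for s, l, d in self.edges if s == node and label in (None, l)]


# Hypothetical fragment of a university database (node ids are arbitrary).
db = LabeledGraph()
db.add_edge(0, "University", 1)
db.add_edge(1, "Name", 2, value="Technion")
db.add_edge(1, "Dept", 3)
db.add_edge(3, "Name", 4, value="Computer Science")

# A query is a graph of the same kind, usually much smaller.
query = LabeledGraph()
query.add_edge(0, "University", 1)
query.add_edge(1, "Dept", 2)
query.add_edge(2, "Name", 3)
```

Using one structure for both sides means that answering a query reduces to finding mappings from the query graph into the database graph.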


University Database

[Figure: University Database as a labeled graph. Labels: University, Name, Dept, Faculty, Professor, Lecturer, Teaches. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology.]


Query

[Figure: query graph with labels University, Dept, Faculty, Name]

• Exact answers are defined by exact matchings, i.e., subgraph homomorphisms
• This query asks for the names of all faculty members (of any type)
• How would you write this in XPath?
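One possible answer, as a sketch: against an XML encoding in which graph labels become element names, the query could be written `/University/Dept/Faculty/*/Name`, where `*` stands for a faculty member of any type. The XML document below is an assumption (the transcript does not preserve the figure's exact layout, including which person belongs to which department). Python's `xml.etree.ElementTree` evaluates the path relative to the root element, hence the leading `./`.

```python
import xml.etree.ElementTree as ET

# Assumed XML encoding of the university database; the arrangement is illustrative.
doc = ET.fromstring("""
<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty>
      <Professor><Name>Chana Israeli</Name><Teaches>Databases</Teaches></Professor>
    </Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty>
      <Lecturer><Name>Avi Levy</Name><Teaches>Molecular Biology</Teaches></Lecturer>
    </Faculty>
  </Dept>
</University>
""")

# `doc` is the <University> element, so the path starts below it.
# The `*` step matches Professor, Lecturer, or any other faculty type.
faculty_names = [name.text for name in doc.findall("./Dept/Faculty/*/Name")]
print(faculty_names)
```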


Exact Answers

[Figure: the University Database graph alongside the query graph (University, Dept, Faculty, Name)]
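Exact matching can be sketched as a search for label-preserving mappings from query nodes to database nodes. The code below is an illustration under assumptions: labels are on edges, the query is tree-shaped with its edges listed top-down, and the small database is hypothetical and simplified (it is not the figure from the slide). The function name `exact_matchings` is likewise made up for this sketch.

```python
def exact_matchings(query, db, q_root, db_root):
    """All mappings h from query nodes to database nodes with h(q_root) = db_root
    such that every query edge (u, label, v) has a database edge (h(u), label, h(v)).
    Assumes query edges are listed top-down, so u is already mapped when visited."""
    def extend(i, h):
        if i == len(query):
            yield dict(h)          # copy, because h is mutated while backtracking
            return
        u, label, v = query[i]
        for s, l, d in db:
            if s == h[u] and l == label and h.get(v, d) == d:
                newly_bound = v not in h
                h[v] = d
                yield from extend(i + 1, h)
                if newly_bound:
                    del h[v]
    return list(extend(0, {q_root: db_root}))


# Hypothetical database: edges are (source, label, target) triples.
db = [
    ("root", "University", "u"),
    ("u", "Dept", "d1"), ("u", "Dept", "d2"),
    ("d1", "Faculty", "f1"), ("d2", "Faculty", "f2"),
    ("f1", "Name", "n1"), ("f2", "Name", "n2"),
]
query = [
    ("q0", "University", "q1"),
    ("q1", "Dept", "q2"),
    ("q2", "Faculty", "q3"),
    ("q3", "Name", "q4"),
]

matchings = exact_matchings(query, db, "q0", "root")
print(sorted(m["q4"] for m in matchings))
```

Each returned dictionary is one exact answer. A query with no complete mapping returns an empty list, which is the behavior the later slides set out to relax.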



Slightly More Complex Query

[Figure: query graph with labels University, Dept, Faculty, Name, and the value Biology]

• Returns faculty members only from the Biology Department
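As a sketch, the added condition corresponds to an XPath predicate on the department: `/University/Dept[Name='Biology']/Faculty/*/Name`. The XML encoding below is an assumption made for illustration, including which faculty member belongs to which department.

```python
import xml.etree.ElementTree as ET

# Assumed XML encoding of the university database; the arrangement is illustrative.
doc = ET.fromstring("""
<University>
  <Name>Technion</Name>
  <Dept>
    <Name>Computer Science</Name>
    <Faculty>
      <Professor><Name>Chana Israeli</Name></Professor>
    </Faculty>
  </Dept>
  <Dept>
    <Name>Biology</Name>
    <Faculty>
      <Lecturer><Name>Avi Levy</Name></Lecturer>
    </Faculty>
  </Dept>
</University>
""")

# [Name='Biology'] keeps only Dept elements that have a Name child with that text.
biology_names = [
    name.text
    for name in doc.findall("./Dept[Name='Biology']/Faculty/*/Name")
]
print(biology_names)
```

The predicate must match the stored text exactly, so a misspelled or differently capitalized department name yields no answers at all.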


Exact Answers Are Not Always Useful

• Problems with exact answers:
– labels are not always known
– content may be unknown, misspelled, etc.
– structure may be unknown, or may vary from one representation to another
– we may actually want to perform a search, since the query is a vague hypothesis
– exact matching does not allow users to get partial/vague answers where none better exist


Manually Adding Inexactness

• One can use language constructs in order to get more flexible queries
• Example: Suppose we want to find courses together with the teachers that teach them, but we don’t know which hierarchy exists in the database:
– for each teacher, there is a list of courses, or
– for each course, there is a list of teachers,
– or both…


[Figure: University Database in which each Teacher has a Name and Course children. Values: Technion, Computer Science, Biology, Chana Israeli, Avi Levy, Databases, Bioinformatics, Molecular Biology. Query needed: Teacher with Course below it.]


[Figure: University Database in which each Course has a Name and Teacher children. Values: Technion, Computer Science, Biology, Bioinformatics, Molecular Biology, Chana Israeli, Avi Levy. Query needed: Course with Teacher below it.]



Manually Adding Inexactness (cont.)

• If we don’t know the hierarchy, we need:
– (Teacher with Course below it) Union (Course with Teacher below it)
• If we don’t know what exactly the labels are, we might need:
– the same Union, with Teacher widened to "Teacher or Lecturer or Professor" and Course widened to "Course or Seminar or Lab"
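To see how quickly the manual approach grows, the sketch below spells out every branch of such a union as an XPath-style path joined with the `|` union operator. The `//A/B` form and the function name `manual_union` are assumptions made for illustration; the label alternatives are the ones listed on the slide.

```python
from itertools import product

teacher_labels = ["Teacher", "Lecturer", "Professor"]
course_labels = ["Course", "Seminar", "Lab"]


def manual_union():
    """Every branch the user would have to write by hand, joined with XPath '|'."""
    branches = []
    for teacher, course in product(teacher_labels, course_labels):
        branches.append(f"//{teacher}/{course}")   # courses listed below a teacher
        branches.append(f"//{course}/{teacher}")   # teachers listed below a course
    return " | ".join(branches)


union_query = manual_union()
print(len(union_query.split(" | ")), "branches")
print(union_query)
```

Two hierarchies times three teacher labels times three course labels already gives 18 branches for a two-node query, and each additional query node or label alternative multiplies the count again.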


Help!


Intuition

• Users write regular queries, stating what they are looking for
• The query processor uses a built-in strategy to find answers that exactly satisfy the query or inexactly satisfy the query
• Burden is on the query processor, not on the user


Inexact Answers

• Many different definitions have been given
– For each definition, query processing algorithms have been defined
• Examples:
– Allow some of the nodes of the query to be unmatched
– Allow edges in the query to be matched to paths in the database
– Allow nodes to be matched to nodes with labels that have a similar meaning
• Be careful so that answers are meaningful!


Allow Unmatched Nodes: Bezeq Query

[Figure: query graph with nodes Name (Shmulevitz), City (Be'er Sheva), Area Code (03), and Phone Number]
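A minimal sketch of answering such a query while allowing unmatched nodes: instead of requiring every condition to hold, rank records by how many conditions they satisfy and return the best ones. The directory records, phone numbers, and the function name `best_partial_answers` are invented for illustration. This is one simple scoring rule, not a definition taken from the slides.

```python
def best_partial_answers(records, conditions):
    """Records that satisfy the largest non-zero number of query conditions."""
    def score(record):
        return sum(1 for key, wanted in conditions.items()
                   if record.get(key) == wanted)

    top = max((score(r) for r in records), default=0)
    return [r for r in records if top > 0 and score(r) == top]


# Hypothetical directory entries (all values are made up).
directory = [
    {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "08",
     "Phone Number": "555-0100"},
    {"Name": "Shmulevitz", "City": "Tel Aviv", "Area Code": "03",
     "Phone Number": "555-0101"},
    {"Name": "Cohen", "City": "Haifa", "Area Code": "04",
     "Phone Number": "555-0102"},
]

# The query from the slide: no record can satisfy all three conditions at once.
query = {"Name": "Shmulevitz", "City": "Be'er Sheva", "Area Code": "03"}

answers = best_partial_answers(directory, query)
print([r["Phone Number"] for r in answers])
```

Both Shmulevitz entries match two of the three conditions, so both come back as maximal partial answers, while an exact-match processor would return nothing.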


Matching Edges to Paths: Egged Query

[Figure: query graph with Source (Technion-Haifa) and Destination (Eilat)]
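Matching a query edge to a database path can be sketched as a reachability search: the query asks for a single Source-to-Destination edge, and the relaxed processor accepts a chain of direct lines instead. The route table and the function name `find_route` below are hypothetical; breadth-first search is used here so that the chain with the fewest legs is found first.

```python
from collections import deque


def find_route(direct_lines, source, destination):
    """Chain of direct lines with the fewest legs from source to destination, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for nxt in direct_lines.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical table of direct bus lines (stop -> stops reachable without a change).
direct_lines = {
    "Technion-Haifa": ["Tel Aviv"],
    "Tel Aviv": ["Jerusalem", "Eilat"],
    "Jerusalem": ["Eilat"],
}

# Exact matching fails: there is no direct Technion-Haifa -> Eilat line.
has_direct_line = "Eilat" in direct_lines["Technion-Haifa"]

# Relaxed matching: the single query edge is matched to a path.
route = find_route(direct_lines, "Technion-Haifa", "Eilat")
print(has_direct_line, route)
```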


Similar Meaning Labels

[Figure: query graph with a Course node, the labels Name and Details, and the value "Databases"]
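One way to sketch matching on similar-meaning labels is a table of label groups, where two labels match if they are equal or fall in the same group. The groups below reuse label alternatives that appear elsewhere in these slides (Course/Seminar/Lab, Teacher/Lecturer/Professor, Movie/Film). The table, the function names, and the small edge list are assumptions for illustration only.

```python
# Hypothetical synonym table: each set groups labels treated as the same type.
SIMILAR_LABELS = [
    {"Course", "Seminar", "Lab"},
    {"Teacher", "Lecturer", "Professor"},
    {"Movie", "Film"},
]


def labels_match(query_label, db_label):
    """True if the labels are equal or belong to the same synonym group."""
    if query_label == db_label:
        return True
    return any(query_label in group and db_label in group
               for group in SIMILAR_LABELS)


def relaxed_children(edges, node, query_label):
    """Targets of edges leaving `node` whose label matches up to similarity."""
    return [dst for src, label, dst in edges
            if src == node and labels_match(query_label, label)]


# Hypothetical database fragment: edges are (source, label, target) triples.
edges = [
    ("dept", "Seminar", "c1"),
    ("dept", "Course", "c2"),
    ("dept", "Lecturer", "t1"),
]

print(relaxed_children(edges, "dept", "Course"))
```

A query for Course now also reaches the Seminar node, but not the Lecturer node, which keeps the relaxed answers meaningful.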


Other Types of Inexactness

• Many other definitions have been given, e.g.,
– allow permutations of nodes in the query
– allow child nodes to be promoted
– interconnection
• Summary: Inexactness basically means that we relax some of the query requirements!