Evaluating XML-Extended OLAP Queries Based on a Physical Algebra


Xuepeng Yin and Torben B. Pedersen

Department of Computer Science Aalborg University

Page 2: Problem

• OLAP systems are good for complex analysis queries
  - Easy to use
  - Fast
  - Business, science, ...
• Problems with physical integration in existing OLAP systems
  - Integrating new data requires a (partial) cube rebuild, which is too slow
• Problems arise with dynamic data
  - Stock quotes, competitors' prices, disease information, ...
• Data will often be available in Extensible Markup Language (XML) format
  - Weather data, map information, price lists, ...

Page 3: Solution

• Allows the use of external XML data as virtual dimensions
  - Decoration (extra info): type information
  - Selection: conditions on XML data
  - Grouping: categories by XML data

[Figure: logical federation. An OLAP/XML query is split into an OLAP query for the OLAP component and an XML query for the XML documents.]

• Goal: flexible access to XML data from OLAP systems

Page 4: Overview

• Contributions
• Architecture of the federation
• Linking OLAP and XML
• The federation query semantics
  - The logical algebra
  - The physical algebra
  - Conversion from logical to physical plans
• Plan execution
• Query optimization
  - The query optimizer
  - Execution of an optimized plan
• Performance
• Conclusion

Page 5: Contributions of This Paper

• Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation
• Problems with previous work
  - The logical algebra does not accurately reflect query execution tasks
  - Query optimization is done at an abstract level
  - The implementation is very limited
• Novelties of this paper
  - A physical algebra and simplified query semantics
  - Practical query optimization techniques
  - A full-function, robust query engine
  - Experiments with the query engine

Page 6: Architecture of the Federation

• OLAP and XML components
• Auxiliary components
• Query engine
  - Query analyzer
  - Query optimizer
  - Query evaluator

Page 7: Linking OLAP and XML

• Links
  - A relation between a set of dimension values and a set of XML nodes
• Level expressions
  - <level>/<link>/<XPath expression> specifies a concrete link usage
  - Nation/Nlink/Population links nations to populations

[Figure: schema of the example cube, with the labels Time, Orders, Suppliers, Year, Quarter, Month, Customer, Order, Region, Nation, Supplier, Man., Brand, Part, Quantity and EC. Nlink connects the Nation level to the XML document below.]

<Nations>
  <Nation>
    <NationName>Denmark</NationName>
    <Population>5.3</Population>
  </Nation>
</Nations>

Nlink = {(DK, n1), (CN, n2), (UK, n3)}
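The link and the level expression above can be sketched in Python as follows. This is only an illustration under stated assumptions: the nation names "China" and "United Kingdom", the helper names `build_nlink` and `resolve`, and the dictionary representation of a link are hypothetical and do not come from the slides. The population values for CN and UK are taken from the later example tables.

```python
import xml.etree.ElementTree as ET

# XML document from the slide, extended with two nations so that every pair
# in Nlink has a node to point at (the two extra nation names are assumed).
XML_DOC = """
<Nations>
  <Nation><NationName>Denmark</NationName><Population>5.3</Population></Nation>
  <Nation><NationName>China</NationName><Population>1264.5</Population></Nation>
  <Nation><NationName>United Kingdom</NationName><Population>19.1</Population></Nation>
</Nations>
"""


def build_nlink(root):
    """Pair each dimension value with one <Nation> node (n1, n2, n3 on the slide)."""
    nodes = root.findall("Nation")
    return dict(zip(["DK", "CN", "UK"], nodes))


def resolve(level_expression, links):
    """Evaluate <level>/<link>/<XPath expression>.

    Returns {dimension value: text of the XML node selected by the XPath}.
    """
    _level, link_name, xpath = level_expression.split("/", 2)
    link = links[link_name]
    result = {}
    for dim_value, node in link.items():
        hit = node.find(xpath)  # XPath is evaluated relative to the linked node
        result[dim_value] = hit.text if hit is not None else None
    return result


root = ET.fromstring(XML_DOC)
links = {"Nlink": build_nlink(root)}
populations = resolve("Nation/Nlink/Population", links)
print(populations)
```

The XPath part is evaluated relative to each linked node, which is why `Population` alone is enough to reach the decoration value.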

Page 8: The Federation Query Semantics

• The logical algebra
  - Decoration ($\delta$), federation selection ($\sigma_{Fed}$), federation generalized projection ($\Pi_{Fed}$)
• The federation query language: SQLXM

SELECT SUM(Quantity), Brand(Part), Nation/Nlink/Population
FROM TC
WHERE Nation/Nlink/Population<30
GROUP BY Brand(Part), Nation/Nlink/Population

Logical plan for this query, with N/Nl/P short for Nation/Nlink/Population and Q for Quantity:

$\Pi_{Fed[Brand(Part),\,N/Nl/P]\langle SUM(Q)\rangle}\big(\sigma_{Fed[N/Nl/P<30]}\big(\delta_{N/Nl/P}(F_{TC})\big)\big)$

Page 9: The Physical Algebra

• Includes data retrieval and manipulation operators
• A physical plan models real execution tasks
  - i.e., when, where and how data is processed
• Nine physical operators
  - Querying the OLAP component: cube selection and cube generalized projection
  - Data transfer between components: fact-, dimension- and XML-transfer operators
  - Temporary data manipulations: decoration, federation selection and federation generalized projection
  - Inlining XML data: inlining

Page 10: Querying the OLAP Component

• Cube selection ($\sigma_{Cube}$)
  - Has no references to XML data
  - Performs selection over the OLAP cube
  - Intuitively, a SQL SELECT statement
• Cube generalized projection ($\Pi_{Cube}$)
  - Has no references to XML data
  - Rolls up dimensions and aggregates the specified measures at the specified levels
  - Intuitively, a SQL SELECT statement with a GROUP BY clause

Page 11: Data Transfer Between Components

• Fact-transfer
  - Transfers the OLAP fact data to the temporary component
  - The temporary facts can then be decorated
  - Intuitively, a SQL SELECT INTO statement
• Dimension-transfer
  - Transfers dimension data to the temporary component
  - Used when higher-level dimension data is required in the temporary component
• XML-transfer
  - Transfers XML data to the temporary component
  - Uses XPath expressions to identify XML nodes with decoration values

Page 12: Temporary Data Manipulations

• Decoration
  - Decorates the cube by adding a new dimension
  - Intuitively, adds a table with dimension and decoration XML data
  - SELECT * FROM t(supplier, nation) t1, t(nation, population) t2 WHERE t1.nation = t2.nation
• Federation selection ($\sigma_{Fed}$)
  - Performs selection over the cube in the temporary component
  - Intuitively, a SQL selection over the temporary tables
  - SELECT t1.* FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier = t2.supplier AND population < 30
• Federation generalized projection ($\Pi_{Fed}$)
  - Rolls up and aggregates the cube in the temporary component
  - Intuitively, a SQL SELECT statement with a GROUP BY clause
  - SELECT SUM(Quantity), t2.population FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier = t2.supplier GROUP BY t2.population

Page 13: Inlining XML Data

• Denoted as
• Comparing federated data in the temporary component is expensive
• Inlining refers to integrating XML data into the OLAP selections
• A resulting predicate
  - Only references dimension levels and constants
  - Can be evaluated in the OLAP component

Example: the predicate Nation/Nlink/Population<30 combined with the XML data

Nation Population
DK 5.3
CN 1264.5
UK 19.1

is inlined as Nation='DK' OR Nation='UK'.
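The rewriting in this example can be sketched in a few lines of Python. This is an illustration, not the engine's implementation: the function name `inline_predicate`, the dictionary input and the fallback predicate `1=0` for an empty match are assumptions made here.

```python
def inline_predicate(level, decoration, threshold):
    """Rewrite `level/link/xpath < threshold` into a predicate on `level` only.

    `decoration` maps each dimension value to its XML decoration value.
    """
    matching = [value for value, deco in decoration.items() if deco < threshold]
    if not matching:
        # Assumed convention: no dimension value qualifies, so nothing is selected.
        return "1=0"
    return " OR ".join(f"{level}='{value}'" for value in matching)


# Decoration values from the slide (Nation/Nlink/Population).
population = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}
predicate = inline_predicate("Nation", population, 30)
print(predicate)  # Nation='DK' OR Nation='UK'
```

The resulting string references only the Nation level and constants, so it can be handed to the OLAP component unchanged.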

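Page 14: From Logical to Physical Plans

Logical plan (bottom-up, input $F_{TC}$):
  - Decoration $\delta_{N/Nl/P}$
  - Federation selection $\sigma_{Fed[N/Nl/P<30]}$
  - Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$

Physical plan (bottom-up, input $F_{TC,ext}$):
  - Data transfer: fact-transfer, XML-transfer of N/Nl/P, dimension-transfers [S, N] and [P, B]
  - Decoration $\delta_{N/Nl/P}$
  - Federation selection $\sigma_{Fed[N/Nl/P<30]}$
  - Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$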

Page 15: Plan Execution

Execution of the physical plan on $F_{TC,ext}$, starting with no temporary tables:

• Fact-transfer: the facts are copied to the temporary component

Quantity ExtPrice Supplier Part Order Day
17 17954 S1 P3 11 2/12/96
28 29983 S2 P4 42 30/3/94
2 2388 S3 P3 4 8/12/96
26 26374 S4 P2 20 10/11/93

• XML-transfer of N/Nl/P

Nation Population
DK 5.3
CN 1264.5
UK 19.1

• Dimension-transfer [S, N]

Supplier Nation
S1 DK
S2 DK
S3 CN
S4 UK

• Decoration $\delta_{N/Nl/P}$

Supplier Population
S1 5.3
S2 5.3
S3 1264.5
S4 19.1

• Federation selection $\sigma_{Fed[N/Nl/P<30]}$: the fact for S3 is removed (shown with Population in place of Supplier)

Quantity ExtPrice Population Part Order Day
17 17954 5.3 P3 11 2/12/96
28 29983 5.3 P4 42 30/3/94
26 26374 19.1 P2 20 10/11/93

• Dimension-transfer [P, B]

Part Brand
P2 B2
P3 B3
P4 B4

• Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$: the final result

Quantity Population Brand
17 5.3 B3
28 5.3 B4
26 19.1 B2
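The execution on this slide can be replayed with an in-memory SQLite database standing in for the temporary component. This is a sketch under assumptions: the table names (`tfact`, `t_supplier_nation` and so on) and the use of SQLite are choices made here for illustration, and the transfers are simulated by plain inserts rather than by reading from an OLAP server or an XML source.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stands in for the temporary component

# Fact-transfer: facts copied from the OLAP component.
con.execute("CREATE TABLE tfact (quantity INT, extprice INT, supplier TEXT, "
            "part TEXT, ord INT, day TEXT)")
con.executemany("INSERT INTO tfact VALUES (?,?,?,?,?,?)", [
    (17, 17954, "S1", "P3", 11, "2/12/96"),
    (28, 29983, "S2", "P4", 42, "30/3/94"),
    (2, 2388, "S3", "P3", 4, "8/12/96"),
    (26, 26374, "S4", "P2", 20, "10/11/93"),
])

# XML-transfer of Nation/Nlink/Population.
con.execute("CREATE TABLE t_nation_population (nation TEXT, population REAL)")
con.executemany("INSERT INTO t_nation_population VALUES (?,?)",
                [("DK", 5.3), ("CN", 1264.5), ("UK", 19.1)])

# Dimension-transfers [S, N] and [P, B].
con.execute("CREATE TABLE t_supplier_nation (supplier TEXT, nation TEXT)")
con.executemany("INSERT INTO t_supplier_nation VALUES (?,?)",
                [("S1", "DK"), ("S2", "DK"), ("S3", "CN"), ("S4", "UK")])
con.execute("CREATE TABLE t_part_brand (part TEXT, brand TEXT)")
con.executemany("INSERT INTO t_part_brand VALUES (?,?)",
                [("P2", "B2"), ("P3", "B3"), ("P4", "B4")])

# Decoration: link suppliers to the population of their nation.
con.execute("""CREATE TABLE t_supplier_population AS
               SELECT t1.supplier, t2.population
               FROM t_supplier_nation t1, t_nation_population t2
               WHERE t1.nation = t2.nation""")

# Federation selection: keep facts whose decoration value is below 30.
con.execute("""CREATE TABLE tfact_selected AS
               SELECT t1.*
               FROM tfact t1, t_supplier_population t2
               WHERE t1.supplier = t2.supplier AND t2.population < 30""")

# Federation generalized projection: roll up to Brand and Population.
rows = con.execute("""SELECT SUM(f.quantity), sp.population, pb.brand
                      FROM tfact_selected f, t_supplier_population sp, t_part_brand pb
                      WHERE f.supplier = sp.supplier AND f.part = pb.part
                      GROUP BY pb.brand, sp.population
                      ORDER BY pb.brand""").fetchall()
print(rows)
```

The three result rows match the final table on the slide, only ordered by brand.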

Page 16: The Query Optimizer

[Figure: optimizer pipeline from the initial plan to the final execution plan, with the stages Plan Rewriting, Logical Plan Conversion, Cost Estimation and Plan Space Pruning.]

• Based on the Volcano optimizer
• Four optimization phases per stage
  - Enumeration of logically equivalent plans
  - One-to-one conversion from logical to physical plans
  - Cost estimation of physical plans: $t_{plan} = t_{root} + Max(t_1, \ldots, t_n)$
  - Cost-based pruning of the plan space
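The cost estimate can be sketched as a short recursive function. It assumes the formula reads as the time of the root operator plus the maximum time of its sub-plans (that is, sub-plans run in parallel). The class `PlanNode`, the function names and all operator times below are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    own_time: float  # estimated time of this operator alone (made-up numbers)
    children: list = field(default_factory=list)


def plan_time(node):
    """t_plan = t_root + Max(t_1, ..., t_n); a leaf has no sub-plan time."""
    child_times = [plan_time(child) for child in node.children]
    return node.own_time + max(child_times, default=0.0)


def cheapest(plans):
    """Cost-based pruning in its simplest form: keep the plan with the lowest estimate."""
    return min(plans, key=plan_time)


# A plan in which three transfers feed a decoration, followed by a selection.
transfers = [
    PlanNode("fact-transfer", 5.0),
    PlanNode("xml-transfer", 2.0),
    PlanNode("dimension-transfer", 1.0),
]
decoration = PlanNode("decoration", 1.5, transfers)
plan_a = PlanNode("federation selection", 0.5, [decoration])

# A hypothetical alternative plan with a cheaper single input.
plan_b = PlanNode("federation selection", 0.5, [PlanNode("fact-transfer", 3.0)])

print(plan_time(plan_a), plan_time(plan_b), cheapest([plan_a, plan_b]).children[0].name)
```

Only the slowest sub-plan contributes to the estimate, so speeding up a transfer that is not on the critical path does not change the plan's cost.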

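Page 17: An Optimized Query Plan

Optimized logical plan (bottom-up, input $F_{TC}$), where $(N/Nl/P<30)'$ is the predicate after inlining:
  - Federation selection $\sigma_{Fed[(N/Nl/P<30)']}$
  - Federation generalized projection $\Pi_{Fed[B(P),\,N]\langle SUM(Q)\rangle}$
  - Decoration $\delta_{N/Nl/P}$
  - Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$

Optimized physical plan (bottom-up, input $F_{TC,ext}$):
  - XML-transfer of N/Nl/P
  - Inlining of [N/Nl/P<30]
  - Cube selection $\sigma_{Cube[(N/Nl/P<30)']}$
  - Cube generalized projection $\Pi_{Cube[B,\,N]\langle SUM(Q)\rangle}$
  - Fact-transfer
  - Decoration $\delta_{N/Nl/P}$
  - Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$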

Page 18: Execution of the Optimized Plan

Execution on $F_{TC,ext}$, starting with no temporary tables:

• XML-transfer of N/Nl/P

Nation Population
DK 5.3
CN 1264.5
UK 19.1

• Inlining of [N/Nl/P<30]: the predicate becomes Nation='DK' OR Nation='UK'

• Cube selection $\sigma_{Cube[Nation='DK'\ OR\ Nation='UK']}$

Quantity ExtPrice Supplier Part Order Day
17 17954 S1 P3 11 2/12/96
28 29983 S2 P4 42 30/3/94
26 26374 S4 P2 20 10/11/93

• Cube generalized projection $\Pi_{Cube[B,\,N]\langle SUM(Q)\rangle}$

Quantity Nation Brand
17 DK B3
28 DK B4
26 UK B2

• Fact-transfer: the aggregated facts are copied to the temporary component

• Decoration $\delta_{N/Nl/P}$: the transferred facts are combined with the Nation/Population table

• Federation generalized projection $\Pi_{Fed[B(P),\,N/Nl/P]\langle SUM(Q)\rangle}$: the final result

Quantity Population Brand
17 5.3 B3
28 5.3 B4
26 19.1 B2
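The optimized execution can also be replayed with two in-memory SQLite databases, one standing in for the OLAP component and one for the temporary component. This is a sketch under assumptions: the table names, the star-schema layout of the OLAP side and the use of `IN (...)` in place of the slide's chain of `OR` terms are choices made here for illustration.

```python
import sqlite3

olap = sqlite3.connect(":memory:")  # stands in for the OLAP component
temp = sqlite3.connect(":memory:")  # stands in for the temporary component

# OLAP side: facts plus the two dimension tables used by the query.
olap.execute("CREATE TABLE facts (quantity INT, supplier TEXT, part TEXT)")
olap.executemany("INSERT INTO facts VALUES (?,?,?)",
                 [(17, "S1", "P3"), (28, "S2", "P4"), (2, "S3", "P3"), (26, "S4", "P2")])
olap.execute("CREATE TABLE supplier_dim (supplier TEXT, nation TEXT)")
olap.executemany("INSERT INTO supplier_dim VALUES (?,?)",
                 [("S1", "DK"), ("S2", "DK"), ("S3", "CN"), ("S4", "UK")])
olap.execute("CREATE TABLE part_dim (part TEXT, brand TEXT)")
olap.executemany("INSERT INTO part_dim VALUES (?,?)",
                 [("P2", "B2"), ("P3", "B3"), ("P4", "B4")])

# XML-transfer: decoration values arrive in the temporary component.
population = [("DK", 5.3), ("CN", 1264.5), ("UK", 19.1)]
temp.execute("CREATE TABLE t_nation_population (nation TEXT, population REAL)")
temp.executemany("INSERT INTO t_nation_population VALUES (?,?)", population)

# Inlining: Nation/Nlink/Population<30 becomes a predicate on Nation only.
nations = [nation for nation, pop in population if pop < 30]
placeholders = ",".join("?" for _ in nations)

# Cube selection and cube generalized projection run inside the OLAP component.
cube_rows = olap.execute(f"""SELECT SUM(f.quantity), s.nation, p.brand
                             FROM facts f
                             JOIN supplier_dim s ON f.supplier = s.supplier
                             JOIN part_dim p ON f.part = p.part
                             WHERE s.nation IN ({placeholders})
                             GROUP BY p.brand, s.nation""", nations).fetchall()

# Fact-transfer: only the small aggregated result is moved.
temp.execute("CREATE TABLE tfact (quantity INT, nation TEXT, brand TEXT)")
temp.executemany("INSERT INTO tfact VALUES (?,?,?)", cube_rows)

# Decoration and federation generalized projection in the temporary component.
result = temp.execute("""SELECT SUM(f.quantity), n.population, f.brand
                         FROM tfact f
                         JOIN t_nation_population n ON f.nation = n.nation
                         GROUP BY f.brand, n.population
                         ORDER BY f.brand""").fetchall()
print(result)
```

Compared with the unoptimized plan, the selection and most of the aggregation happen in the OLAP component, so three aggregated rows are transferred instead of the full fact table.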

Page 19: Performance

[Figure: two charts of evaluation time (in seconds, logarithmic scale) per query type 1 to 16; series: Federated, Cached, Integrated.]

• One experiment compared:
  a. Our federated solution
  b. Physical integration
  c. Federating cached XML data
• Data
  - 100M fact data based on the TPC-H benchmark
  - 11MB and 2KB XML data
• Queries
• Result:
  - Comparable to b for small amounts of data
  - Use c for large amounts of data

Page 20: Related Work

• Generic data integration
  - Relational, XML, semi-structured, OO, ... and combinations
  - Do not consider OLAP DB properties such as automatic aggregation, dimension hierarchies and correct aggregation
• OLAP-object federations
  - Current solution offers much more general use of external data
  - Current solution is not restricted to rigid object schemas
  - Current solution allows irregular data
• Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation

Page 21: Conclusion

• OLAP handles schema changes and dynamic data poorly
• Solutions
  - Logical federation of OLAP and XML
  - A physical algebra that models actual execution tasks
  - Optimized query evaluation
• Experiments suggest feasibility
• Future work
  - More optimization techniques
  - Advanced evaluation techniques
  - Co-operative development with an OLAP query tool vendor