My Favorite Issues in Data Warehouse Modeling
Jens Lechtenbörger
University of Münster & ERCIS, Germany
http://dbms.uni-muenster.de
Context
Data Warehouse (DW) modeling
• ETL design
• DW schema design
– Database design
– Methodical process in several phases
• Focus here: Conceptual schema design
DOLAP 2005, November 5, Jens Lechtenbörger
Outline
• Context
• Conceptual Modeling
• Meaning of Features
• Multidimensional Normal Forms
• Schema Versioning
• Conclusions
Conceptual Modeling (1/5)
• Conceptual representation of multidimensional scenario
– System- and implementation-independent
• No standard data model in sight
– Ad hoc
– E/R variants
– Object-oriented, based upon UML
• Specification of facts’ structure, i.e.,
– Relevant dimensions and their inner structure (→ dimension schema),
– Measures within their multidimensional contexts (→ fact schema)
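Conceptual Modeling (2/5): Fact Schema
[Figure: fact schema Transactions with measure #Transactions and dimensions Account, Branch, and Time. Node labels: AccountID, CustID, CustType, Job, Branch, BranchID, City, Region, Day, Month, Quarter, Year. Annotations: "CustType: Person" and "CustType: Company".]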
Conceptual Modeling (3/5): Meaning of Fact Schema
• Universal relation
• Universal relation schema assumption (URSA): The semantics of an attribute is tied to its name
• The defining dimension levels form the key
• Each arc represents a functional dependency (FD)
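The reading above can be made concrete with a small check, given here as a minimal sketch rather than the author's tooling. It treats a fact schema as one universal table and tests whether an FD holds. The helper `holds_fd`, the sample rows, and the particular arcs (Day → Month, AccountID → BranchID) are illustrative assumptions; only the attribute names are borrowed from the slides.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical universal relation: one row per (AccountID, Day) combination.
universal = [
    {"AccountID": "a1", "Day": "2005-11-04", "Month": "2005-11",
     "BranchID": "b1", "Transactions": 3},
    {"AccountID": "a1", "Day": "2005-11-05", "Month": "2005-11",
     "BranchID": "b1", "Transactions": 1},
    {"AccountID": "a2", "Day": "2005-11-05", "Month": "2005-11",
     "BranchID": "b2", "Transactions": 7},
]

# The defining levels form the key: they determine the measure.
key_ok = holds_fd(universal, ["AccountID", "Day"], ["Transactions"])
# Arcs in the dimension schemata are FDs (assumed arcs for this sketch).
arc_ok = holds_fd(universal, ["Day"], ["Month"])
# The reverse direction is not an FD: one month contains many days.
reverse_ok = holds_fd(universal, ["Month"], ["Day"])
```

In this sketch an admissible instance is simply a table for which every arc of the schema passes `holds_fd`.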
Conceptual Modeling (4/5): Some Features
(Incomplete list)
• Standard Features
– Fact schema represents M:N relationship among dimensions
– Arc in dimension schema represents M:1 relationship, i.e., FD
• Typical Features (some with challenges for summarizability)
– M:N relationships among dimension levels (non-strict hierarchies)
– Alternative and parallel paths, possibly including joining levels
– Optional levels allowing NULL values (heterogeneous, unbalanced, non-onto hierarchies)
Conceptual Modeling (5/5): Guidelines
• A rich set of features is good
• A set of guidelines for their proper use is even better
• Let’s consider the typical features above in turn
Meaning of Features: M:N relationships (1/4)
• M:N relationships are generally implicitly understood
• Consider levels Day and City
– Many cities exist at a given day
– A city exists for many days
• There is no need to model this M:N relationship (if we don’t do history)
Meaning of Features: M:N relationships (2/4)
Consider geographical levels City, Region, State, Country
• One Region per City, i.e., City → Region
• M:N between Region and State, i.e., Region ←→ State
• One Country per State, i.e., State → Country
[Figure: dimension schema Location with levels City, Region, State, Country, All]
Legal instance:
City  Region  State  Country
ci1   r1      s1     co1
ci1   r1      s2     co2
• City and State are in M:N relationship.
• Probably not intended. Different dimension schema needed.
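The legal instance above can be checked mechanically. The following sketch uses a hypothetical helper `fd_holds` (not part of the talk) to confirm that the two rows satisfy both declared FDs while City determines neither State nor Country.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every lhs value maps to exactly one rhs value."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# The two rows of the legal instance from the slide.
instance = [
    {"City": "ci1", "Region": "r1", "State": "s1", "Country": "co1"},
    {"City": "ci1", "Region": "r1", "State": "s2", "Country": "co2"},
]

declared = {
    "City -> Region": fd_holds(instance, "City", "Region"),
    "State -> Country": fd_holds(instance, "State", "Country"),
}
unintended = {
    "City -> State": fd_holds(instance, "City", "State"),
    "City -> Country": fd_holds(instance, "City", "Country"),
}
```

Both entries of `declared` come out true and both entries of `unintended` come out false, which is exactly the situation the slide warns about: the schema admits a city that lies in two states and two countries.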
Meaning of Features: M:N relationships (3/4)
[Figure: dimension schema Location with levels City, Region, State, Country, All; Region and State side by side above City]
• Implicit M:N relationship
• No problems with summarizability
• Guideline
– Avoid “M:N arcs” within dimensions
– Joint work with Bodo Hüsemann and Gottfried Vossen, DMDW 2000
∗ Synthesize fact schemata
∗ Follow FDs to build dimension schemata
– Side remark: Bridge tables of Kimball et al. arise automatically as fact schemata
Meaning of Features: M:N relationships (4/4)
However
• Maybe there was a reason to place State above Region
• Roll-Up like change in granularity
– In general, regions fit into state boundaries
– But not always
• Then, add a new type of “M:N navigational arc”
– This is not Roll-Up!
[Figure: dimension schema Location with levels City, Region, State, Country, All, including an M:N navigational arc]
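Meaning of Features: Joining Levels (1/5)
[Figure: three variants of the Location dimension. Left: schema graph with levels City, Region, State, Country, All, where the paths via Region and State join in Country. Second: schema graph with levels City, Region, State, All and separate levels RCountry and SCountry. Third: object-oriented diagram over City, Region, State, Country, All with multiplicities 1 and 1..*.]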
Meaning of Features: Joining Levels (2/5)
Semantics of schema definable via admissible instances. Consider City c in Region r and State s.
• With universal relations, admissible instances are tables that satisfy FDs
– For left schema, by transitivity of FDs Country of r must be equal to Country of s
• With objects, associations are implemented via references
– Object c has references to r and s
– Objects r and s each have exactly one reference to a country object
– That object for r may be distinct from the one of s
• Thus, left schema on previous slide has different meaning than other two, whose meaning is the same
Meaning of Features: Joining Levels (3/5)
It’s even worse...
• Consider a 3NF implementation of left schema
– Tables for City, Region, State, Country
– Table for City has foreign keys to tables for Region, State
– Tables for Region and State each have a foreign key to table for Country
∗ Those foreign keys need not be “in sync”
• Thus, again a city may wind up in two countries
– Star and snowflake schemata have different semantics!
• What does your favorite OLAP tool do?
• Gap in relational theory. Research in progress.
• Guideline: Use handwritten code to maintain consistency. Be careful!
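As an illustration of such handwritten consistency code, here is a minimal sketch. It assumes the four 3NF tables are available as plain dictionaries keyed by primary key, and it reports every city whose region and state point to different countries. The function name `out_of_sync_cities` and the sample data are hypothetical.

```python
def out_of_sync_cities(city, region_country, state_country):
    """Return cities whose region and state reference different countries.

    city:           city id   -> (region id, state id)   (two foreign keys)
    region_country: region id -> country id              (one foreign key)
    state_country:  state id  -> country id              (one foreign key)
    """
    return sorted(
        c for c, (r, s) in city.items()
        if region_country[r] != state_country[s]
    )


# Hypothetical table contents. All foreign keys are valid on their own,
# yet ci2 ends up in two countries.
city = {"ci1": ("r1", "s1"), "ci2": ("r2", "s2")}
region_country = {"r1": "co1", "r2": "co1"}
state_country = {"s1": "co1", "s2": "co2"}

bad = out_of_sync_cities(city, region_country, state_country)
```

A check like this would typically run as part of the load process, since ordinary foreign key constraints accept the inconsistent state.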
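Meaning of Features: Joining Levels (4/5)
Reuse of levels is different from joining
[Figure: fact schema Sales with measure Amount and dimensions Product (ProdID), Customer (CustID), and Supplier (SuppID); the Customer and Supplier hierarchies join in a single shared level City, followed by Region, State, Country]
Here, customer and supplier must be in the same city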
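Meaning of Features: Joining Levels (5/5)
Reuse of levels is different from joining
[Figure: fact schema Sales with measure Amount and dimensions Product (ProdID), Customer (CustID), and Supplier (SuppID); the level City is reused under two names, labeled [City]CCity and [City]SCity]
Notice: New notation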
Meaning of Features: Parallel vs Alternative Paths (1/5)
Parallel paths allow levels from different paths in single Group-By clause, e.g.:
[Figure: two dimension schemata. Location with levels City, Region, State, Country, All. Time with levels Day, Week, Month, Quarter, Year, All.]
Meaning of Features: Parallel vs Alternative Paths (2/5)
Observations on parallel paths
• Including levels from more than one path increases level of detail
– E.g., grouping by Week and Month is OK
• Guideline: There are fewer problems than you might have thought
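The observation on grouping by Week and Month can be tried out with a few lines of code. This is a minimal sketch with made-up facts; the helper `group_sum` and the choice of ISO weeks are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date


def group_sum(facts, key):
    """Sum the quantity of each fact under the group computed by key(day)."""
    totals = defaultdict(int)
    for day, qty in facts:
        totals[key(day)] += qty
    return dict(totals)


def week(day):
    # (ISO year, ISO week number)
    return tuple(day.isocalendar()[:2])


def month(day):
    return (day.year, day.month)


# Hypothetical facts at Day granularity; the first three days share one
# ISO week, which straddles the October/November boundary.
facts = [
    (date(2005, 10, 31), 2),
    (date(2005, 11, 1), 3),
    (date(2005, 11, 5), 5),
    (date(2005, 11, 7), 7),
]

by_week = group_sum(facts, week)
by_month = group_sum(facts, month)
by_week_month = group_sum(facts, lambda d: (week(d), month(d)))

# Rolling the finer grouping up to Month reproduces the Month totals.
rolled_up = defaultdict(int)
for (_, m), qty in by_week_month.items():
    rolled_up[m] += qty
```

Grouping by both levels yields three groups where Week or Month alone yields two: the combination is simply a finer partition of the days, and each fact is still counted exactly once.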
Meaning of Features: Parallel vs Alternative Paths (3/5)
Alternative paths require exclusive choice, e.g.:
[Figure: Customer dimension with CustID, CustType (Person, Company), Job, Branch, All, and the context dependencies "CustType: Person" and "CustType: Company". Sample instance values: P1, P2, ..., P42042, C1, ..., Artist, Zoo director, Airline, null.]
Grouping by Job and Branch is inconsistent
Meaning of Features: Parallel vs Alternative Paths (4/5)
Observations on alternative paths
• Alternative paths usually arise from optional levels
• Use context dependencies to explain presence of structural NULLs
• Or more complex dimension constraints
– Hurtado and Mendelzon, PODS 2002
• Guideline: Avoid/explain optional levels.
– Notice: Subclassing in object-oriented models expresses context dependencies
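One possible way to make context dependencies operational is sketched below. It rests on two simplifying assumptions that go beyond the slides: Job is applicable exactly when CustType is Person, and Branch exactly when CustType is Company, so a NULL is "structural" precisely outside that context. The names `CONTEXT`, `violations`, and `grouping_is_consistent` are hypothetical.

```python
# Optional level -> (context attribute, required value); assumed for this sketch.
CONTEXT = {
    "Job": ("CustType", "Person"),
    "Branch": ("CustType", "Company"),
}


def violations(rows):
    """List (CustID, level) pairs where NULL-ness contradicts the context."""
    bad = []
    for row in rows:
        for level, (attr, value) in CONTEXT.items():
            applicable = row[attr] == value
            present = row[level] is not None
            if applicable != present:
                bad.append((row["CustID"], level))
    return bad


def grouping_is_consistent(levels):
    """False if two requested levels need different values of one attribute."""
    required = {}
    for level in levels:
        if level in CONTEXT:
            attr, value = CONTEXT[level]
            if required.setdefault(attr, value) != value:
                return False
    return True


rows = [
    {"CustID": "P1", "CustType": "Person", "Job": "Artist", "Branch": None},
    {"CustID": "C1", "CustType": "Company", "Job": None, "Branch": "Airline"},
    # Deliberately broken row: a company with a job and without a branch.
    {"CustID": "C2", "CustType": "Company", "Job": "Zoo director", "Branch": None},
]

found = violations(rows)
```

With these definitions, `grouping_is_consistent(["Job", "Branch"])` is false because the two levels presuppose contradictory contexts, which mirrors the inconsistent grouping mentioned for alternative paths.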
Meaning of Features: Parallel vs Alternative Paths (5/5)
[Figure: Customer dimension structured with sub-classes Person and Company. Node labels: CustID, CustType, Job, Branch, Legal Form, Capital, C. Subs., Business P., All.]
Multidimensional Normal Forms (1/4)
Joint work with Gottfried Vossen: Multidimensional Normal Forms for Data Warehouse Design, Information Systems, 2003
• Three multidimensional normal forms (MNFs)
• 1MNF based on analysis of FDs
• 2MNF requires context dependencies for optional levels
• 3MNF places restrictions upon context dependencies
Multidimensional Normal Forms (2/4)
Implications of 1MNF
• Faithful representation of the application domain
• Completeness w.r.t. the application domain
• Avoidance of redundancies
• Avoidance of M:N relationships
Multidimensional Normal Forms (3/4)
Implications of 2MNF and 3MNF
• Explanation for structural NULLs allows
– context-sensitive summarizability
– avoidance of contradictory queries
• Relational implementation of class hierarchies within dimensions without structural NULLs possible
• Avoidance of alternative paths
Multidimensional Normal Forms (4/4)
Final remarks concerning 2MNF and 3MNF
• Both rely on purely relational techniques
• For object-oriented models considerable simplifications possible
– Disallow optional levels
– Construction (see paper in Information Systems mentioned above)
∗ As long as optional level l exists, introduce further sub-classes
∗ One with l, now mandatory
∗ The other without l
Schema Versioning (1/14)
Joint work with Matteo Golfarelli, Stefano Rizzi, Gottfried Vossen. Schema Versioning in Data Warehouses: Enabling Cross-Version Querying via Schema Augmentation. To appear in Data & Knowledge Engineering.
Challenges
• Storage of historical data under changing business requirements
• Non-volatility, in particular consistent re-execution of old queries
Our proposal
• Maintenance of history of schema versions
• Simple graph model representing core of multidimensional models
• Schema augmentation to represent new schema information on old data
• Schema intersection to answer cross-version queries
Schema Versioning (2/14)
[Figure: schema graph of the Shipment fact with measures QtyShipped and ShippingCostsDM. Node labels: Part, Size, Brand, Type, Category, Container, Customer, SaleDistrict, City, Region, Nation, Deal, Incentive, Allowance, Terms, ShipMode, Type, Carrier, Date, Month, Year.]
Schema Versioning (3/14)
At t1 = 1/1/2003, the schema undergoes a major revision.
1. The temporal granularity changes from Date to Month.
2. A classification into Subcategories is added to part hierarchy.
3. A new constraint in customer hierarchy states that SaleDistricts belong to Nations.
4. The Incentive is independent of shipment Terms.
At t2 = 1/1/2004, another version is created.
1. New measures ShippingCostsEU and ShippingCostsLIT are added.
2. The ShipMode dimension is deleted.
3. A ShipFrom dimension is added.
4. A descriptive attribute PartDescr is added to Part.
Schema Versioning (4/14)
[Figure: resulting schema graph of the Shipment fact with measures QtyShipped, ShippingCostsDM, ShippingCostsEU, ShippingCostsLIT. Node labels: Part, PartDescr, Size, Brand, Type, Subcategory, Category, Container, Customer, SaleDistrict, City, Region, Nation, Deal, Incentive, Allowance, Terms, ShipFrom, Month, Year.]
Resulting schema graph
Schema Versioning (5/14)
[Figure: resulting schema graph of the Shipment fact after both revisions, including Subcategory, PartDescr, ShipFrom, ShippingCostsEU, and ShippingCostsLIT]
Three sample query challenges:
• Compute the total quantity of each part category shipped from each warehouse to each customer nation since July 2002.
• Drill down from Category to Subcategory
• Drill down from Nation to SaleDistrict
Schema Versioning (6/14): Schema Modification (1/4)
Four schema modification operations on schema graph
• AddA() to add a new attribute
• DelA() to delete an existing attribute
• AddF() to add an arc involving existing attributes
• DelF() to remove an existing arc
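The four operations can be sketched over a plain set-based graph. This is an illustrative model, not the formal definition from the paper: the class `SchemaGraph`, its method names, and in particular the choice that deleting an attribute reconnects its predecessors to its successors are assumptions. The sample arcs are likewise an assumed fragment of the Shipment schema.

```python
class SchemaGraph:
    """Attributes as nodes, functional dependencies as directed arcs."""

    def __init__(self, attrs=(), arcs=()):
        self.attrs = set(attrs)
        self.arcs = set(arcs)

    def add_a(self, attr):
        """AddA: add a new, initially unconnected attribute."""
        self.attrs.add(attr)

    def del_a(self, attr):
        """DelA: delete an attribute; bridge its neighbours (assumption)."""
        preds = {x for x, y in self.arcs if y == attr}
        succs = {y for x, y in self.arcs if x == attr}
        self.arcs = {(x, y) for x, y in self.arcs if attr not in (x, y)}
        self.arcs |= {(p, s) for p in preds for s in succs}
        self.attrs.discard(attr)

    def add_f(self, src, dst):
        """AddF: add an arc between two existing attributes."""
        if src not in self.attrs or dst not in self.attrs:
            raise ValueError("both attributes must already exist")
        self.arcs.add((src, dst))

    def del_f(self, src, dst):
        """DelF: remove an existing arc."""
        self.arcs.discard((src, dst))


# Assumed fragment of the Shipment schema graph.
g = SchemaGraph(
    attrs={"Shipment", "Date", "Month", "Year", "Part", "Type", "Category"},
    arcs={("Shipment", "Date"), ("Date", "Month"), ("Month", "Year"),
          ("Shipment", "Part"), ("Part", "Type"), ("Type", "Category")},
)

g.del_a("Date")                       # temporal granularity becomes Month
g.add_a("Subcategory")                # new level, not yet connected
g.add_f("Type", "Subcategory")
g.add_f("Subcategory", "Category")
```

In this sketch the direct arc from Type to Category stays in place after the last step; a call such as `g.del_f("Type", "Category")` would drop it if a reduced graph is wanted.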
Schema Versioning (7/14): Schema Modification (2/4)
Consider again
[Figure: initial schema graph of the Shipment fact, with temporal levels Date, Month, Year]
First goal: Delete Date
Schema Versioning (8/14): Schema Modification (3/4)
Result of DelA(Date)
[Figure: schema graph of the Shipment fact without Date; the remaining temporal levels are Month and Year]
Next goal: Insert Subcategory below Category
Schema Versioning (9/14): Schema Modification (4/4)
Result of AddA(Subcategory)
[Figure: fragment of the schema graph around Part after this step; node labels Shipment, Part, Type, Brand, Size, Container, Category, Subcategory]
Result of AddA(Subcategory), AddF(Type → Subcategory)
[Figure: the same fragment after adding the arc Type → Subcategory]
Result of AddA(Subcategory), AddF(Type → Subcategory), AddF(Subcategory → Category)
[Figure: the same fragment after adding the arc Subcategory → Category]
Schema Versioning (10/14): Schema Augmentation (1/2)
Previous schema versions associated with augmented schemata
• Previous schema computable via projection from augmented one
• Designer chooses to add information to augmented schemata based on current schema modification, e.g.,
– old data enriched with new attributes, e.g., Subcategory
– more constraints expressed on old data, e.g., SaleDistrict→ Nation
• Augmented schemata used by querying subsystem
Schema Versioning (11/14): Schema Augmentation (2/2)
Element               Condition                                  Augm. action
A ∈ Diff+_A(S, S′)    (E → A) ∈ F′, A is measure                 estimate values for A
                      (E → A) ∈ F′, A is dimension               disaggregate measure values
                      (E → A) ∉ F′, A is derived measure         compute values for A
                      (E → A) ∉ F′, A is property                consistently add values for A
f ∈ Diff+_F(S, S′)    -                                          check if f holds
Schema Versioning (12/14): Cross-version Querying (1/3)
General idea: Formulation context for OLAP query is a schema graph
• Intersection of schema versions is the largest schema for uniform querying
• Query can be answered if formulation context is sub-graph of intersection
• More precisely, augmented schemata instead of real versions
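The idea above can be sketched with plain sets. The code below is an illustration under stated assumptions, not the algorithm of the paper: a schema version is modeled as a pair (attributes, arcs), intersection is taken over attribute sets and over the transitive closures of the arcs, and a query is answerable if its formulation context is contained in that intersection. All function names and the simplified Shipment fragments are hypothetical.

```python
def closure(arcs):
    """Transitive closure of a set of (source, target) arcs."""
    reach = set(arcs)
    changed = True
    while changed:
        changed = False
        for a, b in list(reach):
            for c, d in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach


def intersect(v1, v2):
    """Common attributes and common (transitive) arcs of two versions."""
    attrs = v1[0] & v2[0]
    arcs = {(a, b) for a, b in closure(v1[1]) & closure(v2[1])
            if a in attrs and b in attrs}
    return attrs, arcs


def answerable(query, common):
    """A query is answerable if its context is a sub-graph of common."""
    return query[0] <= common[0] and query[1] <= common[1]


base_arcs = {("Shipment", "Part"), ("Part", "Category"),
             ("Shipment", "Customer"), ("Customer", "Nation"),
             ("Shipment", "QtyShipped")}
base_attrs = {"Shipment", "Part", "Category", "Customer", "Nation",
              "QtyShipped"}

old = (base_attrs | {"ShipMode"}, base_arcs | {("Shipment", "ShipMode")})
new = (base_attrs | {"ShipFrom"}, base_arcs | {("Shipment", "ShipFrom")})
# Augmented old version: the designer supplies ShipFrom for old data.
old_augmented = (old[0] | {"ShipFrom"}, old[1] | {("Shipment", "ShipFrom")})

query = ({"Shipment", "Category", "Nation", "ShipFrom", "QtyShipped"},
         {("Shipment", "Category"), ("Shipment", "Nation"),
          ("Shipment", "ShipFrom"), ("Shipment", "QtyShipped")})

plain = answerable(query, intersect(old, new))
augmented = answerable(query, intersect(old_augmented, new))
```

Here `plain` is false and `augmented` is true: the sample query spanning both versions becomes answerable only once the old version has been augmented with ShipFrom.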
Schema Versioning (13/14): Cross-version Querying (2/3)
[Figure: schema graph of the Shipment fact used for cross-version querying. Node labels: QtyShipped, ShippingCostsDM, Part, Size, Brand, Type, Subcategory, Category, Container, Customer, SaleDistrict, City, Region, Nation, Deal, Incentive, Allowance, Terms, ShipFrom, Month, Year.]
Compute the total quantity of each part category shipped from each warehouse to each customer nation since July 2002.
Schema Versioning (14/14): Cross-version Querying (3/3)
Observations
• Query well-formulated only if ShipFrom augmented
• Drilling down from Category to Subcategory only if subcategories established also for 2002 data
• Drilling down from Nation to SaleDistrict only if FD from sale districts to nations also satisfied before 2003.
Conclusions (1/3)
Summary
• FDs help in data warehouse design
• Meaning and potential of multidimensional features sometimes underspecified
• Sub-classing helps to structure multidimensional schemata
• Versioning with cross-version querying is feasible
Conclusions (2/3)
• Schema versioning offers further potential
– What-if analysis
– Horizontal benchmarking
• Open issue: Generalization to hyper-graphs (cross-dimensional attributes, derived measures)
Conclusions (3/3)
There’s more...
• Taking full advantage of rich models
• Transformations of conceptual to logical models for ETL
– Alkis Simitsis: Mapping Conceptual to Logical Models for ETL Processes. DOLAP 2005
• More generally, model-driven design
– Jose-Norberto Mazon et al.: Applying MDA to the Development of Data Warehouses. DOLAP 2005
• Where do the requirements come from?
– Paolo Giorgini et al.: Goal-oriented requirement analysis for data warehouse design. DOLAP 2005
http://dbms.uni-muenster.de
Thank you for your attention!