Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.
Copyright: ©2005 by Elsevier Inc. All rights reserved. 2
Are data warehouses / marts just operational databases?
[Figure: source data and external data pass through load programs into the data warehouse; further load programs populate the data marts, which users access through query tools.]
Differences Compared with Operational Databases
• Different requirements
• Database technology may not be relational
  – Special multi-dimensional databases now exist to service this area
• Many techniques of database design can still apply
Characteristics
• Data integration
  – With data from many operational databases
• Loads data rather than performs updates
• Less predictable database ‘hits’
• Complex queries but a simple interface
• May emphasize historical data
• May summarize other data
Quality Criteria Revisited
• Completeness
• Non-redundancy
• Enforcement of (business) rules
• Data reusability
• Stability and flexibility
• Simplicity and elegance
• Communication and effectiveness
• Performance
Modelling / designing?
• Data warehouses feed data marts
  – Need a separate approach
  – Marts are more focused
  – Warehouses are more general and must handle all marts envisaged
• Consider each in turn
• (revisit 16.1 over page)
[Figure 16.1 revisited: the same architecture diagram, with source data and external data loaded into the data warehouse, which in turn feeds the data marts used by query tools.]
Modeling for Data Warehouses
1. May need an initial corporate model of the business
2. Need to understand existing (operational) data (bases)
3. Determine requirements of the warehouse
4. Determine sources and handling differences
5. Shaping data for data marts
Last two steps are more complex
Sources and Differences when Designing
• Minimize number of source systems
• Carefully judge source data item quality
• Reconcile multiple sources
  – E.g., differences in timeframe and currency of item
• Handle compatibility of coding schemes for data items
• Unpack overloaded attributes
  – E.g., an address containing a postcode, where the postcode becomes an important part of a warehouse
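The last two points can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the source system names (`billing`, `crm`), their codes, the shared warehouse codes, and the rule that a postcode is four digits at the end of the address string. A real load program would need rules agreed with the owners of each source.

```python
import re

# Hypothetical per-source translations into one shared warehouse coding scheme.
CODE_MAPS = {
    "billing": {"R": "RETAIL", "W": "WHOLESALE"},
    "crm": {"1": "RETAIL", "2": "WHOLESALE"},
}

# Assumption: the postcode is exactly four digits at the very end of the address.
POSTCODE_AT_END = re.compile(r"^(?P<street>.*?)[,\s]+(?P<postcode>\d{4})\s*$")


def unpack_address(address):
    """Split an overloaded address into (address_without_postcode, postcode)."""
    match = POSTCODE_AT_END.match(address)
    if match is None:
        # No recognisable postcode: keep the gap visible rather than guess.
        return address.strip(), None
    return match.group("street").strip(), match.group("postcode")


def to_warehouse_code(source_system, source_code):
    """Translate a source-specific code into the shared warehouse code."""
    return CODE_MAPS[source_system][source_code]


print(unpack_address("12 High St, Carlton 3053"))
print(to_warehouse_code("crm", "2"))
```

Returning `None` for a missing postcode, instead of a default value, keeps poor source data quality visible to whoever judges the sources.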
Shaping Data for Data Marts
• Need to maximize flexibility
• Cater for common purposes between marts and basic commonality (sorted out when handling requirements for the warehouse)
• If it is difficult to cater for both flexibility and common purpose, opt for flexibility
• The rule: Maximize Flexibility, Minimize Anticipation
Modeling for Data Marts
• Modeling for general business people
  – Little technical knowledge
  – Need for special queries
• Much simpler than operational databases
  – Facts vs. transaction handling and complex business rules
• Users of data marts need to move easily between marts
Basic Data Mart Architecture: Star Schema
• Fact table (only one)
• Dimension tables
  – To classify fact table into categories

Tables in the diagram (asterisks as in the original):
Sale (fact table): Accounting Month No *, Product ID *, Customer ID *, Location ID *, Quantity, Value
Period: Accounting Month No, Quarter No, Year No
Product: Product ID, Product Type Code, Product Name
Location: Location ID, Location Type Code, Region Code, State Code, Location Name
Customer: Customer ID, Customer Type Code, Region Code, State Code, Customer Name
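A cut-down version of this star can be sketched with Python's built-in `sqlite3` module. The sketch keeps only the Sale fact table and the Period and Product dimensions; Customer and Location would follow the same pattern. The snake_case column names and all sample rows are assumptions made for illustration.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table (sale) and two dimension tables, then load sample rows."""
    conn.executescript(
        """
        CREATE TABLE period (
            accounting_month_no INTEGER PRIMARY KEY,
            quarter_no INTEGER NOT NULL,
            year_no INTEGER NOT NULL
        );
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            product_type_code TEXT NOT NULL,
            product_name TEXT NOT NULL
        );
        CREATE TABLE sale (
            accounting_month_no INTEGER NOT NULL REFERENCES period (accounting_month_no),
            product_id INTEGER NOT NULL REFERENCES product (product_id),
            quantity INTEGER NOT NULL,
            value REAL NOT NULL
        );
        """
    )
    # Made-up sample rows, purely for illustration.
    conn.executemany("INSERT INTO period VALUES (?, ?, ?)",
                     [(1, 1, 2005), (2, 1, 2005), (4, 2, 2005)])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "A", "Widget"), (11, "A", "Gadget"), (12, "B", "Manual")])
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                     [(1, 10, 2, 20.0), (2, 11, 1, 15.0), (4, 10, 3, 30.0), (4, 12, 1, 5.0)])


def value_by_quarter_and_type(conn):
    """Classify the facts by two dimension attributes and total the sale value."""
    return conn.execute(
        """
        SELECT p.quarter_no, pr.product_type_code, SUM(s.value)
        FROM sale AS s
        JOIN period AS p ON p.accounting_month_no = s.accounting_month_no
        JOIN product AS pr ON pr.product_id = s.product_id
        GROUP BY p.quarter_no, pr.product_type_code
        ORDER BY p.quarter_no, pr.product_type_code
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(value_by_quarter_and_type(conn))  # [(1, 'A', 35.0), (2, 'A', 30.0), (2, 'B', 5.0)]
```

Every query against the star has the same shape: join the fact table to the dimensions of interest, then group by dimension attributes.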
Alternative Architectures: Snowflake
• One fact table
• Dimensions are hierarchical
  – Collapsing the 1:many relationships through denormalization gives a star schema
Tables in the diagram:
Sale (fact table): Accounting Month No, Product ID, Customer ID, Location ID, Quantity, Value
Period: Accounting Month No, Quarter No, Year No
Product: Product ID, Product Type ID, Product Name
Product Type: Product Type ID, Product Type Name
Customer: Customer ID, Customer Type ID, Region ID, Customer Name
Customer Type: Customer Type ID, Customer Type Name
Location: Location ID, Location Type ID, Region ID, Location Name
Location Type: Location Type ID, Location Type Name
Region: Region ID, State ID, Region Name
State: State ID, State Name
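As a rough illustration of how a dimension hierarchy relates to a single denormalized dimension table, the sketch below builds the Customer, Region, and State tables in SQLite and then joins them into one table. The table name `customer_star`, the snake_case column names, and the sample rows are assumptions; the slide's star version carries region and state codes rather than names.

```python
import sqlite3


def build_snowflake_customer(conn):
    """Create the Customer -> Region -> State hierarchy with made-up rows."""
    conn.executescript(
        """
        CREATE TABLE state (
            state_id INTEGER PRIMARY KEY,
            state_name TEXT NOT NULL
        );
        CREATE TABLE region (
            region_id INTEGER PRIMARY KEY,
            state_id INTEGER NOT NULL REFERENCES state (state_id),
            region_name TEXT NOT NULL
        );
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            region_id INTEGER NOT NULL REFERENCES region (region_id),
            customer_name TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO state VALUES (1, 'Victoria')")
    conn.executemany("INSERT INTO region VALUES (?, ?, ?)",
                     [(10, 1, "Metro"), (11, 1, "Country")])
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(100, 10, "Acme"), (101, 11, "Bolt")])


def collapse_customer_dimension(conn):
    """Denormalize the hierarchy into a single dimension table (hypothetical name)."""
    conn.execute(
        """
        CREATE TABLE customer_star AS
        SELECT c.customer_id, c.customer_name, r.region_name, s.state_name
        FROM customer AS c
        JOIN region AS r ON r.region_id = c.region_id
        JOIN state AS s ON s.state_id = r.state_id
        """
    )
    return conn.execute("SELECT * FROM customer_star ORDER BY customer_id").fetchall()


conn = sqlite3.connect(":memory:")
build_snowflake_customer(conn)
print(collapse_customer_dimension(conn))
# [(100, 'Acme', 'Metro', 'Victoria'), (101, 'Bolt', 'Country', 'Victoria')]
```

The collapsed table repeats the state name on every customer row. That redundancy is the price of a dimension that can be queried with a single join to the fact table.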
Snowflakes and Many-to-many Relationships
Cannot be handled without action:
1. Ignore less common cases
  – But include data in the fact table (e.g., include the number of salespeople in the fact table)
2. Use a repeating group in the dimension table
3. Treat sale-by-salesperson as the fact table
Whatever you do, involve the business users in decision making about architecture
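[Figure: Salesperson, Sale, and Product entities. A Sale is credited to Salespersons (a Salesperson is credited with Sales), and a Sale is classified by a Product (a Product classifies Sales).]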
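Option 3 can be sketched in a few lines of Python. The equal split of each sale's value among its salespeople is an assumption made here so that totals still add up; the slides do not say how credit should be allocated, and that is exactly the kind of decision to take with the business users. The field names and sample data are also hypothetical, and every sale is assumed to have at least one salesperson.

```python
from collections import defaultdict


def sale_by_salesperson(sales):
    """Emit one fact row per (sale, salesperson) pair."""
    rows = []
    for sale in sales:
        people = sale["salespeople"]
        share = sale["value"] / len(people)  # assumed equal split of credit
        for person in people:
            rows.append(
                {"sale_id": sale["sale_id"], "salesperson": person, "value": share}
            )
    return rows


def value_by_salesperson(rows):
    """Total the allocated value for each salesperson."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["salesperson"]] += row["value"]
    return dict(totals)


# Made-up sample data: sale 1 is credited to two salespeople.
SALES = [
    {"sale_id": 1, "salespeople": ["ann", "bob"], "value": 100.0},
    {"sale_id": 2, "salespeople": ["ann"], "value": 40.0},
]
FACT_ROWS = sale_by_salesperson(SALES)
print(len(FACT_ROWS))                   # 3
print(value_by_salesperson(FACT_ROWS))  # {'ann': 90.0, 'bob': 50.0}
```

With the finer-grained fact table, Salesperson becomes an ordinary dimension with a one-to-many relationship to the facts.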
Time-dependent Data
• History and time are common in data marts and you must be able to
  – Handle different granularities of time
  – Cater for overlapping periods
  – Consider hierarchies of time periods
• Slowly changing dimensions are common (e.g., people may move customer categories over time)
  – Speed of dimension data change
  – Speed of moving fact data from one dimension to another
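A hierarchy of time periods can be sketched as a small roll-up in Python. The sketch assumes accounting months are numbered 1 to 12 within a year and that each quarter spans three months; a real accounting calendar may differ. The function and field names are hypothetical.

```python
from collections import defaultdict

# Columns that identify a period at each level of the hierarchy.
LEVELS = {
    "month": ("year_no", "quarter_no", "accounting_month_no"),
    "quarter": ("year_no", "quarter_no"),
    "year": ("year_no",),
}


def period_row(year_no, accounting_month_no):
    """Build a Period dimension row, deriving the quarter from the month."""
    if not 1 <= accounting_month_no <= 12:
        raise ValueError("accounting_month_no must be between 1 and 12")
    quarter_no = (accounting_month_no - 1) // 3 + 1  # assumed 3-month quarters
    return {
        "year_no": year_no,
        "quarter_no": quarter_no,
        "accounting_month_no": accounting_month_no,
    }


def roll_up(facts, level):
    """Total (year, month, value) facts at the requested granularity."""
    totals = defaultdict(float)
    for year_no, month_no, value in facts:
        row = period_row(year_no, month_no)
        key = tuple(row[column] for column in LEVELS[level])
        totals[key] += value
    return dict(totals)


# Made-up facts: (year, accounting month, value).
FACTS = [(2005, 1, 10.0), (2005, 3, 5.0), (2005, 4, 7.0)]
print(roll_up(FACTS, "quarter"))  # {(2005, 1): 15.0, (2005, 2): 7.0}
print(roll_up(FACTS, "year"))     # {(2005,): 22.0}
```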
Dimension Change Example
• Customers can change group
• Solutions
  – Two group foreign keys (now, and at time of purchase / transaction)
  – Ignore if change is slow and cost of ignoring it is low
  – Hold a history of each customer’s membership of groups
[Figure: Customer Group, Customer, and Purchase entities.]
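The third solution, holding a membership history, might look like the following Python sketch. The tuple layout, the group names, and the convention that `valid_to` is exclusive (with `None` marking the current membership) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical history rows: (customer_id, group, valid_from, valid_to).
# valid_to is exclusive; None marks the membership that is current.
HISTORY = [
    (1, "Bronze", date(2004, 1, 1), date(2005, 1, 1)),
    (1, "Gold", date(2005, 1, 1), None),
]


def group_on(history, customer_id, on_date):
    """Return the group the customer belonged to on the given date, if any."""
    for cid, group, valid_from, valid_to in history:
        if cid != customer_id:
            continue
        if valid_from <= on_date and (valid_to is None or on_date < valid_to):
            return group
    return None


def current_group(history, customer_id):
    """Return the customer's present group, if any."""
    for cid, group, _valid_from, valid_to in history:
        if cid == customer_id and valid_to is None:
            return group
    return None


purchase_date = date(2004, 6, 30)
print(group_on(HISTORY, 1, purchase_date))  # Bronze (group at time of purchase)
print(current_group(HISTORY, 1))            # Gold (group now)
```

With the history in place, the two values from the first solution (group now, and group at time of purchase) can both be derived on demand instead of being stored as two foreign keys.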
Concluding Word
• Data warehousing and data marts are complex
• Specific design challenges and limitations exist
• Patterns are also useful here
  – There are resources available
• Do further reading about the area if you’re interested
Next lecture
• Ontology: What’s all the fuss?
  – How can ontology help you, and where can it not?
  – What role does underlying theory (such as ontology) play in practical data modelling?
  – If we have ontology, what happens with creativity?