Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

18
Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design

Transcript of Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Page 1: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Author: Graeme C. Simsion and Graham C. Witt

Chapter 11Logical Database Design

Page 2: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 2

Are data warehouses / marts just operational databases?

LoadProgram

LoadProgram

LoadProgram

QueryTools

QueryTools

QueryTools

LoadProgram

LoadProgram

LoadProgram

DataMart

DataMart

DataMart

Data Warehouse

SourceData

SourceData

SourceData

LoadProgram

SourceData

LoadProgram

ExternalData

Page 3: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 3

Differences Compared with Operational Databases

• Different requirements• Database technology may not be

relational– Special multi-dimensional databases now

exist to service this area

• Many techniques of database design can still apply

Page 4: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 4

Characteristics

• Data integration– With data from many operational databases

• Loads data rather than performs updates• Less predictable database ‘hits’• Complex queries but a simple interface• May emphasize historical data• May summarize other data

Page 5: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 5

Quality Criteria Revisited

• Completeness• Non-redundancy• Enforcement of (business) rules• Data reusability• Stability and flexibility• Simplicity and elegance• Communication and effectiveness• Performance

Page 6: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 6

Modelling / designing?

• Data warehouses feed data marts– Need a separate approach– Marts are more focused– Warehouses are more general and must

handle all marts envisaged

• Consider each in turn

• (revisit 16.1 over page)

Page 7: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 7

LoadProgram

LoadProgram

LoadProgram

QueryTools

QueryTools

QueryTools

LoadProgram

LoadProgram

LoadProgram

DataMart

DataMart

DataMart

Data Warehouse

SourceData

SourceData

SourceData

LoadProgram

SourceData

LoadProgram

ExternalData

Page 8: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 8

Modeling for Data Warehouses

1. May need an initial corporate model of the business

2. Need to understand existing (operational) data (bases)

3. Determine requirements of the warehouse

4. Determine sources and handling differences

5. Shaping data for data marts

Last two steps are more complex

Page 9: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 9

Sources and Differences when Designing

• Minimize number of source systems• Carefully judge source data item quality• Reconcile multiple sources

– Eg. Differences in timeframe and currency of item

• Handle compatibility of coding schemes for data items

• Unpack overloaded attributes– Eg. Address containing postcode where postcode

becomes an important part of a warehouse

Page 10: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 10

Shaping Data for Data Marts

• Need to maximize flexibility• Cater for common purposes between

marts and basic commonality (sorted out when handing requirements for the warehouse)

• If difficult to cater for both flexibility and common purpose opt for flexibility

• The rule: Maximize Flexibility, Minimize Anticipation

Page 11: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 11

Modeling for Data Marts

• Modeling for general business people– Little technical knowledge– Need for special queries

• Much simpler than operational databases– Facts vs. transaction handling and complex

business rules

• Users of data marts need to move easily between marts

Page 12: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 12

Basic Data Mart Architecture: Star Schema

• Fact table (only one)

• Dimension tables– To classify fact table into categories

Period

Accounting Month NoQuarter NoYear No

Product

Product IDProduct Type CodeProduct Name

Sale

Accounting Month No *Product ID *Customer ID *Location ID *QuantityValue

Location

Location IDLocation Type CodeRegion CodeState CodeLocation Name

Customer

Customer IDCustomer Type CodeRegion CodeState CodeCustomer Name

Page 13: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 13

Alternative Architectures: Snowflake

• One fact table

• Dimensions are hierarchical– Collapse 1:many relationships through de-

normalization

ProductType

Product Type IDProduct Type Name

Product

Product IDProduct Type IDProduct Name

Period

Accounting Month NoQuarter NoYear No Sale

Accounting Month NoProduct IDCustomer IDLocation IDQuantityValue

Customer

Customer IDCustomer Type IDRegion IDCustomer Name

Location

Location IDLocation Type IDRegion IDLocation Name

CustomerType

Customer Type IDCustomer Type Name

LocationType

Location Type IDLocation Type Name

Region

Region IDState IDRegion Name

State IDState Name

State

Page 14: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 14

Snowflakes and Many-to-many Relationships

Cannot be handled without action:1. Ignore less common cases

– But include data in fact (eg. Include #salespeople in fact table)

2. Use a repeating group in dimension table

3. Treat sale-by-salesperson as the fact table

Whatever you do, involve the business users in decision making about architecture

Salesperson

Sale

Product

becredited to

be credited with

beclassified by

classify

Page 15: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 15

Time-dependent Data• History and time are common in data marts

and you must be able to– Handle different granularities of time– Cater for overlapping periods– Consider hierarchies of time periods

• Slowly changing dimensions are common (eg. People may move customer categories over time)– Speed of dimension data change– Speed of moving fact data from one dimension to

another

Page 16: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 16

Dimension Change Example

• Customers can change group

• Solutions– Two group foreign keys (now, and at time

of purchase / transaction)– Ignore if change is slow and cost of

ignoring it is low– Hold a history of each customer’s

membership of groups

CustomerGroup

Customer Purchase

Page 17: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 17

Concluding Word

• Data warehousing and data marts are complex

• Specific design challenges and limitations exist

• Patterns are also useful here– There are resources available

• Do further reading about the area if you’re interested

Page 18: Author: Graeme C. Simsion and Graham C. Witt Chapter 11 Logical Database Design.

Copyright: ©2005 by Elsevier Inc. All rights reserved. 18

Next lecture

• Ontology: What’s all the fuss?– How ontology can and cannot help you?– What role does underlying theory (such as

ontology) play in practical data modelling?– If we have ontology what happens with

creativity?