Planning & Project Management
Transcript of Planning & Project Management
-
8/11/2019 Planning & Project Management
1/65
Prof. Chandan Singhavi
-
Requirements gathering
Requirements definition document (with information packages)
Data design
Dimensional model
-
Choosing the process: selecting the subjects from the information packages for the first set of logical structures to be designed
Choosing the grain: determining the level of detail for the data in the data structures
Identifying and conforming dimensions: making sure that each particular data element in every business dimension is conformed to one another
Choosing the facts: selecting the metrics or units of measure (e.g., product sales units, dollar sales, dollar revenue) to be included in the first set of structures
Choosing the duration of the database: determining how far back in time you should go for historical data
-
Dimensional modeling is a logical design technique to structure the business dimensions and the metrics that are analyzed along these dimensions
It gets its name from the business dimensions
The model has also proved to provide high performance for queries and analysis
The information package is the foundation
-
Reviewing the information package diagram, we notice three types of entities:
Measurements or metrics
Business dimensions
Attributes for each business dimension
-
It represents business dimensions
Facts are used for the analysis
The attributes in the dimension tables act as filters in our queries
Each dimension table has an equal chance of participating in a query
Each dimension table has a direct relationship with the fact table in the middle
Each dimension table has a one-to-many relationship with the fact table
Such an organization looks like a STAR
-
Dimensional modeling should primarily facilitate queries and analysis
A typical query could be:
How much in sales proceeds did the Jeep Cherokee, year 2000 model with standard options, generate in January 2000 at Big Sam Auto Dealership for buyers who own their homes and who took three-year leases, financed by DaimlerChrysler Financing?
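As a rough illustration, a query of this kind can be written as a join of one fact table to several dimension tables, with the dimension attributes acting as filters. The sketch below uses Python's sqlite3 module; all table names, column names, and sample rows are assumptions made up for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical auto-sales star: one fact table, five dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_key INTEGER PRIMARY KEY, model TEXT, model_year INTEGER, options TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dealer   (dealer_key INTEGER PRIMARY KEY, dealer_name TEXT);
CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, owns_home INTEGER);
CREATE TABLE payment  (payment_key INTEGER PRIMARY KEY, method TEXT, lease_years INTEGER, financier TEXT);
CREATE TABLE auto_sales (
    product_key INTEGER, time_key INTEGER, dealer_key INTEGER,
    customer_key INTEGER, payment_key INTEGER, sales_proceeds REAL
);
INSERT INTO product VALUES (1, 'Jeep Cherokee', 2000, 'Standard'), (2, 'Jeep Cherokee', 2000, 'Premium');
INSERT INTO time_dim VALUES (1, 'Jan', 2000), (2, 'Feb', 2000);
INSERT INTO dealer VALUES (1, 'Big Sam Auto');
INSERT INTO customer VALUES (1, 1), (2, 0);
INSERT INTO payment VALUES (1, 'Lease', 3, 'DaimlerChrysler Financing'), (2, 'Cash', NULL, NULL);
INSERT INTO auto_sales VALUES
    (1, 1, 1, 1, 1, 21000.0),   -- matches every filter
    (2, 1, 1, 1, 1, 30000.0),   -- wrong options
    (1, 2, 1, 1, 1, 22000.0),   -- wrong month
    (1, 1, 1, 2, 1, 20000.0),   -- buyer does not own a home
    (1, 1, 1, 1, 2, 18000.0);   -- not a lease
""")

# Each dimension joins directly to the fact table; the WHERE clause
# filters on dimension attributes, and the fact column is aggregated.
query = """
SELECT SUM(f.sales_proceeds)
FROM auto_sales f
JOIN product  p  ON p.product_key   = f.product_key
JOIN time_dim t  ON t.time_key      = f.time_key
JOIN dealer   d  ON d.dealer_key    = f.dealer_key
JOIN customer c  ON c.customer_key  = f.customer_key
JOIN payment  pm ON pm.payment_key  = f.payment_key
WHERE p.model = 'Jeep Cherokee' AND p.model_year = 2000 AND p.options = 'Standard'
  AND t.month = 'Jan' AND t.year = 2000
  AND d.dealer_name = 'Big Sam Auto'
  AND c.owns_home = 1
  AND pm.method = 'Lease' AND pm.lease_years = 3
  AND pm.financier = 'DaimlerChrysler Financing'
"""
total = conn.execute(query).fetchone()[0]
print(total)
```

Only the first fact row satisfies all the filters, so the sum covers that single row.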
-
Some criteria for combining the tables into a dimensional model:
The model should provide the best data access
It must be query-centric
It must be optimized for queries and analysis
It must show how the dimension tables interact with the fact table
It should be structured so that every dimension interacts equally with the fact table
It should allow drilling down or rolling up along dimension hierarchies
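Rolling up and drilling down can be pictured as aggregating the same facts at different levels of one dimension hierarchy. The following is a minimal sketch in plain Python; the product hierarchy (product, category, department) and the sales figures are hypothetical.

```python
# Hypothetical product dimension with a hierarchy:
# product -> category -> department.
product_dim = {
    "P1": {"category": "Sedan", "department": "Cars"},
    "P2": {"category": "SUV", "department": "Cars"},
    "P3": {"category": "Pickup", "department": "Trucks"},
}

# Hypothetical fact rows: (product_key, sales_amount).
sales_fact = [("P1", 100.0), ("P2", 250.0), ("P2", 50.0), ("P3", 300.0)]


def aggregate(level):
    """Sum sales at the given hierarchy level ('product', 'category', 'department')."""
    totals = {}
    for product_key, amount in sales_fact:
        group = product_key if level == "product" else product_dim[product_key][level]
        totals[group] = totals.get(group, 0.0) + amount
    return totals


by_department = aggregate("department")  # rolled up to the highest level
by_category = aggregate("category")      # drilled down one level
by_product = aggregate("product")        # drilled down to the lowest level
print(by_department, by_category, by_product)
```

The fact rows never change; only the dimension attribute used for grouping does.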
-
Star schema definition: a simple database design in which dimensional data are separated from fact or event data (describing individual business transactions)
Also known as the dimensional model
Suitable for ad hoc queries
The simplest star schema consists of one fact table surrounded by several dimension tables
Fact table: contains factual or quantitative data about a business, such as units sold or orders booked
The primary key (PK) of the fact table is a composite of the foreign keys (FKs) to the dimension tables
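The composite key can be sketched with sqlite3 as follows. The table and column names are assumptions for illustration. Note that SQLite enforces the composite primary key but checks foreign keys only when PRAGMA foreign_keys is switched on, which this sketch does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, day TEXT);

-- Fact table: the primary key is the combination of the dimension keys.
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key   INTEGER NOT NULL REFERENCES store_dim(store_key),
    date_key    INTEGER NOT NULL REFERENCES date_dim(date_key),
    units_sold  INTEGER,
    PRIMARY KEY (product_key, store_key, date_key)
);

INSERT INTO product_dim VALUES (1, 'Widget');
INSERT INTO store_dim VALUES (1, 'Mumbai');
INSERT INTO date_dim VALUES (1, '2000-01-15');
INSERT INTO sales_fact VALUES (1, 1, 1, 5);
""")

# A second row for the same product, store, and day violates the composite key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (1, 1, 1, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(duplicate_rejected, fact_rows)
```

One consequence of this design is that the composite key fixes the grain: there can be at most one fact row per combination of dimension keys.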
-
A key component of the dimensional model is the set of dimension tables
-
The STAR schema is a relational model, but it is not a normalized model:
Easy for user to understand
Optimizes navigation
Most suitable for query processing
STARjoin and STARindex
-
Over time the fact table keeps growing, whether through new records or updates
Dimension tables are more stable and less volatile
-
Slowly changing dimensions
Type 1 changes: correction of errors
Type 2 changes: preservation of history
Type 3 changes: tentative soft revisions
-
Most dimensions are generally constant over time
Many dimensions, though not constant, change slowly
The product key of the source record does not change
The description and other attributes change slowly over time
Overwriting is not always appropriate
-
Principles of Type 1 changes:
The changes relate to the correction of errors
The change in the source system has no significance
The old value need not be preserved in the data warehouse
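A Type 1 change can be sketched as a plain overwrite of the attribute in the existing dimension row. The customer dimension, its column names, and the misspelled name below are hypothetical.

```python
# Hypothetical customer dimension; customer_key is the warehouse key,
# source_id is the key carried over from the source system.
customer_dim = [
    {"customer_key": 1, "source_id": "C100", "name": "Jon Smith"},
]


def apply_type1(dim_rows, source_id, attribute, corrected_value):
    """Overwrite the attribute in place; the old value is not kept."""
    for row in dim_rows:
        if row["source_id"] == source_id:
            row[attribute] = corrected_value


apply_type1(customer_dim, "C100", "name", "John Smith")
print(customer_dim)
```

No new row is added and the key stays the same, so existing fact rows simply pick up the corrected value.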
-
Type 2 changes relate to true changes in the source system
There is a need to preserve history in the data warehouse
This type of change partitions the history in the data warehouse
Every change for the same attribute must be preserved
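One common way to preserve every change is to add a new dimension row with a new warehouse key each time the attribute changes. The sketch below assumes that approach; the effective-date columns, the key generation, and the sample customer are illustrative assumptions rather than details from the slides.

```python
from datetime import date

# Hypothetical customer dimension with effective-date columns.
customer_dim = [
    {"customer_key": 1, "source_id": "C100", "state": "NY",
     "effective_from": date(2000, 1, 1), "effective_to": None},
]


def apply_type2(dim_rows, source_id, attribute, new_value, change_date):
    """Close the current row and append a new row with a new warehouse key."""
    current = next(
        row for row in dim_rows
        if row["source_id"] == source_id and row["effective_to"] is None
    )
    current["effective_to"] = change_date
    new_key = max(row["customer_key"] for row in dim_rows) + 1
    new_row = dict(current, customer_key=new_key,
                   effective_from=change_date, effective_to=None)
    new_row[attribute] = new_value
    dim_rows.append(new_row)
    return new_key


new_key = apply_type2(customer_dim, "C100", "state", "CA", date(2001, 6, 1))
print(new_key, customer_dim)
```

Facts loaded before the change keep pointing at key 1 and facts loaded afterwards use key 2, which is how the history ends up partitioned.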
-
Type 3 changes usually relate to soft or tentative changes in the source system
There is a need to keep track of history with the old and new values of the changed attribute
They are used to compare performance across the transition
They provide the ability to track forward and backward
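A Type 3 change is often handled by keeping the old and the current value side by side in the same dimension row. The sketch below assumes that layout; the sales representative dimension, the column names, and the figures are hypothetical.

```python
# Hypothetical sales-rep dimension with "old" and "current" columns
# for the attribute that may change tentatively.
rep_dim = {
    1: {"name": "Ann", "territory_current": "North",
        "territory_old": None, "effective_date": None},
}

# Hypothetical fact rows: (rep_key, month, sales_amount).
sales_fact = [(1, "2000-03", 100.0), (1, "2000-09", 150.0)]


def apply_type3(row, new_territory, effective_date):
    """Move the current value to the 'old' column, then store the new value."""
    row["territory_old"] = row["territory_current"]
    row["territory_current"] = new_territory
    row["effective_date"] = effective_date


def sales_by_territory(column):
    """Total sales grouped by either the old or the current territory."""
    totals = {}
    for rep_key, _month, amount in sales_fact:
        territory = rep_dim[rep_key][column]
        totals[territory] = totals.get(territory, 0.0) + amount
    return totals


apply_type3(rep_dim[1], "East", "2000-07-01")
as_if_new = sales_by_territory("territory_current")
as_if_old = sales_by_territory("territory_old")
print(as_if_new, as_if_old)
```

Because both values live in one row, the same facts can be reported under the old or the new assignment, which is what makes comparison across the transition possible.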
-
Large dimensions, multiple hierarchies
Rapidly changing dimensions
Junk dimensions
-
Very deep or wide
Customer
Product
-
The following issues need to be addressed by using effective design methods, choosing proper indexes, and applying other optimization techniques:
Population of very large dimension tables
Browse performance of unconstrained dimensions, especially where the cardinality of the attributes is low
Browsing time for cross-constrained values of dimension attributes
Inefficiencies in fact table queries when large dimensions need to be used
Additional rows created to handle type 2 slowly changing dimensions
-
A dimension table could be littered with a very large number of additional rows created every time there is an incremental load.
An effective approach is to break the large dimension table into one or more simpler dimension tables.
-
Miscellaneous flags and textual fields
Choices:
Exclude and discard all flags and texts
Place the flags and texts unchanged in the fact table
Make each flag and text a separate dimension table on its own
Keep only those flags and texts that are meaningful; group all the useful flags into a single junk dimension
These junk dimension attributes are useful for constraining queries based on flag/text values
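The last option can be sketched by building one row per combination of the useful flags and storing only the resulting key in the fact table. The flag names and values below are hypothetical.

```python
from itertools import product

# Hypothetical flags judged meaningful enough to keep.
flags = {
    "gift_wrap": ["Y", "N"],
    "rush_order": ["Y", "N"],
    "payment_type": ["Cash", "Credit"],
}

# Junk dimension: one row, with its own key, per combination of flag values.
junk_dim = {}
for key, combination in enumerate(product(*flags.values()), start=1):
    junk_dim[combination] = key


def junk_key(order):
    """Look up the junk dimension key for an order's flag values."""
    return junk_dim[tuple(order[name] for name in flags)]


order = {"gift_wrap": "N", "rush_order": "Y", "payment_type": "Credit"}
order_junk_key = junk_key(order)
print(len(junk_dim), order_junk_key)
```

Pre-building every combination is practical only while the number of combinations stays small; with many flags, rows could instead be added as new combinations appear in the data.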
-
Options to normalize
Advantages and disadvantages
When to snowflake
-
Snowflaking is a method of normalizing the dimension tables in a STAR schema.
When you completely normalize all the dimension tables, the resultant structure resembles a snowflake with the fact table in the middle.
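As a small illustration of the idea, the sketch below normalizes a flat product dimension by moving the repeated category and department values into a separate table. The product rows and all names are hypothetical.

```python
# Hypothetical flat (denormalized) product dimension from a STAR schema.
product_dim = [
    {"product_key": 1, "name": "Cola", "category": "Beverages", "department": "Grocery"},
    {"product_key": 2, "name": "Juice", "category": "Beverages", "department": "Grocery"},
    {"product_key": 3, "name": "Soap", "category": "Toiletries", "department": "Household"},
]


def snowflake(rows):
    """Split repeated category attributes into their own table."""
    category_keys = {}
    category_table, product_table = [], []
    for row in rows:
        category = (row["category"], row["department"])
        if category not in category_keys:
            category_keys[category] = len(category_keys) + 1
            category_table.append({
                "category_key": category_keys[category],
                "category": category[0],
                "department": category[1],
            })
        product_table.append({
            "product_key": row["product_key"],
            "name": row["name"],
            "category_key": category_keys[category],
        })
    return product_table, category_table


product_table, category_table = snowflake(product_dim)
print(product_table, category_table)
```

The fact table still joins to the product table, but reaching the category attributes now takes one more join.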
-
Advantages:
Small savings in storage space
Normalized structures are easier to update and maintain
Disadvantages:
The schema is less intuitive, and end users are put off by the complexity
The ability to browse through the contents is difficult
Query performance is degraded because of additional joins
-
Snowflaking is not generally recommended in a data warehouse environment. Query performance takes the highest priority.
-
space Sub dimension
-
Tremendous boost to performance
-
Effect of sparsity on aggregation:
When you go to higher levels of aggregates, the sparsity percentage moves up. You have to pay attention to this problem.
Aggregation options
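The effect of sparsity can be made concrete with a toy calculation. The sketch below assumes that "sparsity percentage" means the share of possible key combinations that actually have a fact row; that reading, along with the products, stores, and counts, is an assumption made for illustration.

```python
# Hypothetical product -> category mapping and store list.
product_category = {"P1": "Dairy", "P2": "Dairy", "P3": "Bakery", "P4": "Bakery"}
stores = ["S1", "S2"]

# Base fact rows exist only for products that actually sold in a store.
base_fact = {("P1", "S1"): 5, ("P3", "S1"): 2, ("P2", "S2"): 1, ("P4", "S2"): 4}


def populated_percentage(fact_rows, possible_combinations):
    """Share of possible key combinations that have a fact row."""
    return 100.0 * len(fact_rows) / possible_combinations


# Aggregate from product level up to category level.
category_fact = {}
for (product_key, store), units in base_fact.items():
    key = (product_category[product_key], store)
    category_fact[key] = category_fact.get(key, 0) + units

base_pct = populated_percentage(base_fact, len(product_category) * len(stores))
categories = set(product_category.values())
category_pct = populated_percentage(category_fact, len(categories) * len(stores))
print(base_pct, category_pct)
```

In this toy data only half of the product and store combinations have rows, yet every category and store combination does. An aggregate table can therefore hold nearly as many rows as the table it summarizes, which limits how much it helps.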
-
Almost all data warehouses contain multiple STAR schema structures (Figure 11-16)
Snapshot and transaction tables (Figure 11-17)
Core and custom tables (Figure 11-18)
Supporting the enterprise value chain
Conforming dimensions, standardizing facts
-
A conformed dimension is a comprehensive combination of attributes from the source systems after resolving all discrepancies and conflicts.
Conformed dimensions allow roll-up across data marts.
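Roll-up across data marts can be sketched with two fact tables that share one conformed product dimension. The two marts (sales and shipments), the product rows, and the figures are hypothetical.

```python
# Hypothetical conformed product dimension shared by both data marts.
product_dim = {
    1: {"name": "Cola", "category": "Beverages"},
    2: {"name": "Juice", "category": "Beverages"},
    3: {"name": "Soap", "category": "Toiletries"},
}

# Hypothetical fact rows from two data marts: (product_key, measure).
sales_fact = [(1, 100.0), (2, 50.0), (3, 20.0)]
shipment_fact = [(1, 10), (3, 4), (2, 6)]


def roll_up(fact_rows):
    """Total a measure by category using the shared product dimension."""
    totals = {}
    for product_key, value in fact_rows:
        category = product_dim[product_key]["category"]
        totals[category] = totals.get(category, 0) + value
    return totals


sales_by_category = roll_up(sales_fact)
shipped_by_category = roll_up(shipment_fact)

# Both results use the same category values, so they line up row by row.
report = {
    category: (sales_by_category.get(category, 0), shipped_by_category.get(category, 0))
    for category in sorted(set(sales_by_category) | set(shipped_by_category))
}
print(report)
```

If each mart kept its own product dimension with different keys or category names, the two results could not be combined this directly.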