DWDM_Lect2
-
Upload
ajaykumar988 -
Category
Documents
-
view
214 -
download
2
description
Transcript of DWDM_Lect2
-
Data Warehousing and Data Mining
Sunil [email protected]
-
OutlineReview on RDBMSOLAP Operations
-
DBMS Overview DBMS (Database Management Systems) are designed to
achieve the following four main goals:1. Increase Data Independence
Data & programs are independent Change in data did not affect user programs
2. Reduce Data Redundancy2. Reduce Data Redundancy Data is only stored once Different applications share the same centralized data
3. Increase Data Security Authorize the access to the database Place restrictions on operations that may be performed on data
4. Maintain Data Integrity Same data is used by many users
-
RDBMSA relational database is a database that is
perceived by the user as a collection of tablesThis user view is independent of the actual way
the data is storedTables are sets of data made up from rows and Tables are sets of data made up from rows and
columnsStructure
Very flexible -- create views Keep the data secure (use views) Relation between tables Primary & Foreign Keys Normalization
-
RDBMS- Normalization Normalization is the process of streamlining your tables
and their relationships1. Normal Form (1NF)
Action: Eliminate repeating values in one atom and repeating groups
Rule: Each column must be a fact about .... the key
5
Rule: Each column must be a fact about .... the key2. Normal Form (2NF)
Action: Regroup columns dependent on only one part of the composite key
Rule: Each column must be a fact about .... the whole key3. Normal Form (3NF)
Action: Regroup non-key columns representing a fact about another non-key column
Rule: Each column must be a fact about .... nothing but the key
-
Views and JoinsViews are ways of looking at data from one or
more tablesTables can be related to each other by the data
they hold (called joins)Join Strategies:
6
Join Strategies: Cross Product Inner Join Outer Join
-
SQL
SQL is divided into three major categories:1. DDL Data Definition Language
Used to create, modify or drop database objects2.DML Data Manipulation Language
Used to select, insert , update or delete database data
7
Used to select, insert , update or delete database data(records)3. DCL Data Control Language
Used to provide data object access control E.g. connect to database, grant, revoke
-
Multidimesional Data A data warehouse is based on a multidimensional data
model which views data in the form of a data cube
-
Sample Quary Query:
"What are the net sales, in terms of revenue and quantities of items sold, Per product, Per store and sales region, Per customer and customer sales area, Per day as well as aggregated over time, Over the last two weeks?
Evaluation entails viewing historical sales figures from multiple perspectives such as: Sales (overall) Sales per product Sales per store and per sales region Sales per customer and customer sales area Sales per day and aggregated over time Sales and aggregated sales over given time periods
-
Representation of Query as a Cube
Multi-dimensional data models could be presented using cubes or using amathematical notation technique representing points in a multi-dimensional space, for example: QTY_SOLD = F(S,P,C,t)
-
Query as a Cube : Usage
-
Hypercube Representation If more than three dimensions are present in the
solution, the cube or 3D space representation is no longer usable.
The principle of the cube can be extended to hyper-cube
4th Dimension
-
Sample Multidimensional Representation
-
Six Basic Concepts of MDDMTo build an initial multi-dimensional data model, the
following six base elements have to be identified:1. Measures2. Dimensions3. Grains of dimensions and granularities of measures and
factsfacts4. Facts5. Dimension hierarchies6. Aggregation levels
-
1. Measure A measure is a data item which information analysts use
in their queries to measure the performance or behavior of a business process or a business object
Sample types of measures Quantities Sizes Sizes Amounts Durations, delay And so forth
KPI Key Performance Indicator is a common known synonym for the most important measures of a business.
-
2. Dimensions A dimension is an entity or a collection of related
entities, used by information analysts to identify the context of the measures they work with Examples: Product, Customer, Store, Time
Dimensions are referred to through so-called Dimension keys
Dimensions contain Dimension entities Dimension attributes Dimension hierarchies
As an example, the measure sales revenue only make sense if we know this value for special item, special customer, at a day and in a certain store - we got the four dimensions: item, store, customer and time.
-
3. Granularity The grain of a dimension is the lowest level of detail
available within that dimension Product grain: Item Customer grain: Customer Store grain: Store Time grain: Day Time grain: Day
The granularity of a measure is determined by the combination of the grains of all its dimensions
For example the granularity of the measure QTY_SOLD is: (item, customer, store, day).
Fine granularity enables fine analysis possibilities, but on the other side it has a big impact on the size of the Data Warehouse.
-
Granularity
Here we see an example how fine granularity can show hidden information, like that stores in a region are better performers than other.
-
4. FactsA fact is a collection of related measures and
their associated dimensions, represented by the dimension keys Example: Sales
A fact can represent a business object, a A fact can represent a business object, a business transaction or an event which is used by the information analyst
Facts contain A Fact Identifier Dimension Keys Measures Supportive Attributes
-
5. Dimension Hierarchies Dimensions consist of one or more dimension hierarchies Examples: Hierarchies in the Product Dimension
Product Classification Hierarchy ("Merchandising Hierarchy") Branding Hierarchy
Each dimension hierarchy can include several aggregation levelslevels
Examples: Aggregation Levels in the Product Classification Hierarchy
-
6. Aggregation Levels
Each dimension hierarchy can include several aggregation levelsFor example:
Item: 4-pack Duracell AA Alkaline Batteries. Product: Duracell AA Alkaline BatteriesSub-category: AA Alkaline BatteriesCategory: BatteriesDepartment: Supplies
Dimension hierarchies and aggregation levels are used by users when drilling up or down.
-
Summary
-
Initial MDM- Example
This shows the six base concepts as they apply to our Sales Query and the initial model that corresponds with that query.
-
Star Schema A star schema is a way to represent multidimensional
data in a relational database The star schema logical design, unlike the entity-
relationship model, is specifically geared towards decision support applications.
Fact table stores business data Fact table stores business data Generally several orders of magnitude larger than any dimension
table One key column joined to each dimension table One or more data columns
Multidimensional queries can be built by joining fact and dimension tables
Some products use this method to make a relational OLAP (ROLAP) system
-
Star Schema- Example
-
Sales
time_keybranch_keylocation_keyproduct_key
Time
time_keydaymonthyear
ProductLocation
Branch
branch_keyname
type
1
n
1
n
n
n
???
Logical Data Modeling: A Star Schema Example
num_unitsamount_usd
Product
product_keyname
brandtype
Supplier
supplier_keyname
type
Location
location_keycitystatecountry
1 1
One-to-many relationships between the fact and dimensions. The fact-dimension relationships are certain. Dimensions in star models are often tightly coupled.
-
Snowflake Schema The snowflake model is a further
normalized version of the star schema.
When a dimension table contains data that is not always necessary for queries, too much data may be for queries, too much data may be picked up each time a dimension table is accessed.
To eliminate access to this data, it is kept in a separate table off the dimension, thereby making the star resemble a snowflake.
-
Typical OLAP Operations Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or detailed data, or introducing new dimensions
Slice and dice project and select project and select
Pivot (rotate) reorient the cube, visualization, 3D to series of 2D planes.
Other operations drill across: involving (across) more than one fact table drill through: through the bottom level of the cube to its back-end
relational tables (using SQL) Rankings time functions: e.g. time avg.