Data Warehousing – Dimensions | Star and Snowflake Schemas
Eric Matthews - DataWithUs
Defining Some Key Terms
Dimension
• Data Element
• Categorizes each item in a data set
• Provides Structured Labeling/Tagging
• Dimensions can consist of hierarchies. For example: Date | Month, Quarter, Year
• Dimension tables contain the appropriate keys for joining to fact tables (the fact table carries the matching foreign keys).
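As a minimal sketch of the Date hierarchy mentioned above, the following Python derives the Week, Month, Quarter, and Year attributes for one date dimension row. The function name `date_dimension_row` and the integer `date_key` format are assumptions made for illustration; they do not come from the slides.

```python
from datetime import date


def date_dimension_row(d: date) -> dict:
    """Build one row of a hypothetical date dimension from a calendar date."""
    _, iso_week, _ = d.isocalendar()  # ISO 8601 week number
    return {
        # Assumed surrogate key format: YYYYMMDD as an integer.
        "date_key": d.year * 10000 + d.month * 100 + d.day,
        "week": iso_week,
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "year": d.year,
    }


row = date_dimension_row(date(2024, 2, 15))
print(row)
```

Each level of the hierarchy becomes a plain column, so a query can group by month, quarter, or year without recomputing them from the raw date.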
Dimension – Primary Role
• Data Filtering
• Data Grouping
• Data Labeling
Fact
• A measured, counted, or aggregated event. For example: Sales, Admissions, Blood Pressure, and Inventory can all be construed as “facts”
• Fact Tables contain appropriate joining keys
Defining Some Key Terms (continued)
Conformed Dimension
• Common set of data structures/attributes
• Can cut across many facts, but…
• The row headers in an answer must be able to exactly match, or…
• Can be an exact subset
These definitions will come into brighter light as we look at some examples.
Star Schema
• Simplest (most atomic) form of dimensional modeling
• Consists of dimension table(s) modeled around a fact table
• Optimized for querying large data sets
[Diagram: Star Schema (Logical). A central Fact Table (Keys, Facts) joined to four Dimension Tables: Patient Demographics, Date/Time, Insurance Carrier, and Referring Physician.]
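The star layout can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the schema from the presentation: the table names (`fact_charges`, `dim_patient`, `dim_date`), their columns, and the sample rows are all hypothetical. The query shows the dimensions doing their three jobs: filtering (`WHERE`), grouping (`GROUP BY`), and labeling (the selected year and quarter).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables, each with its own primary key.
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, gender TEXT, state TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, quarter INTEGER, year INTEGER);

-- Hypothetical fact table: foreign keys to the dimensions plus one measure.
CREATE TABLE fact_charges (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    total_charges REAL
);

INSERT INTO dim_patient VALUES (1, 'F', 'OH'), (2, 'M', 'OH'), (3, 'F', 'NY');
INSERT INTO dim_date VALUES (20240115, 1, 2024), (20240420, 2, 2024);
INSERT INTO fact_charges VALUES
    (1, 20240115, 100.0), (2, 20240115, 50.0),
    (1, 20240420, 75.0), (3, 20240420, 200.0);
""")

# Filter on one dimension (patient state), group and label by another (date).
rows = conn.execute("""
    SELECT d.year, d.quarter, SUM(f.total_charges)
    FROM fact_charges f
    JOIN dim_patient p ON p.patient_key = f.patient_key
    JOIN dim_date d ON d.date_key = f.date_key
    WHERE p.state = 'OH'
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what keeps queries against a star schema short.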
Star Schema – Talking Points for Next Diagram
• Discuss aggregation from the source table to the fact table, rolling up totals (and how this needed to be done).
• Discuss the notion of rolling up fact tables to create other fact tables (use account type, financial class, and service code columns in the fact table for basis of discussion)
• Discuss some of the pitfalls of dimension tables by using the physician dimension as an example (example: Physicians can change jobs)
• Discuss the Date Dimension from the perspective of the data in the table… which transitions us to a key point…
Note: Have original table schema as point of reference.
…which is similar to how one needs to resolve foreign keys in reporting: the dimension table is a table form of the same concept.
Additionally, if one has well-defined master data, then populating the dimension tables can be done using a columnar subset of the source master data table.
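A rough sketch of the "columnar subset" idea: the dimension is built by projecting each master data row onto the few columns the dimension needs. The master rows, the extra columns `LICENSE_NO` and `PHONE`, and the helper `build_dimension` are hypothetical; only the three dimension column names are taken from the slides.

```python
# Hypothetical master data: more columns than the dimension needs.
physician_master = [
    {"ACCT_REFERRING_MD": 501, "PHYSICIAN_NAME": "Dr. A", "AFFILIATION": "Clinic North",
     "LICENSE_NO": "X-1", "PHONE": "555-0100"},
    {"ACCT_REFERRING_MD": 502, "PHYSICIAN_NAME": "Dr. B", "AFFILIATION": "Clinic South",
     "LICENSE_NO": "X-2", "PHONE": "555-0101"},
]

# The columnar subset that makes up the physician dimension.
DIM_COLUMNS = ("ACCT_REFERRING_MD", "PHYSICIAN_NAME", "AFFILIATION")


def build_dimension(master_rows, columns):
    """Project each master row onto the dimension's columns."""
    return [{col: row[col] for col in columns} for row in master_rows]


physician_dim = build_dimension(physician_master, DIM_COLUMNS)
print(physician_dim)
```

This only works cleanly when the master data is already well defined, for example with one row per physician and a stable key.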
Fact Table: Acct Fin Rollup
• ACCT_NUM, ACCT_PTPTR, ACCT_GUARANTOR_ID, ACCT_REFERRING_MD, ACCT_START_DATE, ACCT_END_DATE, PLAN_SEQ1, ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE
• TOT_TOTAL_CHARGES, TOT_TOTAL_PAYMENTS, TOT_TOTAL_ADJUSTMENTS, TOT_BALANCE
Dimension Table: Patient
• ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP
Dimension Table: Referring Physician
• ACCT_REFERRING_MD, PHYSICIAN_NAME, AFFILIATION, AFFILIATION_CITY, AFFILIATION_STATE, AFFILIATION_ZIP
Dimension Table: Insurance Plan/Carrier
• PLAN_SEQ1, PLAN_NAME, CARRIER, CITY, STATE, ZIP
Dimension Table: Date
• WEEK, YEAR, QUARTER, MONTH
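The two roll-ups from the talking points (source table to fact table, then fact table to a coarser fact table) might look like the sketch below, again using `sqlite3`. The source table `source_txn`, its `TXN_TYPE` values, the second fact table `fc_fin_rollup`, and the sample amounts are assumptions for illustration. The balance formula (charges minus payments minus adjustments) is also an assumption; the slides do not define `TOT_BALANCE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical transaction-level source table.
CREATE TABLE source_txn (ACCT_NUM INTEGER, FC TEXT, TXN_TYPE TEXT, AMOUNT REAL);
INSERT INTO source_txn VALUES
    (1001, 'MEDICARE', 'CHARGE', 500.0),
    (1001, 'MEDICARE', 'PAYMENT', 300.0),
    (1001, 'MEDICARE', 'ADJUSTMENT', 50.0),
    (1002, 'SELF_PAY', 'CHARGE', 200.0),
    (1003, 'MEDICARE', 'CHARGE', 100.0),
    (1003, 'MEDICARE', 'PAYMENT', 100.0);

-- Roll-up 1: one fact row per account, totals by transaction type.
-- TOT_BALANCE uses an assumed formula: charges - payments - adjustments.
CREATE TABLE acct_fin_rollup AS
SELECT ACCT_NUM, FC,
       SUM(CASE WHEN TXN_TYPE = 'CHARGE' THEN AMOUNT ELSE 0 END) AS TOT_TOTAL_CHARGES,
       SUM(CASE WHEN TXN_TYPE = 'PAYMENT' THEN AMOUNT ELSE 0 END) AS TOT_TOTAL_PAYMENTS,
       SUM(CASE WHEN TXN_TYPE = 'ADJUSTMENT' THEN AMOUNT ELSE 0 END) AS TOT_TOTAL_ADJUSTMENTS,
       SUM(CASE WHEN TXN_TYPE = 'CHARGE' THEN AMOUNT ELSE -AMOUNT END) AS TOT_BALANCE
FROM source_txn
GROUP BY ACCT_NUM, FC;

-- Roll-up 2: a coarser fact table built from the first, by financial class.
CREATE TABLE fc_fin_rollup AS
SELECT FC, COUNT(*) AS ACCT_COUNT,
       SUM(TOT_TOTAL_CHARGES) AS TOT_TOTAL_CHARGES,
       SUM(TOT_BALANCE) AS TOT_BALANCE
FROM acct_fin_rollup
GROUP BY FC;
""")

by_account = conn.execute(
    "SELECT ACCT_NUM, TOT_BALANCE FROM acct_fin_rollup ORDER BY ACCT_NUM").fetchall()
by_fc = conn.execute(
    "SELECT FC, ACCT_COUNT, TOT_TOTAL_CHARGES, TOT_BALANCE "
    "FROM fc_fin_rollup ORDER BY FC").fetchall()
print(by_account)
print(by_fc)
```

Building the second fact table from the first, rather than from the source, avoids rescanning transaction-level data for every coarser summary.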
Snowflake Schema
• Think Star Schema where the dimension tables are normalized
• Can be used to segregate rows in dimension tables that have a high percentage of null data (for faster lookup; some databases do not index null values)
[Diagram: Snowflake Schema. A Fact Table (product_key; Units, Cost Per Unit) joins to the Product Info Dimension Table (product_key, supplier_key), which in turn joins to the Supplier Info Dimension Table (supplier_key).]
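The snowflake layout in the diagram can be sketched as follows. The key names `product_key` and `supplier_key` and the Units and Cost Per Unit measures come from the slide; the table names, the name columns, and the sample rows are hypothetical. The point to notice is that reaching the supplier takes two joins, because the product dimension has been normalized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimensions: supplier attributes live in their own table.
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER,
    cost_per_unit REAL
);

INSERT INTO dim_supplier VALUES (10, 'Acme'), (20, 'Globex');
INSERT INTO dim_product VALUES (1, 'Widget', 10), (2, 'Gadget', 10), (3, 'Gizmo', 20);
INSERT INTO fact_sales VALUES (1, 5, 2.0), (2, 3, 10.0), (3, 1, 7.5);
""")

# Fact -> product -> supplier: one extra join compared with a star schema.
cost_by_supplier = conn.execute("""
    SELECT s.supplier_name, SUM(f.units * f.cost_per_unit)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_supplier s ON s.supplier_key = p.supplier_key
    GROUP BY s.supplier_name
    ORDER BY s.supplier_name
""").fetchall()
print(cost_by_supplier)
```

In a star schema the supplier name would be repeated on every product row; here it is stored once, at the cost of the second join.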
Conformed Dimension
A conformed dimension is a set of data attributes that has been physically implemented in multiple tables using the same structure. A conformed dimension can be applied to different fact tables. For example:
[Diagram: Conformed Dimension. One Dimension Table, Patient Demographics (Gender, Age), shared by three Fact Tables: Hypertension Studies, Lab Results, and Diabetes Assessment.]
Note: The classic example for a conformed dimension is date. I wanted to offer a different example.
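To make the "row headers must match" requirement concrete, here is a small sketch in which two fact tables share one patient demographics dimension. Each fact table is aggregated separately by gender, and the two results line up because both use the same dimension values. All names, keys, and measurements below are hypothetical.

```python
from collections import defaultdict

# One shared (conformed) dimension, keyed by a hypothetical demographics key.
dim_patient_demographics = {
    1: {"gender": "F", "age": 34},
    2: {"gender": "M", "age": 61},
    3: {"gender": "F", "age": 58},
}

# Two fact tables referencing the same dimension: (demographics_key, measure).
fact_hypertension = [(1, 118), (2, 145), (3, 150)]        # systolic pressure
fact_lab_results = [(1, 90), (2, 110), (3, 130), (2, 150)]  # glucose


def average_by_gender(fact_rows):
    """Aggregate one fact table using the shared dimension's gender attribute."""
    totals = defaultdict(lambda: [0.0, 0])  # gender -> [sum, count]
    for key, value in fact_rows:
        gender = dim_patient_demographics[key]["gender"]
        totals[gender][0] += value
        totals[gender][1] += 1
    return {gender: s / n for gender, (s, n) in totals.items()}


bp = average_by_gender(fact_hypertension)
labs = average_by_gender(fact_lab_results)

# The row headers (gender values) match exactly, so the answers can be merged.
report = {g: (bp[g], labs[g]) for g in sorted(bp.keys() & labs.keys())}
print(report)
```

If each fact table had its own differently coded gender attribute, the merge step would need a mapping table, which is the problem a conformed dimension avoids.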
Transition to Next Point of Discussion
Star and Snowflake schemas are optimized for querying large data sets. They should support:
• OLAP cubes
• Business Intelligence and Analytic Applications
• Ad hoc queries
The End