Warehousing dimension star-snowflake_schemas

Data Warehousing – Dimensions | Star and Snowflake Schemas
Eric Matthews - DataWithUs

Description: A high-level presentation on the use of dimensions and of star and snowflake schemas in data warehousing.

Transcript

Page 1

Data Warehousing – Dimensions | Star and Snowflake Schemas

Eric Matthews - DataWithUs

Page 2

Defining Some Key Terms

Dimension

• A data element that categorizes each item in a data set
• Provides structured labeling/tagging
• Dimensions can consist of hierarchies. For example, Date rolls up to Month, Quarter, and Year
• Dimension tables contain the primary keys that the fact tables reference as foreign keys
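To make the Date hierarchy concrete, here is a minimal sketch that derives one row of a date dimension from a calendar date and then rolls a daily measure up to quarters. The column names (`date_key`, `month`, `quarter`, `year`) and the YYYYMMDD key format are illustrative assumptions, not the presenter's design.

```python
from collections import defaultdict
from datetime import date


def date_dimension_row(d: date) -> dict:
    """Build one row of a hypothetical Date dimension.

    Each attribute is a level of the Date -> Month -> Quarter -> Year hierarchy.
    """
    return {
        "date_key": int(d.strftime("%Y%m%d")),  # assumed smart-key convention, e.g. 20240315
        "date": d.isoformat(),
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
    }


row = date_dimension_row(date(2024, 3, 15))
print(row)

# Hypothetical daily sales, rolled up to (year, quarter) through the dimension attributes
sales = {date(2024, 1, 5): 100.0, date(2024, 2, 20): 50.0, date(2024, 4, 1): 75.0}
by_quarter = defaultdict(float)
for d, amount in sales.items():
    r = date_dimension_row(d)
    by_quarter[(r["year"], r["quarter"])] += amount
print(dict(by_quarter))
```

Because every level is precomputed in the dimension row, rolling up to a coarser level is just a matter of grouping on a different column.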

Dimension – Primary Role

• Data Filtering
• Data Grouping
• Data Labeling

Fact

• A measured, counted, or aggregated event. For example: sales, admissions, blood pressure readings, and inventory can all be construed as “facts”

• Fact tables contain the foreign keys that join them to each dimension table
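The three primary roles of a dimension (filtering, grouping, labeling) can be sketched in a few lines of plain Python. The insurance dimension, its `plan_type` attribute, and the fact rows below are hypothetical; the point is that the fact row carries only a key and a measure, and every descriptive operation goes through the dimension.

```python
from collections import defaultdict

# Hypothetical dimension: surrogate key -> descriptive attributes
insurance_dim = {
    1: {"carrier": "Carrier A", "plan_type": "HMO"},
    2: {"carrier": "Carrier B", "plan_type": "PPO"},
    3: {"carrier": "Carrier C", "plan_type": "HMO"},
}

# Hypothetical fact rows: a foreign key to the dimension plus a numeric measure
charges_fact = [
    {"plan_key": 1, "charges": 1200.0},
    {"plan_key": 2, "charges": 800.0},
    {"plan_key": 3, "charges": 300.0},
    {"plan_key": 1, "charges": 500.0},
]


def hmo_charges_by_carrier(facts, dim):
    totals = defaultdict(float)
    for f in facts:
        attrs = dim[f["plan_key"]]            # resolve the foreign key
        if attrs["plan_type"] != "HMO":       # filtering
            continue
        totals[attrs["carrier"]] += f["charges"]  # grouping, labeled by carrier name
    return dict(totals)


print(hmo_charges_by_carrier(charges_fact, insurance_dim))
# {'Carrier A': 1700.0, 'Carrier C': 300.0}
```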

Page 3

Defining Some Key Terms (continued)

Conformed Dimension

• A common set of data structures/attributes
• Can cut across many facts, but…
• The row headers in an answer must match exactly, or…
• One dimension can be an exact subset of the other
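The "exact match or exact subset" rule can be expressed as a simple set test on row headers. This is a simplified sketch under the assumption that conformance is judged only by the header values an answer would display; in practice the attribute definitions and structures must also agree. The month lists are made up for illustration.

```python
def conforms(headers_a, headers_b) -> bool:
    """Simplified check: two sets of row headers conform if they match
    exactly or one is an exact subset of the other."""
    a, b = set(headers_a), set(headers_b)
    return a <= b or b <= a  # equality is covered by either subset test


month_dim = ["2024-01", "2024-02", "2024-03"]
date_dim_months = ["2024-01", "2024-02", "2024-03", "2024-04"]
clinic_months = ["Jan-2024", "Feb-2024"]  # same months, differently labeled

print(conforms(month_dim, month_dim))        # True: exact match
print(conforms(month_dim, date_dim_months))  # True: exact subset
print(conforms(month_dim, clinic_months))    # False: headers cannot be matched
```

The last case shows why labeling matters: the underlying months are the same, but because the headers differ, answers from the two dimensions cannot be lined up without a mapping.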

These definitions will become clearer as we look at some examples.

Page 4

Star Schema

• The simplest form of dimensional modeling

• Consists of one or more dimension tables arranged around a central fact table

• Optimized for querying large data sets

Page 5

Star Schema (Logical) diagram: a central Fact Table holding keys and facts, surrounded by four dimension tables: Patient Demographics, Date/Time, Insurance Carrier, and Referring Physician.

Page 6

Star Schema – Talking Points for Next Diagram

• Discuss aggregating the source table into the fact table by rolling up totals (and how this needed to be done).

• Discuss the notion of rolling up fact tables to create other fact tables (use the account type, financial class, and service code columns in the fact table as the basis for discussion)

• Discuss some of the pitfalls of dimension tables, using the physician dimension as an example (physicians can change jobs, so their affiliation changes over time)

• Discuss the Date Dimension from the perspective of the data in the table… which transitions us to a key point…
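The first two talking points (source table to fact table, then fact table to a coarser fact table) can be sketched with one generic group-and-sum helper applied twice. The transaction-level source layout (`TXN_TYPE`, `AMOUNT`), the sample codes, treating `FC` as the financial class column, and the balance formula (payments and adjustments stored as positive amounts) are all assumptions for illustration.

```python
# Hypothetical transaction-level source rows; TXN_TYPE and AMOUNT are assumed column names
source_txns = [
    {"ACCT_NUM": "A1", "ACCT_TYPE": "IP", "FC": "MC", "HOSPITAL_SERVICE_CODE": "CARD", "TXN_TYPE": "charge", "AMOUNT": 1000.0},
    {"ACCT_NUM": "A1", "ACCT_TYPE": "IP", "FC": "MC", "HOSPITAL_SERVICE_CODE": "CARD", "TXN_TYPE": "payment", "AMOUNT": 600.0},
    {"ACCT_NUM": "A1", "ACCT_TYPE": "IP", "FC": "MC", "HOSPITAL_SERVICE_CODE": "CARD", "TXN_TYPE": "adjustment", "AMOUNT": 100.0},
    {"ACCT_NUM": "A2", "ACCT_TYPE": "IP", "FC": "MC", "HOSPITAL_SERVICE_CODE": "ORTH", "TXN_TYPE": "charge", "AMOUNT": 500.0},
    {"ACCT_NUM": "A2", "ACCT_TYPE": "IP", "FC": "MC", "HOSPITAL_SERVICE_CODE": "ORTH", "TXN_TYPE": "payment", "AMOUNT": 500.0},
    {"ACCT_NUM": "A3", "ACCT_TYPE": "OP", "FC": "SP", "HOSPITAL_SERVICE_CODE": "CARD", "TXN_TYPE": "charge", "AMOUNT": 200.0},
]
MEASURES = ("TOT_TOTAL_CHARGES", "TOT_TOTAL_PAYMENTS", "TOT_TOTAL_ADJUSTMENTS")
TXN_TO_MEASURE = {"charge": "TOT_TOTAL_CHARGES", "payment": "TOT_TOTAL_PAYMENTS",
                  "adjustment": "TOT_TOTAL_ADJUSTMENTS"}


def rollup(rows, keys, measures):
    """Group rows by the `keys` columns and sum each `measures` column."""
    out = {}
    for r in rows:
        k = tuple(r[c] for c in keys)
        agg = out.setdefault(k, {**dict(zip(keys, k)), **{m: 0.0 for m in measures}})
        for m in measures:
            agg[m] += r[m]
    return list(out.values())


# Step 1: pivot each transaction into the three measure columns (zero where not applicable)
pivoted = []
for t in source_txns:
    row = {c: t[c] for c in ("ACCT_NUM", "ACCT_TYPE", "FC", "HOSPITAL_SERVICE_CODE")}
    row.update({m: 0.0 for m in MEASURES})
    row[TXN_TO_MEASURE[t["TXN_TYPE"]]] = t["AMOUNT"]
    pivoted.append(row)

# Step 2: source -> fact table at one row per account
acct_fact = rollup(pivoted, ("ACCT_NUM", "ACCT_TYPE", "FC", "HOSPITAL_SERVICE_CODE"), MEASURES)
for r in acct_fact:
    # Assumes payments and adjustments are positive amounts that reduce the balance
    r["TOT_BALANCE"] = r["TOT_TOTAL_CHARGES"] - r["TOT_TOTAL_PAYMENTS"] - r["TOT_TOTAL_ADJUSTMENTS"]

# Step 3: fact -> coarser fact table by account type and financial class
type_fc_fact = rollup(acct_fact, ("ACCT_TYPE", "FC"), MEASURES + ("TOT_BALANCE",))
for r in type_fc_fact:
    print(r)
```

Reusing the same helper for both steps reflects the idea that a rolled-up fact table is just another aggregation at a coarser grain; the only decision is which key columns survive.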

Note: Have the original table schema on hand as a point of reference.

…which is that, just as one needs to resolve foreign keys when reporting, the dimension table is simply a table form of that same lookup.

Additionally, if one has well-defined master data, the dimension tables can be populated from a columnar subset of the source master data table.
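Populating a dimension from master data amounts to a projection onto the dimension's columns, keeping one row per key. The sketch below uses the Referring Physician columns from the Acct Fin Rollup diagram; the master rows and the extra `PHONE` and `LICENSE_NO` columns are hypothetical.

```python
# Columns shown for the Referring Physician dimension table
DIM_COLUMNS = ("ACCT_REFERRING_MD", "PHYSICIAN_NAME", "AFFILIATION",
               "AFFILIATION_CITY", "AFFILIATION_STATE", "AFFILIATION_ZIP")

# Hypothetical master data rows; PHONE and LICENSE_NO are illustrative extra columns
physician_master = [
    {"ACCT_REFERRING_MD": "MD01", "PHYSICIAN_NAME": "Dr. A", "AFFILIATION": "Hospital X",
     "AFFILIATION_CITY": "Columbus", "AFFILIATION_STATE": "OH", "AFFILIATION_ZIP": "43210",
     "PHONE": "555-0100", "LICENSE_NO": "L-001"},
    {"ACCT_REFERRING_MD": "MD02", "PHYSICIAN_NAME": "Dr. B", "AFFILIATION": "Clinic Y",
     "AFFILIATION_CITY": "Dayton", "AFFILIATION_STATE": "OH", "AFFILIATION_ZIP": "45402",
     "PHONE": "555-0101", "LICENSE_NO": "L-002"},
]


def build_dimension(master_rows, columns, key):
    """Project master data onto `columns`, keeping one row per `key` value."""
    dim = {}
    for row in master_rows:
        projected = {c: row[c] for c in columns}  # columnar subset of the master row
        dim[projected[key]] = projected           # a later row for the same key overwrites an earlier one
    return list(dim.values())


physician_dim = build_dimension(physician_master, DIM_COLUMNS, "ACCT_REFERRING_MD")
print(physician_dim[0])
```

Keeping only the latest row per key silently overwrites history, which is exactly the pitfall raised by the physician example: if a physician changes affiliation, past accounts would be reported under the new affiliation.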

Page 7

Fact Table: Acct Fin Rollup
• Keys and attributes: ACCT_NUM, ACCT_PTPTR, ACCT_GUARANTOR_ID, ACCT_REFERRING_MD, ACCT_START_DATE, ACCT_END_DATE, PLAN_SEQ1, ACCT_TYPE, FC, HOSPITAL_SERVICE_CODE
• Facts: TOT_TOTAL_CHARGES, TOT_TOTAL_PAYMENTS, TOT_TOTAL_ADJUSTMENTS, TOT_BALANCE

Dimension Table: Patient
• ACCT_PTPTR, PATIENT_NAME, CITY, STATE, ZIP

Dimension Table: Referring Physician
• ACCT_REFERRING_MD, PHYSICIAN_NAME, AFFILIATION, AFFILIATION_CITY, AFFILIATION_STATE, AFFILIATION_ZIP

Dimension Table: Insurance Plan/Carrier
• PLAN_SEQ1, PLAN_NAME, CARRIER, CITY, STATE, ZIP

Dimension Table: Date
• WEEK, YEAR, QUARTER, MONTH
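The Acct Fin Rollup star can be sketched as physical tables with Python's built-in `sqlite3` module. Column names come from the diagram; the table names, column types, and sample rows are assumptions, and the Date dimension is left out because the diagram does not show which key joins it to the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (
    ACCT_PTPTR TEXT PRIMARY KEY, PATIENT_NAME TEXT, CITY TEXT, STATE TEXT, ZIP TEXT);
CREATE TABLE dim_referring_physician (
    ACCT_REFERRING_MD TEXT PRIMARY KEY, PHYSICIAN_NAME TEXT, AFFILIATION TEXT,
    AFFILIATION_CITY TEXT, AFFILIATION_STATE TEXT, AFFILIATION_ZIP TEXT);
CREATE TABLE dim_insurance_plan (
    PLAN_SEQ1 TEXT PRIMARY KEY, PLAN_NAME TEXT, CARRIER TEXT,
    CITY TEXT, STATE TEXT, ZIP TEXT);
-- Fact table: foreign keys to each dimension plus the rolled-up measures
CREATE TABLE fact_acct_fin_rollup (
    ACCT_NUM TEXT PRIMARY KEY,
    ACCT_PTPTR TEXT REFERENCES dim_patient (ACCT_PTPTR),
    ACCT_REFERRING_MD TEXT REFERENCES dim_referring_physician (ACCT_REFERRING_MD),
    PLAN_SEQ1 TEXT REFERENCES dim_insurance_plan (PLAN_SEQ1),
    ACCT_TYPE TEXT, FC TEXT, HOSPITAL_SERVICE_CODE TEXT,
    TOT_TOTAL_CHARGES REAL, TOT_TOTAL_PAYMENTS REAL,
    TOT_TOTAL_ADJUSTMENTS REAL, TOT_BALANCE REAL);
""")

# Hypothetical sample rows
conn.executemany("INSERT INTO dim_patient VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Pat One", "Columbus", "OH", "43201"),
    ("P2", "Pat Two", "Detroit", "MI", "48201"),
])
conn.execute("INSERT INTO dim_referring_physician VALUES (?, ?, ?, ?, ?, ?)",
             ("MD01", "Dr. A", "Hospital X", "Columbus", "OH", "43210"))
conn.executemany("INSERT INTO dim_insurance_plan VALUES (?, ?, ?, ?, ?, ?)", [
    ("PL1", "Plan Gold", "Carrier A", "Columbus", "OH", "43215"),
    ("PL2", "Plan Basic", "Carrier B", "Dayton", "OH", "45402"),
])
conn.executemany("INSERT INTO fact_acct_fin_rollup VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("A1", "P1", "MD01", "PL1", "IP", "MC", "CARD", 1000.0, 600.0, 100.0, 300.0),
    ("A2", "P1", "MD01", "PL2", "OP", "SP", "ORTH", 500.0, 500.0, 0.0, 0.0),
    ("A3", "P2", "MD01", "PL1", "IP", "MC", "CARD", 200.0, 0.0, 0.0, 200.0),
])

# Star join: each dimension is exactly one join away from the fact table
rows = conn.execute("""
    SELECT pl.CARRIER, SUM(f.TOT_TOTAL_CHARGES) AS total_charges
    FROM fact_acct_fin_rollup AS f
    JOIN dim_insurance_plan AS pl ON pl.PLAN_SEQ1 = f.PLAN_SEQ1
    JOIN dim_patient AS pt ON pt.ACCT_PTPTR = f.ACCT_PTPTR
    WHERE pt.STATE = 'OH'      -- filter on a dimension attribute
    GROUP BY pl.CARRIER        -- group and label by a dimension attribute
    ORDER BY pl.CARRIER
""").fetchall()
print(rows)  # [('Carrier A', 1000.0), ('Carrier B', 500.0)]
```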

Page 8

Snowflake Schema

• Think of a star schema in which the dimension tables are normalized

• Can be used to segregate rows in dimension tables that have a high percentage of null data (for faster lookup; some databases, such as Oracle, do not store entirely null keys in a B-tree index)

Page 9

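Snowflake Schema diagram:
• Fact Table: product_key; facts: Units, Cost Per Unit
• Dimension Table: Product Info (product_key, supplier_key)
• Dimension Table: Supplier Info (supplier_key)

The fact table joins to Product Info on product_key, and Product Info joins to Supplier Info on supplier_key.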
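The snowflake in the diagram can be sketched with `sqlite3` to show the extra join hop. The key and measure columns follow the diagram (product_key, supplier_key, Units, Cost Per Unit); the table names, `product_name`, `supplier_name`, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_info (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE product_info (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    supplier_key INTEGER REFERENCES supplier_info (supplier_key));
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_info (product_key),
    units INTEGER, cost_per_unit REAL);
INSERT INTO supplier_info VALUES (10, 'Supplier X'), (20, 'Supplier Y');
INSERT INTO product_info VALUES (1, 'Widget', 10), (2, 'Gadget', 10), (3, 'Gizmo', 20);
INSERT INTO sales_fact VALUES (1, 5, 2.0), (2, 3, 4.0), (3, 10, 1.5), (1, 1, 2.0);
""")

# Snowflake join: fact -> Product Info -> Supplier Info (two hops to reach supplier attributes)
rows = conn.execute("""
    SELECT s.supplier_name, SUM(f.units * f.cost_per_unit) AS total_cost
    FROM sales_fact AS f
    JOIN product_info AS p ON p.product_key = f.product_key
    JOIN supplier_info AS s ON s.supplier_key = p.supplier_key
    GROUP BY s.supplier_name
    ORDER BY s.supplier_name
""").fetchall()
print(rows)  # [('Supplier X', 24.0), ('Supplier Y', 15.0)]
```

In a star schema the supplier name would instead be copied into the product dimension, trading redundant storage for one fewer join per query.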

Page 10

A conformed dimension is a set of data attributes that has been physically implemented in multiple tables using the same structure, so that it can be applied to different fact tables. For example:

Conformed Dimension diagram: the Patient Demographics dimension table (Gender, Age) is shared by three fact tables: Hypertension Studies, Lab Results, and Diabetes Assessment.

Note: The classic example of a conformed dimension is date. I wanted to offer a different example.
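Because both fact tables resolve through the same Patient Demographics dimension, their answers share identical row headers and can be combined side by side (often called drilling across). This sketch assumes a hypothetical `patient_key` and made-up measures (`systolic`, `hba1c`); it is an illustration, not the study data.

```python
from collections import defaultdict

# Conformed Patient Demographics dimension (Gender, Age), keyed by a hypothetical patient_key
patient_dim = {
    1: {"gender": "F", "age": 54},
    2: {"gender": "M", "age": 61},
    3: {"gender": "F", "age": 47},
}

# Hypothetical fact tables that both reference patient_key
hypertension_fact = [
    {"patient_key": 1, "systolic": 150},
    {"patient_key": 2, "systolic": 140},
    {"patient_key": 3, "systolic": 130},
]
diabetes_fact = [
    {"patient_key": 1, "hba1c": 7.2},
    {"patient_key": 2, "hba1c": 6.4},
]


def average_by(facts, dim, attribute, measure):
    """Average `measure` over fact rows, grouped by a dimension attribute."""
    sums, counts = defaultdict(float), defaultdict(int)
    for f in facts:
        label = dim[f["patient_key"]][attribute]
        sums[label] += f[measure]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def drill_across(attribute):
    bp = average_by(hypertension_fact, patient_dim, attribute, "systolic")
    a1c = average_by(diabetes_fact, patient_dim, attribute, "hba1c")
    # Both answers use the same conformed row headers, so they line up directly
    return {label: (bp.get(label), a1c.get(label)) for label in sorted(bp.keys() | a1c.keys())}


print(drill_across("gender"))  # {'F': (140.0, 7.2), 'M': (140.0, 6.4)}
```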

Page 11

Transition to Next Point of Discussion

Star and snowflake schemas are optimized for querying large data sets. They should support:

• OLAP cubes
• Business intelligence and analytic applications
• Ad hoc queries

Page 12

The End