Post on 22-Dec-2015
1
MIS 4346/5346 DATA WAREHOUSING
Data Warehouse Implementation
2
Agenda
Review Development Approach
Review Dimensional Modeling
Implementing the Data Warehouse with SQL Server Enterprise Edition
Implementing Data Mart Physical Structures
  Creating the data mart database
  Creating dimension tables
  Creating fact tables
  Using scripts
3
DW Development Approach: Kimball Methodology
DW Project Lifecycle
Business requirements
  Business Requirements Documentation
  Bus Matrix
Design, build and deliver in increments
  DW Architecture
  DW Design
  ETL system
  Cube, Reports, query tools, …
4
Review: Dimensional Modeling
5
Dimensional Model: Revisited
6
Data Warehouse Project Lifecycle
Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.
7
IT Architecture/Infrastructure Physical Design*: SQL Server Enterprise Edition
SQL Server Database Engine
* Specifically Product Selection & Installation
8
Data Warehouse Project Lifecycle
Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.
9
DW/DM Implementation: Building the Data Mart Database
Typically one database per data mart
Example:
USE master;
GO
CREATE DATABASE ClassPerformanceDW;
GO
-- SIMPLE recovery: no log backups; log space is reclaimed at checkpoints
ALTER DATABASE ClassPerformanceDW SET RECOVERY SIMPLE;
GO
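A script that builds the database is often run more than once, for example when setting up a test environment. The following is a minimal sketch of a re-runnable variant, assuming the same database name as above; the existence check and the verification query are additions for illustration and are not part of the original slide.

```sql
USE master;
GO
-- Create the data mart database only when it does not exist yet.
IF DB_ID(N'ClassPerformanceDW') IS NULL
    CREATE DATABASE ClassPerformanceDW;
GO
ALTER DATABASE ClassPerformanceDW SET RECOVERY SIMPLE;
GO
-- Confirm the recovery model that is now in effect.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'ClassPerformanceDW';
GO
```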
10
Creating Dimension Tables
Naming is typically DimTableName
Consider data compression
Example:
CREATE TABLE DimStudent (
    student_sk       int IDENTITY(1,1),  -- surrogate key generated in the data mart
    student_id       varchar(9),         -- student identifier from the source data
    firstname        varchar(30),
    lastname         varchar(30),
    major            varchar(7),
    classification   varchar(25),
    gpa              numeric(2, 1),
    clubname         varchar(25),
    undergradschool  varchar(25),
    gmat             int,
    undergradORgrad  varchar(10),
    CONSTRAINT dimstudent_pk PRIMARY KEY (student_sk)
);
GO
-- Index the student identifier to speed up lookups by student_id
CREATE INDEX student_id_idx ON DimStudent (student_id);
GO
ALTER TABLE DimStudent REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
GRANT SELECT ON DimStudent TO PUBLIC;
GO
See http://blog.sqlauthority.com/2010/03/01/sql-server-data-and-page-compressions-data-storage-and-io-improvement/
OR http://sqlmag.com/database-performance-tuning/practical-data-compression-sql-server
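Before rebuilding a table with compression, it can help to estimate the benefit. The sketch below uses the system procedure sp_estimate_data_compression_savings and the sys.partitions catalog view; it assumes DimStudent lives in the dbo schema of ClassPerformanceDW, which the slides do not state.

```sql
USE ClassPerformanceDW;
GO
-- Estimate the space DimStudent would use with PAGE compression.
-- NULL for index and partition means "all indexes, all partitions".
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',        -- assumed schema
     @object_name      = N'DimStudent',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';
GO
-- Show the compression setting currently applied to each index of the table.
SELECT i.name AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id
 AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID(N'dbo.DimStudent');
GO
```

Small dimension tables may show little saving; the estimate is usually more informative for large fact tables.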
11
Creating Fact Tables
Naming is typically FactTableName
Example:
CREATE TABLE fact_enrollment (
    student_sk    int,
    class_sk      int,
    date_sk       int,
    professor_sk  int,
    location_sk   int,
    termyear_sk   int,
    coursegrade   numeric(2, 1),
    CONSTRAINT fact_enrollment_pk PRIMARY KEY (student_sk, class_sk, date_sk, professor_sk),
    -- each foreign key references the surrogate key of a dimension table
    CONSTRAINT fact_enrollment_student_fk FOREIGN KEY (student_sk) REFERENCES dimstudent (student_sk),
    CONSTRAINT fact_enrollment_class_fk FOREIGN KEY (class_sk) REFERENCES dimclass (class_sk),
    CONSTRAINT fact_enrollment_date_fk FOREIGN KEY (date_sk) REFERENCES dimtime (date_sk),
    CONSTRAINT fact_enrollment_professor_fk FOREIGN KEY (professor_sk) REFERENCES dimprofessor (professor_sk),
    CONSTRAINT fact_enrollment_location_fk FOREIGN KEY (location_sk) REFERENCES dimlocation (location_sk),
    CONSTRAINT fact_enrollment_termyear_fk FOREIGN KEY (termyear_sk) REFERENCES dimtermyear (termyear_sk)
);
GO
GRANT SELECT ON fact_enrollment TO PUBLIC;
GO
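Once the fact table and its dimensions exist, a typical query joins the fact table to a dimension on the surrogate key and aggregates the measure. The following is an illustrative sketch using only the fact_enrollment and DimStudent columns defined above; the choice of grouping by major is an assumption made for the example.

```sql
-- Average course grade and enrollment count per major.
SELECT s.major,
       COUNT(*)           AS enrollments,
       AVG(f.coursegrade) AS avg_coursegrade
FROM fact_enrollment AS f
JOIN DimStudent AS s
  ON s.student_sk = f.student_sk   -- join on the surrogate key
GROUP BY s.major
ORDER BY s.major;
GO
```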
12
Using Scripts
A script contains all the statements needed to create the data mart tables
Advantages:
  Can easily create test environments
  Can easily create production tables
  Fewer files to manage
  Code reuse
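To reuse one script for both test and production builds, the script has to be safe to run repeatedly. One common way, sketched here under the assumption that the tables are in the dbo schema, is to drop existing tables before the CREATE statements, fact table first because its foreign keys reference the dimensions. Only the two tables shown in full earlier are included.

```sql
USE ClassPerformanceDW;
GO
-- Drop the fact table first: its foreign keys reference the dimension tables.
IF OBJECT_ID(N'dbo.fact_enrollment', N'U') IS NOT NULL
    DROP TABLE dbo.fact_enrollment;
GO
IF OBJECT_ID(N'dbo.DimStudent', N'U') IS NOT NULL
    DROP TABLE dbo.DimStudent;
GO
-- The CREATE TABLE statements for the dimensions, then the fact table, follow here.
```

Dropping tables discards their data, so this pattern suits rebuilds from scratch, not changes to a loaded data mart.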
13
Example Script “Design”
CREATE Script
  Contains CREATEs for all tables
TRANSFORM/LOAD Script (next topic)
  Calls individual transform/load scripts
    One for each table
  Cleanup
    Clear and shrink the log file
Example: http://business.baylor.edu/gina_green/teaching/sqlserver/scripts/generate_class_performance_dw_tables/generate_class_performance_dw_tables.zip
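The cleanup step that clears and shrinks the log file might look like the sketch below. It assumes the database was created without explicit file specifications, in which case SQL Server names the log file ClassPerformanceDW_log by default; the first query lists the actual logical file names so the assumption can be checked. The target size of 1 MB is an arbitrary value chosen for illustration.

```sql
USE ClassPerformanceDW;
GO
-- List the database files; size is stored in 8 KB pages.
SELECT file_id, name, type_desc, size * 8 / 1024 AS size_mb
FROM sys.database_files;
GO
-- Under SIMPLE recovery, a checkpoint lets inactive log space be reused.
CHECKPOINT;
GO
-- Shrink the log file to about 1 MB (logical file name is assumed, see above).
DBCC SHRINKFILE (N'ClassPerformanceDW_log', 1);
GO
```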
14
Summary
Physical Design: Infrastructure and DW
Creating and Naming:
  Database
  Dimension tables
  Fact tables
Considerations when creating the above objects
Using scripts