Syllabus - University of British Columbiarap/teaching/504/2009/slides/OLAP.pdf · Microsoft...


Page 1: Syllabus - University of British Columbiarap/teaching/504/2009/slides/OLAP.pdf · Microsoft PowerPoint - OLAP.ppt [Compatibility Mode] Author: rap Created Date: 11/4/2009 1:38:13


Presentation: Sophia

Discussion: Tianyu


Syllabus

Motivation

Definition

Why not RDBMS & OLTP

Typical OLAP Architecture

Logical Model

Schemas

Materialized Views

Metadata Requirements and Conclusion


Motivation: Decision Support

• Decision making: every day, everywhere

• Decision support system: a class of computerized information systems that support decision-making activities.

• What is needed to support decision making?

◦ Data: heterogeneous, large scale

◦ Analysis tools: user friendly, good query throughput

Data Warehouse

OLAP tools


Definition: data warehouse

• A data warehouse is a collection of data in support of management’s decision-making process.

Data warehouse (OLAP) vs. RDBMS (OLTP):

◦ Subject-oriented vs. day-to-day transactions

◦ Integrated vs. usually from one source

◦ Time-variant vs. within a short period of time

◦ Non-volatile vs. frequent updates

OLAP vs. OLTP

• Manager: “I need to analyze the sales curve of our company over the past 10 years.” “Let’s try to find something interesting!”

• Cashier: “One computer was sold and I got $1000. I need to update the database.” “That’s what I do every day!”

                    OLTP                            OLAP
Users               Clerk, IT professional          Knowledge worker
Function            Day-to-day operations           Decision support
DB design           Application-oriented            Subject-oriented
Data                Current, up-to-date, detailed   Historical, summarized, multidimensional, …
Usage               Repetitive                      Ad hoc
Access              Read/write                      Lots of scans
Unit of work        Short, simple transaction       Complex query
# records accessed  Tens                            Millions
# users             Thousands                       Hundreds
DB size             100 MB-GB                       100 GB-TB
Metric              Transaction throughput          Query throughput


Why not RDBMS & OLTP?

Decision Maker vs. Daily User

Decision maker: “I need a special design to access large-scale, multidimensional data.”

Daily user: “No way. It will degrade the performance.”

Decision maker: “I just read data! Cut the concurrency control and recovery mechanisms!”

Daily user: “Are you kidding? They’re so important!”

Decision maker: “I need historical data! Where is it?”

Daily user: “I don’t have it, and I don’t want to have it! Go away!”

Decision maker: “Alright, I’ll build my own system!”


Typical OLAP Architecture

[Figure: typical OLAP architecture, showing back-end tools and front-end tools]


Logical Model

• Multidimensional: numerical measure, dimension, attribute

[Figure: sales data with dimensions Product and City]

Logical Model

• Front-end operations

◦ Pivot

◦ Rollup

◦ Drilldown

◦ Slice and dice
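The front-end operations listed above can be made concrete with a small sketch. This is only an illustration in plain Python, not how an OLAP server implements them: the `SALES` facts, the dimension names in `DIMS`, and the helper functions `aggregate`, `dice`, `slice_`, and `pivot` are all hypothetical names invented for this example.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, product, year, amount). All values are made up.
SALES = [
    ("Vancouver", "Laptop", 2008, 1000),
    ("Vancouver", "Phone", 2008, 400),
    ("Victoria", "Laptop", 2008, 900),
    ("Vancouver", "Laptop", 2009, 1200),
    ("Victoria", "Phone", 2009, 300),
]
DIMS = ("city", "product", "year")


def aggregate(facts, group_by):
    """Sum the measure over every dimension not listed in group_by."""
    idx = [DIMS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def dice(facts, **allowed):
    """Dice: keep facts whose dimension values fall in the allowed sets."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    return dice(facts, **{dim: {value}})


def pivot(totals):
    """Pivot a two-dimensional result: swap the row and column dimensions."""
    return {(b, a): v for (a, b), v in totals.items()}


by_city_product = aggregate(SALES, ["city", "product"])  # drilldown: finer grouping
by_city = aggregate(SALES, ["city"])                     # rollup: product summed away
laptops_2009 = slice_(slice_(SALES, "year", 2009), "product", "Laptop")
```

Rollup and drilldown are the same `aggregate` call with fewer or more group-by dimensions, which is why they are usually described as inverse operations.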


Database Design Methodology

• Star Schema

[Figure: a central fact table (dimensions) linked by foreign keys to four dimension tables (attributes)]
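A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`sales`, `product`, `store`), columns, and rows below are assumptions made up for illustration, and only two dimension tables are used instead of the four in the diagram to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store   (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT);

-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    amount     INTEGER
);

INSERT INTO product VALUES (1, 'Jacket', 'Clothing'), (2, 'Laptop', 'Electronics');
INSERT INTO store VALUES (1, 'Seattle', 'WA'), (2, 'Spokane', 'WA'), (3, 'Portland', 'OR');
INSERT INTO sales VALUES (1, 1, 100), (1, 2, 50), (2, 1, 1000), (1, 3, 80), (2, 3, 900);
""")

# A typical star join: fact table joined to its dimensions, then aggregated.
rows = conn.execute("""
    SELECT s.state, p.category, SUM(f.amount)
    FROM sales f
    JOIN product p ON p.product_id = f.product_id
    JOIN store s ON s.store_id = f.store_id
    GROUP BY s.state, p.category
    ORDER BY s.state, p.category
""").fetchall()
```

Note that `store` repeats the state for every city in it, which is the redundancy that normalizing the dimension tables is meant to remove.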


Star Schema Example

Database Design Methodology

• Problems of the star schema

◦ Un-normalized (redundancy)

◦ No attribute hierarchy

Ex: (Vancouver, BC, Canada, North America)

(Victoria, BC, Canada, North America)

• Normalize dimension tables → Snowflake Schema
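The normalization step can be illustrated on the two example rows. This is a minimal sketch that assumes names are unique at every level of the hierarchy; `snowflake` and `lookup` are hypothetical helpers, and plain dictionaries stand in for the normalized dimension tables.

```python
# Denormalized (star) dimension rows: (city, province, country, continent).
STAR_ROWS = [
    ("Vancouver", "BC", "Canada", "North America"),
    ("Victoria", "BC", "Canada", "North America"),
]


def snowflake(rows):
    """Split a flat location dimension into one table per hierarchy level.

    Each table maps a key to its parent at the next level up, so
    ("BC", "Canada", "North America") is stored once instead of per city.
    """
    city, province, country = {}, {}, {}
    for c, p, n, k in rows:
        city[c] = p
        province[p] = n
        country[n] = k
    return city, province, country


def lookup(city_name, tables):
    """Rebuild the flat row by following parent links, one 'join' per level."""
    city, province, country = tables
    p = city[city_name]
    n = province[p]
    return (city_name, p, n, country[n])


tables = snowflake(STAR_ROWS)
```

The hierarchy is now explicit and the repeated values are gone, at the price of one extra lookup per level when the flat row is needed again.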

Example Snowflake Schema

More problems:

• The space saved is negligible in comparison to the typical magnitude of the fact table.

• Efficiency: browsing the normalized dimension tables requires extra joins.


Warehouse Servers

• Indices

• Materialized views

• Answering queries using indices and views

• Optimization of complex queries

• Parallel processing

• Server architecture:

◦ Specialized SQL servers

◦ ROLAP

◦ MOLAP

• SQL extensions: next paper

Indices

• Bitmap index: for each value of an attribute, a bit vector with one bit for each record

• Bitmap compression

• Join indices
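A bitmap index can be sketched in a few lines, using Python integers as bit vectors. The columns `state` and `category` and the helpers `build_bitmap_index` and `matching_records` are made up for illustration; real servers store and compress the bitmaps very differently.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int used as a bit vector.

    Bit i of the vector is set when record i holds that value.
    """
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index


def matching_records(bits):
    """Return the positions of the records whose bit is set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Hypothetical columns of a five-record table.
state = ["WA", "BC", "WA", "OR", "WA"]
category = ["Clothing", "Clothing", "Food", "Clothing", "Clothing"]

state_idx = build_bitmap_index(state)
category_idx = build_bitmap_index(category)

# state = 'WA' AND category = 'Clothing' becomes a single bitwise AND.
hits = matching_records(state_idx["WA"] & category_idx["Clothing"])
```

Conjunctions and disjunctions of selection conditions turn into bitwise `&` and `|`, which is what makes bitmaps attractive for low-cardinality attributes.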


Materialized Views

• Joins of the fact table with a subset of dimension tables, with the aggregation of one or more measures.

Example: total sales of each product in each state

[Figure: fact table, Product, State]

• What to materialize?

◦ Workload, cost of update, storage

◦ A greedy algorithm
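The slides do not say which greedy algorithm is meant. As one possible reading, the sketch below assumes a benefit-based greedy selection: starting from the finest view, it repeatedly materializes the view that saves the most rows across all the views it can answer. The view sizes in `SIZES` are invented, and the workload and update cost mentioned above are ignored to keep the example small.

```python
def greedy_select(sizes, top, k):
    """Pick up to k views to materialize in addition to `top`.

    sizes maps each view (a frozenset of group-by columns) to an assumed
    row count. A view w can be answered from v when w <= v (subset),
    because dropping group-by columns is a rollup.
    """
    chosen = [top]

    def cost(w):
        # Size of the cheapest materialized view that can answer w.
        return min(sizes[v] for v in chosen if w <= v)

    def benefit(v):
        # Total rows saved across all views answerable from v.
        return sum(max(0, cost(w) - sizes[v]) for w in sizes if w <= v)

    for _ in range(k):
        candidates = [v for v in sizes if v not in chosen]
        if not candidates:
            break
        chosen.append(max(candidates, key=benefit))
    return chosen[1:]


# Hypothetical view sizes (row counts) over two dimensions.
TOP = frozenset({"product", "state"})
SIZES = {
    TOP: 1000,
    frozenset({"product"}): 200,
    frozenset({"state"}): 50,
    frozenset(): 1,
}

picked = greedy_select(SIZES, TOP, 2)
```

With these made-up sizes, the per-state view is picked first because it is small and also serves the grand total, and the per-product view is picked second.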

Use of Views

• Selection, rollup

• Generator (V: view, Q: query):

◦ Same dimensions

◦ Selection clause of Q implies that of V

◦ Group-by of V is at the same or a finer level than that of Q, so that Q follows from V by rollup

• Minimal generator M’:

◦ A set of generators (could be more than one)

◦ Other generators generate some member of it

• Query: total sales of clothing in Washington

• Minimal generators:

◦ Total sales by each state for each product

◦ Total sales by each city for each category

• Other generators:

◦ Total sales by each city for each product
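The example can be checked with a small sketch. It makes two assumptions that are not on the slides: each view is described only by its grouping level per dimension (selection clauses are ignored, as if no view had one), and the hierarchies are city → state and product → category. The names `LEVELS`, `generates`, and `minimal_generators` are hypothetical.

```python
# Hypothetical hierarchies, finest level first.
LEVELS = {
    "location": ["city", "state"],
    "product": ["product", "category"],
}


def generates(view, target):
    """True if `target` can be computed from `view` by rollup alone.

    That holds when, in every dimension, `view` groups at the same
    level as `target` or at a finer one.
    """
    return all(
        LEVELS[d].index(view[d]) <= LEVELS[d].index(target[d])
        for d in LEVELS
    )


def minimal_generators(views, query):
    """Generators of `query` that do not generate any other generator."""
    gens = [v for v in views if generates(v, query)]
    return [
        g for g in gens
        if not any(h != g and generates(g, h) for h in gens)
    ]


# "Total sales of clothing in Washington" groups at (state, category).
QUERY = {"location": "state", "product": "category"}

BY_STATE_PRODUCT = {"location": "state", "product": "product"}
BY_CITY_CATEGORY = {"location": "city", "product": "category"}
BY_CITY_PRODUCT = {"location": "city", "product": "product"}

VIEWS = [BY_STATE_PRODUCT, BY_CITY_CATEGORY, BY_CITY_PRODUCT]
minimal = minimal_generators(VIEWS, QUERY)
```

The city-by-product view is a generator but not a minimal one, since it can itself generate both of the other views.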


Metadata Requirements

• Administrative metadata

◦ Information necessary for setting up and using a warehouse

• Business metadata

◦ Business terms; ownership; charging policy

• Operational metadata

◦ Information collected during the operation of the warehouse

Metadata Requirements

• Store and manage all the metadata associated with the warehouse


Conclusion

• Data warehousing is really hard

• Data cleansing

• Physical design

• Management

• Visualization

There are a lot of opportunities waiting for you guys~