Syllabus - University of British Columbiarap/teaching/504/2009/slides/OLAP.pdf · Microsoft...
Presentation: Sophia
Discussion: Tianyu
Syllabus
Motivation
Definition
Why not RDBMS & OLTP
Typical OLAP Architecture
Logical Model
Schemas
Materialized Views
Metadata Requirements and Conclusion
Motivation: Decision Support
• Decision making: every day, everywhere
• Decision support system: a class of computerized information systems that support decision-making activities.
• What is needed to support decision making?
◦ Data (heterogeneous, large scale): data warehouse
◦ Analysis tools (user friendly, good query throughput): OLAP tools
Definition: Data Warehouse
• A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
• OLAP (data warehouse) vs. OLTP (RDBMS):
◦ Subject-oriented vs. day-to-day, transaction-oriented
◦ Integrated vs. usually from one source
◦ Time-variant vs. within a short period of time
◦ Non-volatile vs. frequent updates
OLAP vs. OLTP
• Manager: "I need to analyze our company's sales curve over the past 10 years." "Let's try to find something interesting!"
• Cashier: "One computer was sold and I got $1000. I need to update the database." "That's what I do every day!"
                     OLTP                             OLAP
Users                Clerk, IT professional           Knowledge worker
Function             Day-to-day operations            Decision support
DB design            Application-oriented             Subject-oriented
Data                 Current, up-to-date, detailed    Historical, summarized, multidimensional,…
Usage                Repetitive                       Ad hoc
Access               Read/write                       Lots of scans
Unit of work         Short, simple transaction        Complex query
# records accessed   Tens                             Millions
# users              Thousands                        Hundreds
DB size              100 MB-GB                        100 GB-TB
Metric               Transaction throughput           Query throughput
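The "unit of work" row can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the `sales` table, its columns, and the sample rows are hypothetical and serve only to contrast a short single-record transaction with a summarizing scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, year INTEGER, product TEXT, amount REAL)"
)

# Hypothetical historical data already in the database.
history = [(2007, "computer", 900.0), (2008, "computer", 1100.0), (2008, "printer", 200.0)]
conn.executemany("INSERT INTO sales (year, product, amount) VALUES (?, ?, ?)", history)

# OLTP-style unit of work: a short transaction that touches one record.
with conn:  # commits on success, rolls back on an exception
    conn.execute(
        "INSERT INTO sales (year, product, amount) VALUES (?, ?, ?)",
        (2009, "computer", 1000.0),
    )

# OLAP-style unit of work: a read-only query that scans and summarizes history.
sales_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(sales_by_year)
```

The cashier's update is the `INSERT`; the manager's question is the `GROUP BY` scan.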
Why not RDBMS & OLTP?
Decision Maker vs. Daily User
• Decision maker: "I need a special design to access large-scale, multidimensional data."
◦ Daily user: "No way. It will degrade performance."
• Decision maker: "I just read data! Cut the concurrency control and recovery mechanisms!"
◦ Daily user: "Are you kidding? They're so important!"
• Decision maker: "I need historical data! Where is it?"
◦ Daily user: "I don't have it, and I don't want to have it! Go away!"
• Decision maker: "Alright, I'll build my own system!"
Typical OLAP Architecture
• Back-end tools
• Front-end tools
Logical Model
• Multidimensional: numerical measure, dimension, attribute
[Figure: sales data cube; axis labels include Product and City]
Logical Model
• Front-end operations
◦ Pivot
◦ Rollup
◦ Drilldown
◦ Slice and dice
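The front-end operations can be sketched on a tiny in-memory cube. This is a simplified illustration, not how an OLAP engine implements them: the cube contents, the `DIMS` tuple, and the helper names `rollup`, `dice`, and `slice_` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact cells: (product, city, year) -> sales measure.
DIMS = ("product", "city", "year")
cube = {
    ("shirt", "Seattle", 2008): 10,
    ("shirt", "Vancouver", 2008): 20,
    ("shoes", "Seattle", 2009): 30,
    ("shoes", "Vancouver", 2009): 40,
}

def rollup(cells, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def dice(cells, **allowed):
    """Keep only cells whose dimension values fall in the allowed sets."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def slice_(cells, dim, value):
    """A slice is a dice that fixes a single value on one dimension."""
    return dice(cells, **{dim: {value}})

by_city = rollup(cube, ["city"])               # rollup: aggregate to a coarser view
by_city_year = rollup(cube, ["city", "year"])  # drilldown: bring back the year detail
seattle = slice_(cube, "city", "Seattle")      # slice: fix city = Seattle
pivoted = rollup(cube, ["year", "city"])       # pivot: same cells, axes reordered
print(by_city, seattle)
```

Here rollup and drilldown are modeled only as dropping or restoring a dimension; moving along an attribute hierarchy (city to state, for example) works the same way once each cell carries the coarser attribute.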
Database Design Methodology
• Star schema
◦ Fact table (dimensions)
◦ Dimension tables (attributes), each referenced from the fact table by a foreign key
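A minimal sketch of a star schema, using Python's built-in sqlite3 module. The table names (`sales`, `product`, `city`), their columns, and the rows are hypothetical; the point is only that the fact table holds measures plus one foreign key per dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, with its descriptive attributes.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT, state TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the numerical measure.
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    city_id    INTEGER REFERENCES city(city_id),
    amount     REAL
);
INSERT INTO product VALUES (1, 'shirt', 'clothing'), (2, 'laptop', 'electronics');
INSERT INTO city VALUES
    (1, 'Vancouver', 'BC', 'Canada'),
    (2, 'Victoria', 'BC', 'Canada'),
    (3, 'Seattle', 'WA', 'USA');
INSERT INTO sales VALUES (1, 1, 20.0), (1, 2, 30.0), (2, 3, 1000.0), (1, 3, 25.0);
""")

# A typical star join: fact table joined to its dimensions, then aggregated.
star_result = conn.execute("""
    SELECT c.state, p.category, SUM(s.amount)
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    JOIN city c ON c.city_id = s.city_id
    GROUP BY c.state, p.category
    ORDER BY c.state, p.category
""").fetchall()
print(star_result)
```

Note that 'BC' and 'Canada' are stored once per city row in the `city` dimension table, which is the redundancy a star schema accepts.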
Star Schema Example
Database Design Methodology
• Problems of the star schema
◦ Un-normalized (redundancy)
◦ No attribute hierarchy
Ex: (Vancouver, BC, Canada, North America)
(Victoria, BC, Canada, North America)
• Normalize dimension tables → snowflake schema
Example Snowflake Schema
• More problems:
◦ The space saving is negligible compared with the typical size of the fact table.
◦ Efficiency (queries need more joins across the normalized dimension tables)
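The trade-off can be sketched by normalizing the location dimension from the example above into a snowflake. This uses Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is split along its hierarchy: city -> state -> country.
CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT, continent TEXT);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY, name TEXT,
    country_id INTEGER REFERENCES country(country_id)
);
CREATE TABLE city (
    city_id INTEGER PRIMARY KEY, name TEXT,
    state_id INTEGER REFERENCES state(state_id)
);
CREATE TABLE sales (city_id INTEGER REFERENCES city(city_id), amount REAL);
INSERT INTO country VALUES (1, 'Canada', 'North America');
INSERT INTO state VALUES (1, 'BC', 1);
INSERT INTO city VALUES (1, 'Vancouver', 1), (2, 'Victoria', 1);
INSERT INTO sales VALUES (1, 20.0), (2, 30.0);
""")

# 'BC', 'Canada', and 'North America' are now stored once each, but reaching
# the continent from the fact table takes three joins instead of one.
snowflake_result = conn.execute("""
    SELECT co.continent, SUM(s.amount)
    FROM sales s
    JOIN city ci ON ci.city_id = s.city_id
    JOIN state st ON st.state_id = ci.state_id
    JOIN country co ON co.country_id = st.country_id
    GROUP BY co.continent
""").fetchall()
print(snowflake_result)
```

The hierarchy is now explicit in the schema, at the cost of the longer join chain.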
Warehouse Servers
• Indices
• Materialized views
• Answering queries using indices and views
• Optimization of complex queries
• Parallel processing
• Server architecture:
◦ Specialized SQL servers
◦ ROLAP
◦ MOLAP
• SQL extensions: next paper
Indices
• Bitmap index: for each attribute value, a bit vector with one bit for each record
• Bitmap compression
• Join indices
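A bitmap index can be sketched with Python integers as bit vectors. The helper names `build_bitmap_index` and `matching_rows` and the sample columns are hypothetical; real systems also compress the bitmaps, which this sketch leaves out.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when record i has that value."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Return the record positions whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical columns of a five-record table.
state = ["WA", "BC", "WA", "BC", "WA"]
category = ["clothing", "clothing", "toys", "toys", "clothing"]

state_idx = build_bitmap_index(state)
category_idx = build_bitmap_index(category)

# A conjunctive selection becomes a bitwise AND of two bitmaps.
hits = matching_rows(state_idx["WA"] & category_idx["clothing"])
print(hits)
```

The selection "state = WA AND category = clothing" never touches the records themselves until the final positions are known.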
Materialized Views
• Joins of the fact table with a subset of dimension tables, with the aggregation of one or more measures.
◦ Example: total sales of each product in each state (fact table joined with Product and State)
• What to materialize?
◦ Workload, cost of update, storage
◦ A greedy algorithm
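A rough sketch of the idea, using Python's built-in sqlite3 module. SQLite has no materialized views, so a precomputed summary table stands in for one; for brevity the dimension attributes are folded into a single hypothetical `sales` table rather than joined from separate dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, city TEXT, state TEXT, amount REAL);
INSERT INTO sales VALUES
    ('shirt', 'Seattle', 'WA', 10.0),
    ('shirt', 'Spokane', 'WA', 15.0),
    ('shoes', 'Seattle', 'WA', 40.0),
    ('shirt', 'Vancouver', 'BC', 20.0);
-- Precomputed summary table standing in for a materialized view:
-- total sales of each product in each state.
CREATE TABLE sales_by_product_state AS
    SELECT product, state, SUM(amount) AS total
    FROM sales
    GROUP BY product, state;
""")

view_rows = conn.execute(
    "SELECT product, state, total FROM sales_by_product_state ORDER BY product, state"
).fetchall()

# A coarser query is answered by rolling up the summary, not by rescanning the fact table.
state_totals = conn.execute(
    "SELECT state, SUM(total) FROM sales_by_product_state GROUP BY state ORDER BY state"
).fetchall()
print(view_rows, state_totals)
```

The summary table must be refreshed whenever `sales` changes, which is the "cost of update" weighed against workload and storage when choosing what to materialize.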
Use of Views
• Selection, rollup
• Generator (V: view, Q: query):
◦ Same dimensions
◦ Selection clause of Q implies that of V
◦ Group-by of V is a subset of that of Q
• Minimal generator M':
◦ A set of generators (could be more than one)
◦ Other generators generate some member of it
• Query: total sales of clothing in Washington
• Minimal generators:
◦ Total sales by each state for each product
◦ Total sales by each city for each category
• Other generators:
◦ Total sales by each city for each product
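The example above can be reproduced with a small sketch. It rests on a simplifying assumption that is not stated on the slide: a view "generates" another grouping when it is at least as fine on every dimension hierarchy, and selection clauses are ignored. The hierarchy table `LEVELS` and the view names are hypothetical.

```python
# Hypothetical attribute hierarchies, ordered from finest to coarsest.
LEVELS = {
    "location": ["city", "state"],
    "item": ["product", "category"],
}

def rank(dim, level):
    """Position of a level in its hierarchy (0 = finest)."""
    return LEVELS[dim].index(level)

def generates(v, w):
    """True if grouping v is at least as fine as grouping w on every dimension."""
    return all(rank(d, v[d]) <= rank(d, w[d]) for d in LEVELS)

def minimal_generators(views, query):
    """Generators of the query that do not generate any other generator."""
    gens = {name: g for name, g in views.items() if generates(g, query)}
    return sorted(
        name
        for name, g in gens.items()
        if not any(other != g and generates(g, other) for other in gens.values())
    )

views = {
    "by_state_product": {"location": "state", "item": "product"},
    "by_city_category": {"location": "city", "item": "category"},
    "by_city_product": {"location": "city", "item": "product"},
}
# Query: total sales of clothing in Washington (state level, category level).
query = {"location": "state", "item": "category"}
minimal = minimal_generators(views, query)
print(minimal)
```

Under this model the city-by-product view is a generator but not a minimal one, because it can produce both of the other views.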
Metadata Requirements
• Administrative metadata
◦ Information necessary for setting up and using a warehouse
• Business metadata
◦ Business terms; ownership; charging policy
• Operational metadata
◦ Information collected during the operation of the warehouse
• Store and manage all the metadata associated with the warehouse
Conclusion
• Data warehousing is really hard
◦ Data cleansing
◦ Physical design
◦ Management
◦ Visualization
• There are a lot of opportunities waiting for you guys!