Designing the Data Warehouse - Part 1

  • 8/12/2019 Designing the Data Warehouse - Part 1

    Designing the data warehouse / data marts

    Methodologies and Techniques

    Basic principles

    Life cycle of the DW

    Operational Databases -> Warehouse Database

    First-time load

    Refresh (repeated)

    Purge or Archive

    Oracle Warehouse Components

    Any Source: Operational data, External data

    Any Data: Relational / Multidimensional; Text, image; Spatial; Audio, video; Web

    Any Access: Relational tools; OLAP tools; Applications / Web

    Oracle Medi`

    Oracle Intelligence Tools

    IS develops users' views: Oracle Reports (Current)

    Business users: Oracle Discoverer (Tactical)

    Analysts: Oracle Express (Strategic)

    Oracle Data Mart Suite

    Warehousing Engines

    Data Modeling: Oracle Data Mart Designer

    Data Management: Oracle Enterprise Manager

    Data Extraction: Oracle Data Mart Builder

    Data Access & Analysis: Discoverer & Oracle Reports

    OLTP Engines

    OLTP Databases

    Data Mart Database

    Oracle8

    SQL*PLUS

    Big Bang Approach: Advantages and Disadvantages

    Advantages:

      Warehouse built as part of a major project (e.g. BPR)

      Having a big picture of the data warehouse before starting the data warehousing project

    Disadvantages:

      Involves a high risk, takes a longer time

      Runs the risk of needing to change requirements

    Incremental Approach to Warehouse Development

    Multiple iterations

    Shorter implementations

    Validation of each phase

    Strategy

    Definition

    Analysis

    Design

    Build

    Production

    Benefits of an Incremental Approach

    Delivers a strategic data warehouse solution through incremental development efforts

    Provides an extensible, scalable architecture

    Quickly provides business benefits and ensures a much earlier return on investment

    Allows a data warehouse to be built one subject or application area at a time

    Allows the construction of an integrated data mart environment

    Data Mart

    A subset of a data warehouse that supports the requirements of a particular department or business function.

    Characteristics include:

      Does not normally contain detailed operational data, unlike a data warehouse.

      May contain certain levels of aggregation.

    Dependent Data Mart

    Sources: Operational Systems, Flat Files, External Data

    Data Warehouse (fed from the sources)

    Data Marts (fed from the Data Warehouse): Marketing, Sales, Finance, Human Resources

    Independent Data Mart

    Sources: Operational Systems, Flat Files, External Data

    Data Mart (fed directly from the sources): Sales or Marketing

    Reasons for Creating a Data Mart

    To give users more flexible access to the data they need to analyse most often.

    To provide data in a form that matches the collective view of a group of users.

    To improve end-user response time.

    Potential users of a data mart are clearly defined and can be targeted for support.

    To provide appropriately structured data as dictated by the requirements of the end-user access tools.

    Building a data mart is simpler compared with establishing a corporate data warehouse.

    The cost of implementing data marts is far less than that required to establish a data warehouse.

    Data Mart Issues

    Data mart functionality

    Data mart size

    Data mart load performance

    User access to data in multiple data marts

    Data mart Internet / Intranet access

    Data mart administration

    Data mart installation

    Example of a DW tool: OLAP

    Rotate and drill down to successive levels of detail.

    Create and examine calculated data interactively on large volumes of data.

    Determine comparative or relative differences.

    Perform exception and trend analysis.

    Perform advanced analytical functions, for example forecasting, modeling, and regression analysis.
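
    The drill-down and rotate operations listed above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample rows, the field names, and the helper names `aggregate` and `rotate` are hypothetical and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, month, region, units).
FIELDS = ("year", "quarter", "month", "region", "units")
ROWS = [
    (1999, "Q1", "Jan", "North", 10),
    (1999, "Q1", "Feb", "North", 5),
    (1999, "Q1", "Jan", "South", 7),
    (1999, "Q2", "Apr", "South", 3),
]


def aggregate(rows, by):
    """Sum units grouped by the named fields (the current level of detail)."""
    positions = [FIELDS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


def rotate(totals):
    """Swap the two axes of a two-field summary (rows become columns)."""
    return {(b, a): value for (a, b), value in totals.items()}


by_year = aggregate(ROWS, ["year"])                 # coarsest level
by_quarter = aggregate(ROWS, ["year", "quarter"])   # drill down one level
by_region_quarter = rotate(aggregate(ROWS, ["quarter", "region"]))
```

    Drilling down is just re-aggregating with one more field in the grouping key; a real OLAP server would typically answer such requests from precomputed aggregates rather than rescanning the rows.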

    Original OLAP Rules

    1. Multidimensional conceptual view

    2. Transparency

    3. Accessibility

    4. Consistent reporting performance

    5. Client-server architecture

    6. Multiuser support

    7. Unrestricted cross-dimensional operations

    8. Intuitive data manipulation

    9. Flexible reporting

    10. Unlimited dimensions and aggregation levels

    Relational Database Model

             Attribute 1   Attribute 2   Attribute 3   Attribute 4
             Name          Age           Gender        Emp No.
    Row 1    Anderson      31            F             1001
    Row 2    Green         42            M             1007
    Row 3    Lee           22            M             1010
    Row 4    Ramos         32            F             1020

    The table above illustrates the employee relation.

    Multidimensional Database Model

    The data is found at the intersection of dimensions.

    FINANCE: Store, GL_Line, Time

    SALES: Store, Product, Time, Customer
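
    To make "data is found at the intersection of dimensions" concrete, here is a minimal Python sketch of a SALES cube. It is an assumption-laden illustration: the cube is a plain dictionary, only three of the dimensions (store, product, time) are modeled, and the member names, values, and helper names `cell` and `slice_cube` are hypothetical.

```python
# Hypothetical SALES cube with three dimensions: (store, product, time).
# Each cell holds the measure found at that intersection of members.
DIMENSIONS = ("store", "product", "time")
sales_cube = {
    ("Store 1", "PC", "Jan"): 120,
    ("Store 1", "Server", "Jan"): 15,
    ("Store 2", "PC", "Jan"): 80,
    ("Store 2", "PC", "Feb"): 95,
}


def cell(cube, store, product, time):
    """Return the value at one intersection, or None if the cell is empty."""
    return cube.get((store, product, time))


def slice_cube(cube, **fixed):
    """Fix some dimensions to single members and return the matching cells."""
    positions = {DIMENSIONS.index(name): member for name, member in fixed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[pos] == member for pos, member in positions.items())
    }


pc_in_january = slice_cube(sales_cube, product="PC", time="Jan")
```

    Empty intersections are simply absent from the dictionary, which is one simple way to handle sparse cubes; dedicated multidimensional engines use their own storage structures.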

    Two dimensions

    Three dimensions

    Specialised Multidimensional Tool

    Benefits:

      Quick access to very large volumes of data

      Extensive and comprehensive libraries of complex analysis functions

      Strong modeling and forecasting capabilities

      Can access multidimensional and relational database structures

      Caters for calculated fields

    Disadvantages:

      Difficulty of changing model

      Lack of support for very large volumes of data

      May require significant processing power

    MOLAP Server

    The application layer stores data in a multidimensional structure

    The presentation layer provides the multidimensional view

    Diagram labels: MOLAP Engine, DSS client, Application layer, Warehouse

    Efficient storage and processing

    Complexity hidden from the user

    Analysis using preaggregated summaries and precalculated measures

    ROLAP

    Diagram labels: Express Server, Express user, Warehouse, Data cache, Live fetch, Cache, Query, Data

    Also Hybrid (HOLAP)

    Choosing a Reporting Architecture

    Business needs

    Potential for growth

    interface

    enterprise architecture

    Network architecture

    Speed of access

    Openness

    Chart: MOLAP and ROLAP plotted by Query Performance (OK to Good) against Analysis (Simple to Complex)

    Data Acquisition

    Identify, extract, transform, and transport source data

    Consider internal and external data

    Perform gap analysis between source data and target database objects

    Plan move of data between sources and target

    Define first-time load and refresh strategy

    Define tool requirements

    Build, test, and execute data acquisition modules
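
    One possible shape for a first-time load and refresh strategy is sketched below in Python. Everything here is an assumption for illustration: the source rows carry a hypothetical `updated_at` timestamp, the warehouse is an in-memory dictionary, and the function names are invented. Real acquisition modules would also transform the data and handle deletions.

```python
from datetime import datetime


def first_time_load(source_rows):
    """Copy every source row into a new warehouse keyed by row id."""
    return {row["id"]: dict(row) for row in source_rows}


def refresh(warehouse, source_rows, last_refresh):
    """Apply only rows changed since the previous load; return the count applied."""
    changed = [row for row in source_rows if row["updated_at"] > last_refresh]
    for row in changed:
        warehouse[row["id"]] = dict(row)
    return len(changed)


# Hypothetical operational data.
source = [
    {"id": 1, "amount": 100, "updated_at": datetime(2019, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2019, 1, 2)},
]
warehouse = first_time_load(source)

# Later: row 2 is corrected and row 3 is new in the operational system.
source[1] = {"id": 2, "amount": 275, "updated_at": datetime(2019, 2, 1)}
source.append({"id": 3, "amount": 50, "updated_at": datetime(2019, 2, 3)})
applied = refresh(warehouse, source, last_refresh=datetime(2019, 1, 31))
```

    The refresh touches only the two rows stamped after the last load, which is the main reason to separate it from the full first-time load. Rows deleted at the source are not detected by this timestamp approach.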

    Modeling

    Warehouses differ from operational structures:

      Analytical requirements

      Subject orientation

    Data must map to subject-oriented information:

      Identify business subjects

      Define relationships between subjects

      Name the attributes of each subject

    Modeling is iterative

    Modeling tools are available

    Modeling the Data Warehouse

    1. Defining the business model

    2. Creating the dimensional model

    3. Modeling summaries

    4. Creating the physical model

    Diagram labels: Select a business process; 1; 2, 3; 4; Physical model

    Identifying Business Rules

    Product

      Type      Monitor    Status
      PC        15 inch    New
      Server    17 inch    Rebuilt
                19 inch    Custom
                None

    Location

      Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles

    Store

    Store > District > Region

    Time

    Month > Quarter > Year
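
    Business rules like the proximity bands and the Month > Quarter > Year hierarchy usually end up as small, testable mapping functions. The Python sketch below is hypothetical: the function names are invented, the treatment of the boundary values (exactly 1 or 5 miles) is an assumption the slide does not settle, and quarters are assumed to be calendar quarters.

```python
def proximity_band(miles):
    """Map a distance to one of the three geographic proximity bands.

    Assumption: exactly 1 mile and exactly 5 miles fall in the lower band.
    """
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= 1:
        return "0 - 1 miles"
    if miles <= 5:
        return "1 - 5 miles"
    return "> 5 miles"


def quarter_of(month):
    """Roll a month number (1-12) up to its calendar quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (month - 1) // 3 + 1
```

    Agreeing on details such as boundary values and fiscal versus calendar quarters is exactly the kind of decision that the business-rule step is meant to capture before the dimensional model is built.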

    Creating the Dimensional Model

    Identify fact tables

      Translate business measures into fact tables

      Analyze source system information for additional measures

      Identify base and derived measures

      Document additivity of measures

    Identify dimension tables

    Link fact tables to the dimension tables

    Create views for users

    Dimension Tables

    Dimension tables have the following characteristics:

      Contain textual information that represents the attributes of the business

      Contain relatively static data

      Are joined to a fact table through a foreign key reference

    Diagram labels: Product, Channel, Facts (units, price), Customer, Time

    Fact Tables

    Fact tables have the following characteristics:

      Contain numeric measures (metrics) of the business

      May contain summarized (aggregated) data

      May contain date-stamped data

      Are typically additive

      Have a key that is typically a concatenated key composed of the primary keys of the dimensions

      Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

    Dimensional Model (Star Schema)

    Dimension tables: Product, Channel, Customer, Time

    Fact table: Facts (units, price)

    Star Schema Model

    Central fact table

    Radiating dimensions

    Denormalized model

    Store Table: Store_id, District_id, ...

    Item Table: Item_id, Item_desc, ...

    Time Table: Day_id, Month_id, Period_id, Year_id

    Product Table: Product_id, Product_desc

    Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
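
    As a rough sketch of how such a star schema behaves, the Python snippet below builds a cut-down version in an in-memory SQLite database and runs a typical star join. It is an illustration under assumptions: the Item table and Period_id are left out for brevity, the Time table is named `time_dim`, and all data values and the function name are hypothetical.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    -- Dimension tables: one row per member, keyed by a simple id.
    CREATE TABLE store    (store_id INTEGER PRIMARY KEY, district_id INTEGER);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, product_desc TEXT);
    CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

    -- Central fact table: concatenated key made of the dimension keys.
    CREATE TABLE sales_fact (
        product_id    INTEGER REFERENCES product(product_id),
        store_id      INTEGER REFERENCES store(store_id),
        day_id        INTEGER REFERENCES time_dim(day_id),
        sales_dollars REAL,
        sales_units   INTEGER,
        PRIMARY KEY (product_id, store_id, day_id)
    );

    INSERT INTO store VALUES (1, 10), (2, 10);
    INSERT INTO product VALUES (100, 'PC'), (200, 'Server');
    INSERT INTO time_dim VALUES (19990101, 199901, 1999), (19990201, 199902, 1999);
    INSERT INTO sales_fact VALUES
        (100, 1, 19990101, 1500.0, 1),
        (100, 2, 19990101, 3000.0, 2),
        (200, 1, 19990201, 9000.0, 1);
""")


def sales_by_product_and_month(connection):
    """Join the fact table to two dimensions and total the additive measures."""
    return connection.execute("""
        SELECT p.product_desc, t.month_id, SUM(f.sales_units), SUM(f.sales_dollars)
        FROM sales_fact AS f
        JOIN product  AS p ON p.product_id = f.product_id
        JOIN time_dim AS t ON t.day_id = f.day_id
        GROUP BY p.product_desc, t.month_id
        ORDER BY p.product_desc, t.month_id
    """).fetchall()


star_result = sales_by_product_and_month(star)
```

    Every dimension is exactly one join away from the fact table, which is what keeps star-schema queries simple for users and front-end tools.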

    Star Schema Model

    Easy for users to understand

    Fast response to queries

    Simple metadata

    Supported by many front end tools

    Less robust to change

    Slower to build

    Does not support history

    Snowflake Schema Model

    Time Table: Week_id, Period_id, Year_id

    Dept Table: Dept_id, Dept_desc, Mgr_id

    Mgr Table: Dept_id, Mgr_id, Mgr_name

    Product Table: Product_id, Product_desc

    Item Table: Item_id, Item_desc, Dept_id

    Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units

    Store Table: Store_id, Store_desc, District_id

    District Table: District_id, District_desc
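
    In a snowflake schema the dimensions are normalized, so rolling sales up to district level needs a chain of joins (fact to Store, Store to District). The Python sketch below shows this with an in-memory SQLite database. Only the Store and District branch is modeled, and the data values and function name are hypothetical.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
    -- District is split out of Store, so Store only carries the district key.
    CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
    CREATE TABLE store (
        store_id    INTEGER PRIMARY KEY,
        store_desc  TEXT,
        district_id INTEGER REFERENCES district(district_id)
    );
    CREATE TABLE sales_fact (
        item_id       INTEGER,
        store_id      INTEGER REFERENCES store(store_id),
        sales_dollars REAL,
        sales_units   INTEGER
    );

    INSERT INTO district VALUES (1, 'North'), (2, 'South');
    INSERT INTO store VALUES (1, 'Store A', 1), (2, 'Store B', 1), (3, 'Store C', 2);
    INSERT INTO sales_fact VALUES
        (10, 1, 100.0, 1),
        (10, 2, 250.0, 2),
        (11, 3, 75.0, 1);
""")


def sales_by_district(connection):
    """Roll sales up to district level through the Store -> District chain."""
    return connection.execute("""
        SELECT d.district_desc, SUM(f.sales_dollars), SUM(f.sales_units)
        FROM sales_fact AS f
        JOIN store    AS s ON s.store_id = f.store_id
        JOIN district AS d ON d.district_id = s.district_id
        GROUP BY d.district_desc
        ORDER BY d.district_desc
    """).fetchall()


snowflake_result = sales_by_district(snow)
```

    Compared with a star schema, the district description is stored once per district instead of once per store, at the cost of an extra join in every district-level query.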

    Using Summary Data

    Provides fast access to precomputed data

    Reduces use of I/O, CPU, and memory

    Is distilled from source systems and precalculated summaries

    Usually exists in summary fact tables

    Phase 3: Modeling summaries

    Designing Summary Tables

    Units   Sales()   Store

    Product A   Total

    Product B   Total

    Product C   Total

    Average

    Maximum

    Total

    Percentage

    Summary Tables Example

    SALES FACTS
    Sales     Region   Month
    10,000    North    Jan 99
    12,000    South    Feb 99
    11,000    North    Jan 99
    15,000    West     Mar 99
    18,000    South    Feb 99
    20,000    North    Jan 99
    10,000    East     Jan 99
    2,000     West     Mar 99

    SALES BY MONTH/REGION
    Month     Region   Tot_Sales$
    Jan 99    North    41,000
    Jan 99    East     10,000
    Feb 99    South    40,000
    Mar 99    West     17,000

    SALES BY MONTH
    Month     Tot_Sales
    Jan 99    51,000
    Feb 99    40,000
    Mar 99    17,000
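
    The two summary tables can be derived from the fact rows with a simple grouped sum, sketched here in Python (the helper name `summarize` is hypothetical). Note that the eight fact rows as listed give 30,000 for Feb 99 / South, whereas the summary tables on the slide show 40,000, so either a fact row is missing from this transcript or the slide figures differ; the code below works only from the rows that are listed.

```python
from collections import defaultdict

# The eight SALES FACTS rows as listed: (sales, region, month).
SALES_FACTS = [
    (10_000, "North", "Jan 99"),
    (12_000, "South", "Feb 99"),
    (11_000, "North", "Jan 99"),
    (15_000, "West", "Mar 99"),
    (18_000, "South", "Feb 99"),
    (20_000, "North", "Jan 99"),
    (10_000, "East", "Jan 99"),
    (2_000, "West", "Mar 99"),
]


def summarize(facts, key):
    """Total the sales measure for each value returned by key(region, month)."""
    totals = defaultdict(int)
    for sales, region, month in facts:
        totals[key(region, month)] += sales
    return dict(totals)


sales_by_month_region = summarize(SALES_FACTS, lambda region, month: (month, region))
sales_by_month = summarize(SALES_FACTS, lambda region, month: month)
```

    Because sales is an additive measure, the month-level summary could equally be computed from the month/region summary instead of from the detailed facts.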

    Summary Management in Oracle8i

    Diagram labels: Product, Region, Time, Sales summary, City, Sales, State

    Summary usage

    Summary advisor

    Space requirements

    Summary recommendations

    The Time Dimension

    How and where should it be stored?

    Diagram labels: Time dimension, Sales fact

    Time is critical to the data warehouse.

    A consistent representation of time is required for extensibility.
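
    One common way to get a consistent representation of time is to generate the time dimension once, with one row per day and precomputed roll-up keys. The Python sketch below assumes calendar quarters and a YYYYMMDD-style integer key; the column names and the function name are hypothetical, and the slides do not prescribe this layout.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": day.year * 10000 + day.month * 100 + day.day,  # e.g. 19990130
            "month_id": day.year * 100 + day.month,                  # e.g. 199901
            "quarter": (day.month - 1) // 3 + 1,                     # calendar quarter
            "year_id": day.year,
        })
        day += timedelta(days=1)
    return rows


time_rows = build_time_dimension(date(1999, 1, 30), date(1999, 2, 2))
```

    Fact rows then store only `day_id`, and every query rolls up to month, quarter, or year through the same table, so all subject areas share one definition of each period.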