2 Final DW Concepts

download 2 Final DW Concepts

of 84

Transcript of 2 Final DW Concepts

  • 7/31/2019 2 Final DW Concepts

    1/84

    Data Warehousing Concepts

    Module 1

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    2/84

    Agenda

    Data warehousing

    overview

    Data warehouse Vs OLTP

    Data warehouse Vs DataMart

    Data Modeling

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    3/84

    Decision Support

    In order to make correct decisions,accurate, meaningful informationinformation

    about business environments,external issues, and internalworkings must be available in atimely fashion.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    4/84

    OLTP Systems

    Capable of answering questions of aspecific nature and time frame.

    How many items do I have instock today?

    How many tickets were sold on aspecific date?

    What is the current price of anitem?

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    5/84

    OLTP Systems

    Transaction based systems experiencegreat difficulty in answering analytical anddecision support questions.

    Analysis takes a long time, interfering with:

    transaction performance

    daily operations

    The nature of the data is dynamic anddispersed.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    6/84

    OLTP Systems

    Most organizations have created a spiderweb of systems and data sources.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    7/84

    Which are ourlowest/highest margin

    customers ?

    Which are ourlowest/highest margin

    customers ?

    Who are my customers

    and what productsare they buying?

    Who are my customers

    and what productsare they buying?

    Which customersare most likely to goto the competition ?

    Which customers

    are most likely to goto the competition ?

    What impact willnew products/services

    have on revenue

    and margins?

    What impact willnew products/services

    have on revenueand margins?

    What product prom-

    -otions have the biggestimpact on revenue?

    What product prom-

    -otions have the biggestimpact on revenue?

    What is the mosteffective distribution

    channel?

    What is the mosteffective distribution

    channel?

    A producer wants to know.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    8/84

    Data, Data everywhereyet ...

    I cant find the data I need

    data is scattered over the network

    many versions, subtle differences

    I cant get the data I needneed an expert to get the data

    I cant understand the data Ifound

    available data poorly documented

    I cant use the data I found

    results are unexpected

    data needs to be transformed

    from one form to other

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    9/84

    What is a Data Warehouse?

    A single, complete andconsistent store of dataobtained from a variety

    of different sourcesmade available to endusers in a what theycan understand and use

    in a business context.

    [Barry Devlin]

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    10/84

    What are the users saying...

    Data should be integratedacross the enterprise

    Summary data has a realvalue to the organization

    Historical data holds thekey to understanding data

    over timeWhat-if capabilities are

    required

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    11/84

    What is Data Warehousing?

    A process of

    transforming data into

    information andmaking it available tousers in a timelyenough manner to

    make a difference

    Data

    Information

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    12/84

    Evolution

    60s: Batch reportshard to find and analyze information

    inflexible and expensive, reprogram every newrequest

    70s: Terminal-based DSS and EIS (executiveinformation systems)

    still inflexible, not integrated with desktop tools

    80s: Desktop data access and analysis tools

    query tools, spreadsheets, GUIs

    easier to use, but only access operational databases 90s till now: Data warehousing with

    integrated OLAP engines and tools, real timeDW

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    13/84

    Data Warehousing --It is a process

    Technique for assembling andmanaging data from varioussources for the purpose of

    answering business questions.Thus making decisions that werenot previous possible

    A decision support database

    maintained separately from theorganizations operationaldatabase

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    14/84

    Data Warehouse

    A data warehouse is a

    subject-oriented

    integratedtime-varying

    non-volatile

    collection of data that is used primarily in

    organizational decision making.

    -- Bill Inmon, Building the Data Warehouse 1996

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    15/84

    Explorers, Farmers and Tourists

    Explorers: Seek out the unknown andpreviously unsuspected rewards hiding inthe detailed data

    Farmers: Harvest informationfrom known access paths

    Tourists: Browse informationharvested by farmers

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    16/84

    Data Warehouse Architecture

    Data Warehouse

    Engine

    Optimized Loader

    ExtractionCleansing

    Analyze

    Query

    Metadata Repository

    Relational

    Databases

    Legacy

    Data

    Purchased

    Data

    ERPSystems

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    17/84

    Data Warehouse for DecisionSupport & OLAP

    Putting Information technology to help the

    knowledge worker make faster and better

    decisionsWhich of my customers are most likely to go

    to the competition?

    What product promotions have the biggest

    impact on revenue?How did the share price of software

    companies correlate with profits over last 10

    years?

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    18/84

    Decision Support

    Used to manage and control business

    Data is historical or point-in-time

    Optimized for inquiry rather than update

    Use of the system is loosely defined and

    can be ad-hoc

    Used by managers and end-users tounderstand the business and make

    judgements

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    19/84

    Data Mining works with WarehouseData

    Data Warehousingprovides the Enterprisewith a memory

    Data Mining providesthe Enterprise withintelligence

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    20/84

    We want to know ... Given a database of 100,000 names, which persons are the

    least likely to default on their credit cards?

    Which types of transactions are likely to be fraudulentgiven the demographics and transactional history of aparticular customer?

    If I raise the price of my product by Rs. 2, what is theeffect on my ROI?

    If I offer only 2,500 airline miles as an incentive topurchase rather than 5,000, how many lost responses willresult?

    If I emphasize ease-of-use of the product as opposed to itstechnical capabilities, what will be the net effect on myrevenues?

    Which of my customers are likely to be the most loyal?

    Data Mining helps extract such information

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    21/84

    Application Areas

    Industry Application

    Finance Credit Card Analysis

    Insurance Claims, Fraud AnalysisTelecommunication Call record analysis

    Transport Logistics management

    Consumer goods promotion analysis

    Data Service providers Value added data

    Utilities Power usage analysis

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    22/84

    What makes data mining possible?

    Advances in the following areas aremaking data mining deployable:

    data warehousingbetter and more data (i.e., operational,

    behavioral, and demographic)

    the emergence of easily deployed data

    mining tools andthe advent of new data mining

    techniques. -- Gartner Group

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    23/84

    Why Separate Data Warehouse?

    PerformanceOperational database designed & tuned for known transactions &

    workloads.Complex OLAP queries would degrade performance. for op

    transactions.

    Special data organization, access & implementation methodsneeded for multidimensional views & queries.

    FunctionMissing data: Decision support requires historical data, which

    Operational database do not typically maintain.

    Data consolidation: Decision support requires consolidation(aggregation, summarization) of data from many heterogeneoussources: operational databases, external sources.

    Data quality: Different sources typically use inconsistent datare resentations codes and formats which have to be reconciled.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-APhttp://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    24/84

    Benefits of a Data WarehouseReliable reporting

    Rapid access to data

    Integrated dataRapid access to data

    Flexible presentation of data

    Better decision making

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    25/84

    What are Operational Systems?

    They are OLTP systems

    Run mission criticalapplications

    Need to work withstringent performancerequirements forroutine tasks

    Used to run abusiness!

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    26/84

    RDBMS used for OLTP

    Database Systems have been usedtraditionally for OLTP

    clerical data processing tasksdetailed, up to date data

    structured repetitive tasks

    read/update a few records

    isolation, recovery and integrity arecritical

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    27/84

    Operational Systems

    Run the business in real time

    Based on up-to-the-second data

    Optimized to handle large

    numbers of simple read/writetransactions

    Optimized for fast response topredefined transactions

    Used by people who deal with

    customers, products -- clerks,salespeople etc.

    They are increasingly used bycustomers

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    28/84

    Examples of Operational Data

    Data Industry Usage Technology Volumes

    CustomerFile

    All TrackCustomerDetails

    Legacy application, flatfiles, main frames

    Small-medium

    AccountBalance

    Finance Controlaccountactivities

    Legacy applications,hierarchical databases,mainframe

    Large

    Point-of-Sale data

    Retail Generatebills, managestock

    ERP, Client/Server,relational databases

    Very Large

    CallRecord

    Telecomm-unications

    Billing Legacy application,hierarchical database,mainframe

    Very Large

    ProductionRecord

    Manufact-uring

    ControlProduction

    ERP,relational databases,AS/400

    Medium

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    29/84

    So, whats different?

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    30/84

    Application-Orientation vs.Subject-Orientation

    Application-Orientation

    Operational

    Database

    LoansCreditCard

    Trust

    Savings

    Subject-Orientation

    Data

    Warehouse

    Customer

    Vendor

    Product

    Activity

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    31/84

    OLTP vs. Data Warehouse

    OLTP systems are tuned for knowntransactions and workloads while workloadis not known a priori in a data warehouse

    Special data organization, access methodsand implementation methods are needed tosupport data warehouse queries (typicallymultidimensional queries)

    e.g., average amount spent on phone callsbetween 9AM-5PM in Pune during themonth of December

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    32/84

    OLTP vs Data Warehouse

    OLTPApplication

    Oriented

    Used to runbusiness

    Detailed data

    Current up to date

    Isolated Data

    Repetitive access

    Clerical User

    Warehouse (DSS)Subject Oriented

    Used to analyze

    businessSummarized and

    refined

    Snapshot data

    Integrated Data

    Ad-hoc access

    Knowledge User(Manager)

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    33/84

    OLTP vs Data Warehouse

    OLTPPerformance

    Sensitive

    Few Recordsaccessed at a time(tens)

    Read/Update Access

    No data redundancyDatabase Size

    100MB -100 GB

    Data Warehouse

    Performance relaxed

    Large volumes accessed

    at a time(millions)Mostly Read (Batch

    Update)

    Redundancy present

    Database Size

    100 GB - few terabytes

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    34/84

    OLTP vs Data Warehouse

    OLTP

    Transactionthroughput is the

    performance metricThousands of users

    Managed inentirety

    Data Warehouse

    Query throughputis the performance

    metricHundreds of users

    Managed bysubsets

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    35/84

    To summarize ...

    OLTP Systems areused to runabusiness

    The DataWarehouse helpsto optimizethebusiness

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    36/84

    Why Now?

    Data is being produced

    ERP provides clean data

    The computing power is available

    The computing power is affordable

    The competitive pressures are strong

    Commercial products are available

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    37/84

    Data Warehouses:Architecture, Design & Construction

    DW Architecture

    Loading, refreshing

    Structuring/Modeling

    DWs and Data Marts

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    38/84

    Data Warehouse ArchitecturesGeneric Two-Level Architecture

    Independent Data Mart

    Dependent Data Mart andOperational Data Store

    All involve some form ofextraction, transformation and loading(ETLETL)

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    39/84

    Generic two-level architecture

    E

    T

    LOne,company-

    wide

    warehouse

    Periodic extraction data is not completely current in warehouse

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    40/84

    Independent Data MartData marts:Data marts:Mini-warehouses, limited in scope

    E

    T

    L

    Separate ETL for each

    independentdata mart

    Data access complexity

    due to multiple data marts

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    41/84

    Dependentdata mart with operational data store

    E

    T

    L

    Single ETL for

    enterprise data warehouse

    (EDW)(EDW)

    Simpler data access

    ODSODSprovides option for

    obtaining currentdata

    Dependentdata marts

    loaded from EDW

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    42/84

    The ETL Process

    Capture

    Scrub or data cleansing

    Transform

    Load

    ETL = Extract, transform, and load

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    43/84

    Steps in data reconciliation

    Static extractStatic extract = capturing a

    snapshot of the source data at

    a point in time

    Incremental extractIncremental extract =

    capturing changes that have

    occurred since the last static

    extract

    Capture = extractobtaining a snapshot

    of a chosen subset of the source data for

    loading into the data warehouse

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    44/84

    Steps in data reconciliation (continued)

    Scrub = cleanseuses pattern

    recognition and AI techniques to

    upgrade data quality

    Fixing errors:Fixing errors: misspellings,erroneous dates, incorrect field usage,

    mismatched addresses, missing data,

    duplicate data, inconsistencies

    Also:Also: decoding, reformatting, timestamping, conversion, key generation,

    merging, error detection/logging,

    locating missing data

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    45/84

    Steps in data reconciliation (continued)

    Transform = convert data from format

    of operational system to format of data

    warehouse

    RecordRecord--level:level:Selection data partitioning

    Joining data combining

    Aggregation data summarization

    FieldField--level:level:single-field from one field to one field

    multi-field from many fields to one, or

    one field to many

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    46/84

    Steps in data reconciliation (continued)

    Load/Index= place transformed data

    into the warehouse and create indexes

    Refresh mode:Refresh mode:bulk rewriting oftarget data at periodic intervals

    Update mode:Update mode: only changes insource data are written to data

    warehouse

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    47/84

    Data Warehouse vs. Data Marts

    What comes first ?

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    48/84

    Data Mart

    Data mart is:

    A functional segmentfunctional segmentof an enterpriserestricted for purposes of security, locality,

    performance, or business necessity usingmodeling and information deliverytechniques identical to data warehousing.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    49/84

    Data Mart

    Why build a data mart?

    Allows an organization to visualize the large but focuson the small and attainable.

    Provides a platform for rapid delivery of an operationalsystem.

    Minimizes risk.

    A corporate warehouse can be constructed from theunion of the enterprise data marts.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    50/84

    Data Mart

    Data

    Warehouse

    Data From

    Transaction Sources

    Financial

    Data Mart

    Logistics

    Data Mart

    Contract

    DataMart

    Update From the

    Warehouse

    The data warehouse

    populates

    the data marts.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    51/84

    Data Mart

    DataWarehouse

    Data FromTransaction Sources

    FinancialData Mart

    LogisticsData Mart

    ContractData Mart

    Update From the

    Data Marts

    The data marts populatethe data warehouse.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    52/84

    Data Mart- Approach

    Physical data warehouse (physical)

    Data warehouse --> data marts

    Data marts --> data warehouse

    Parallel data warehouse and data marts

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    53/84

    Top-down

    SOURCE DATA

    External

    Data

    Operational Data

    Staging Area

    Data Warehouse Data Marts

    Physical Data Warehouse:Data Warehouse --> Data Marts

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    54/84

    Bottom-up approach

    SOURCE DATA

    External

    Data

    Operational Data

    Staging Area

    Data Warehouse

    Data Marts

    Physical Data Warehouse:

    Data Marts --> Data Warehouse

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    55/84

    Hybrid

    SOURCE DATA

    External

    Data

    Operational Data

    Staging Area

    Data Warehouse

    Data Marts

    Physical Data Warehouse:

    Parallel Data Warehouse & Data Marts

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    56/84

    Data Modeling

    WHAT IS A DATA MODELING?A data model is an abstraction of some aspect of the real world

    (system).

    WHY A DATA MODEL?

    Helps to visualise the business

    A model is a means of communication.

    Model helps elicit and document requirements.

    Models reduce the cost of change.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    57/84

    What is Data ModelingWhat is Data Modeling

    Data-oriented activity!

    Part art, part science

    Highly detailed, iterative process

    Uses basic objects to deliver pictorial

    image of requirementsEntities (ERD &DDM)

    Attributes (ERD & DDM)

    Relationships (ERD & DDM)

    Uses Metadata to supplement datarequirements described by pictorial image

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    58/84

    Based on use and enforcement of DataStandards

    Depends on knowledgeable, committedparticipants for its ultimate success

    Business subject matter experts (SME)

    Information technology professionals

    Foundation for future business data

    requirements

    More important now than ever!

    What is Data ModelingWhat is Data Modeling

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    59/84

    Why do you need one?Why do you need one?

    To identify the basic informationstructure of the business and its systems.

    To develop and promote a standard,unified vocabulary for data.

    To provide a firm foundation for

    delivering high-quality systems that meetbusiness needs.

    To facilitate data integration.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    60/84

    Why do you need one?Why do you need one?

    To eliminate ambiguity of datadefinitions

    To reduce software development cost

    To reduce database development cost

    To provide standardised methods ofhandling data

    To facilitates application integrationTo position the corporation for future

    database technology

    To preserve data context.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    61/84

    Why do you need one?

    The quality of a database is

    in its initial design!

    It is a means, not an end!

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    62/84

    What happens if you donWhat happens if you dont havet haveone?one?

    Individual Data Store

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    63/84

    What happens if you dont haveone?

    Corporate Data Store

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    64/84

    What happens if you donWhat happens if you dont havet haveone?one?

    You Loose Data!

    Individuals know where to find data.

    Data is modified when moved.

    Looses meaning and value.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    65/84

    Business

    ProcessConceptual

    Logical

    Model PhysicalModel

    Levels of modeling

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    66/84

    Levels of Data ModelingLevels of Data Modeling

    Logical Modeling

    Focused on business

    requirements

    Independent of technical

    platform

    Normalized - The Key,

    the whole Key and

    nothing but the Key, so

    help me Codd!

    Physical Modeling

    Focused on technical

    requirements

    Dependent on a

    technical platform and

    DBMS

    De-normalized

    (Optimized) to

    enhance performance

    Conceptual Modeling

    The 50,000 foot view of the businessrequirements

    The precursor to Logical Modeling

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    67/84

    What Data Modeling isWhat Data Modeling is notnot

    A waste of time!

    A one time effort

    The ultimate IT application development

    cure

    A quick process

    A function solely performed and

    understood by and for IT professionals

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    68/84

    Where Data Models are usedWhere Data Models are used

    Operational SystemsTraditional Applications designed to run the day-to-day business of the Enterprise

    External Systems ***Data used within an Enterprise that is obtained from

    outside sources

    Staging Areas ***Created to aid in the collection and transformation of

    data that is targeted for a Data Warehouse

    Operational Data Store ***W. H. Inmon and Claudia Imhoff definition: Asubject-oriented, integrated, volatile, current valueddata store containing only corporate detailed data.

    *** - Not discussed here

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    69/84

    Data Warehouse (DW)

    W. H. Inmon definition: A subject-oriented, integrated,non-volatile, time-variant collection of data organizedto support management needs.

    Data Mart (DM)

    TDWI definition: A data structure that is optimized foraccess. It is designed to facilitate end-user analysis ofdata. It typically supports a single analytic applicationused by a distinct set of workers.

    Where Data Models are usedWhere Data Models are used

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    70/84

    Sample conceptual model

    Products

    Customer

    Invoices

    CustomerAddresses

    Customers

    GeographicBoundaries

    Sales Reps

    Sample

    Conceptual

    Model

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    71/84

    Sample logical model

    PRODUCT#PRODUCT CODE

    .PRODUCT DESCRIPTION

    CUSTOMER#CUSTOMER ID

    #SNAPSHOT DATE.CUSTOMER NAME

    CUSTOMER INVOICE

    #INVOICE ID#LINE ITEM SEQ

    .INVOICE DATE

    SALES REP

    #SALES REP ID

    CUSTOMER ADDRESS#CUSTOMER ID

    #ADDRESS ID

    GEOGRAPHIC

    BOUNDARY#GEO CODE

    the bill for

    purchased

    by

    the bill sent to

    purchased at

    the bill purchased by

    purchased by

    the general location of

    located withinfor the

    customer

    sold to by

    the salesman

    for

    the sales

    manager for

    for the

    customer

    managed by

    the salesman

    for

    sold by

    Sample Logical Model

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    72/84

    Physical Model

    A Physical data model may include

    Referential Integrity

    Indexes

    Views

    Alternate keys and other constraints

    Tablespaces and physical storageobjects.

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    73/84

    PRODUCTS

    # PRODUCT_CODE

    PRODUCT_DESCRIPTIONCATEGORY_CODECATEGORY_DESCRIPTION

    CUSTOMER_INVOICES

    #INVOICE_ID#LINE_ITEM_SEQ

    INVOICE_DATECUSTOMER_ID

    BILL_TO_ADDRESS_ID

    SALES_REP_IDMANAGER_REP_IDORGANIZATION_ID

    ORG_ADDRESS_ID

    PRODUCT_CODEQUANTITY

    UNIT_PRICEAMOUNT

    oPRODUCT_COSTLOAD_DATE

    CUSTOMERS

    #CUSTOMER_ID#SNAPSHOT_DATE

    CUSTOMER_NAMEoAGE

    oMARITAL_STATUS

    CREDIT_RATING

    CUSTOMER_ADDRESSES

    #CUSTOMER_ID

    #ADDRESS_IDADDRESS_LINE1

    oADDRESS_LINE2oPOSTAL_CODE

    SALES_REP_IDGEO_CODE

    LOAD_DATE

    GEOGRAPHIC_BOUNDARIES

    #GEO_CODE

    CITY_NAMESTATE_NAME

    COUNTRY_NAMEoCITY_ABBRV

    oSTATE_ABBRVoCOUNTRY_ABBRV

    SALES_REPS

    #SALES_REP_IDLAST_NAMEFIRST_NAME

    oMANAGER_FIRST_NAME

    oMANAGER_LAST_NAME

    Sample Physical

    Model

    Physical Model

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    74/84

    Types of Data ModelsTypes of Data Models

    Multiple varieties

    Entity Relationship Model (ERM)

    Dimensional Data Model (DDM)

    Each serves specific purpose

    Common to OLTP applications:

    Entity Relationship Model

    Common to OLAP applications:

    Entity Relationship Model

    Dimensional Data Model

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    75/84

    Entity Relationship ModelEntity Relationship Model

    is rented underreports to

    completes

    employs

    receives

    makes

    rents under

    is made on

    is rented as

    CREDIT CARD

    credit card number

    credit card exp

    credit card type

    CHECK

    check bank number

    check number

    CUSTOMERcustomer number

    name

    address

    phone

    credit card

    credit card exp

    status code

    MOVIEmovie number

    name

    director

    description

    star

    rating

    genre

    rental rate

    STOREstore number

    manager

    address

    phone

    EMPLOYEEemployee number

    name

    address

    phone

    ss#

    hire_date

    salary

    supervisor (FK)

    PAYMENTpayment transaction number

    type

    amount

    date

    status

    MOVIE RENTAL RECORDrental record date

    rental date

    due date

    rental status

    rental rate

    overdue charge

    MOVIE COPYmovie copy number

    general condition

    IN STOCK MOVIEmovie-number

    store-number

    quantity

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    76/84

    Dimensional Modeling

    Database organizationmust look like business

    must be recognizable by business user

    approachable by business userMust be simple

    De-normalized Schema TypesStar Schema

    Snowflake schema

    Fact Constellation Schema

    C t f t ht h m

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    77/84

    Components of a star schemastar schema

    Fact tables containfactual or quantitative

    data

    Dimension tables contain

    descriptions about thesubjects of the business

    1:N relationship

    between dimensiontables and fact tables

    Excellent for ad-hoc queries,

    but bad for online transaction processing

    Dimension tables are

    denormalized to

    maximizeperformance

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    78/84

    Star schema example

    Fact tableprovides statistics for sales

    broken down by product, period and storedimensions

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    79/84

    Star schema with sample data

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    80/84

    Snowflake schema

    Represent dimensional hierarchy directly bynormalizing tables.

    Easy to maintain and saves storage

    Ti

    m

    e

    prod

    cust

    city

    fa

    ct

    date, custno, prodno, cityname, ...

    regio

    n

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-APhttp://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    81/84

    Fact Constellation

    Fact Constellation

    Multiple fact tables that share manydimension tables

    Booking and Checkout may share manydimension tables in the hotel industry

    Hotels

    Travel Agents

    Promotion

    Room Type

    Customer

    Booking

    Checkout

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    82/84

    DATA MODELING - FOR WHICH COMPONENT?

    STAGING AREA

    YES ! (maybe multiple data models arerequired)

    ODS

    YES !

    DATAWAREHOUSEYES!

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    83/84

    Where to use what?

    Normalized Data ModelOLTP

    Flat Table withoutconstraints

    StagingArea

    Normalized modelODS

    Dimensional modelData marts

    Types of ModelStages

    http://www.pdfonline.com/easypdf/?gad=CLjUiqcCEgjbNejkqKEugRjG27j-AyCw_-AP
  • 7/31/2019 2 Final DW Concepts

    84/84

    DW and role of E/R Modelling

    Ralph Kimball says.

    ER Models are too

    complicated for end users to

    understand

    ER Modeling/normalisingonly suitable for OLTP or in

    data staging area since it

    eliminates redundancy

    Results in too many tables to

    be easy to query ER models are optimised for

    update activity not high

    performance querying

    Bill Inmon says.

    ER Model is suitable for data

    warehouses because it is

    stable, and supports

    consistency and flexibility Normalised data is ideal

    basis for the design of the

    Data Warehouse and the

    ODS

    May not be suitable for thedata mart, which deals

    heavily with regular query

    activity and time-variant

    analysis