Data Warehousing Final


  • 7/31/2019 Data Warehousing Final


    TEAM

    Rachana Kola 31

    Tasneem Taj 30

    Vijay Kumar 29

    Shilpa Kasani 28

    Ankita Golchha 27

    Bharat Jain 26


    Data warehousing

    A data warehouse is a database used for reporting and analysis.

    It focuses on data storage.

    Essential components of Data warehouse system.

    A data warehouse can be subdivided into data marts.


    Characteristics of data warehouse

    Conceptual view.

    Unlimited dimensions.

    Dynamic sparse matrix handling.

    Client/server architecture.

    Accessibility & transparency.


    OLTP & OLAP

    OLTP: Online transaction processing. It is characterized by a large number of short online transactions (insert, update, delete).

    OLAP: Online analytical processing. It is characterized by a relatively low volume of transactions.


    OLTP v/s OLAP

                            OLTP (Operational system)                     OLAP (Data warehouse system)
    Sources of data         Operational data.                             Consolidation data.
    Purposes of data        Control and run fundamental business tasks.   Planning, problem solving & decision making.
    What the data reveals   Snapshot.                                     Multi-dimensional view.
    Inserts & updates       Short and fast.                               Periodic, long running.
    Queries                 Simple queries.                               Complex queries.
    Processing speed        Very fast.                                    Depends on the amount of data involved.
    Space requirements      Relatively small.                             Large due to aggregation and history data.
    Database design         Highly normalized.                            Denormalized.
    Backup & recovery       Regular backup essential.                     Reloading OLTP data as a recovery method.
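
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample values are hypothetical and do not come from the slides. The short single-row writes stand in for OLTP, and the single aggregating query stands in for OLAP.

```python
import sqlite3

# Hypothetical "orders" table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP style: many short transactions, each touching one or a few rows.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES ('North', 120.0)")
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES ('South', 80.0)")
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES ('North', 50.0)")
with conn:
    conn.execute("UPDATE orders SET amount = 60.0 WHERE order_id = 3")

# OLAP style: one read-only query that scans and aggregates many rows.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals_by_region)
```

In a real deployment the two workloads would normally run on separate systems, with the warehouse loaded periodically from the operational database.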


    ARCHITECTURE

    [Figure: external data sources and operational systems are extracted, cleaned, transformed, loaded and refreshed into the data warehouse, which has a metadata repository and serves reports, OLAP and data mining.]


    COMPONENTS

    3 main systems required:

    Source systems

    Data staging area

    Presentation servers

    Operational data :

    Internal data

    External data

    Load manager :

    Simple transformation of data to prepare the data for entry into the warehouse.


    CONTD..

    Warehouse manager:

    Analysis of data.

    Transformation & merging of data.

    Backing up & archiving of data.

    Detailed, summarized & archived data.

    Meta data:

    Meta data means data about data.

    Extraction & loading process.

    Warehouse management process.

    Query management process.

    End user access tools.


    ETL PROCESS

    Extract

    Transform

    Cleansing

    Loading
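
The four steps listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not a production ETL tool: the CSV sample, the `sales` table, and the function names `extract`, `transform`, `cleanse`, and `load` are all hypothetical, and the choice to cleanse after transforming simply follows the order on the slide.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from an operational system.
SOURCE_CSV = """customer,amount,sale_date
 alice ,100.50,2019-07-01
BOB,,2019-07-02
carol,75,2019-07-03
"""

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and convert amounts to numbers (None if blank)."""
    out = []
    for r in rows:
        amount = r["amount"].strip()
        out.append({
            "customer": r["customer"].strip().title(),
            "amount": float(amount) if amount else None,
            "sale_date": r["sale_date"],
        })
    return out

def cleanse(rows):
    """Cleansing: drop rows whose amount is missing."""
    return [r for r in rows if r["amount"] is not None]

def load(conn, rows):
    """Loading: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO sales VALUES (:customer, :amount, :sale_date)", rows
        )

warehouse = sqlite3.connect(":memory:")
load(warehouse, cleanse(transform(extract(SOURCE_CSV))))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step as a separate function makes it easy to test or replace one stage without touching the others.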


    IMPORTANT TERMS

    Drill down

    Roll up

    Aggregation

    Granularity
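
The slide only names these terms, so the following sketch illustrates their usual meaning on a tiny, made-up data set. The `daily_sales` rows and the `aggregate` helper are hypothetical. Granularity is the level of detail at which facts are stored (here, one row per day), aggregation summarizes them, roll up moves to a coarser level, and drill down moves back to a finer one.

```python
from collections import defaultdict

# Hypothetical sales facts stored at daily granularity: (year, month, day, amount).
daily_sales = [
    (2019, 6, 30, 40.0),
    (2019, 7, 1, 100.0),
    (2019, 7, 2, 50.0),
    (2019, 7, 31, 25.0),
]

def aggregate(rows, level):
    """Sum amounts after truncating the date key to `level` parts (1=year, 2=month, 3=day)."""
    totals = defaultdict(float)
    for *date_parts, amount in rows:
        totals[tuple(date_parts[:level])] += amount
    return dict(totals)

# Roll up: aggregate from day level to month level, then to year level.
by_month = aggregate(daily_sales, 2)
by_year = aggregate(daily_sales, 1)

# Drill down: go from the July 2019 total back to the days that make it up.
july_days = {
    key: value
    for key, value in aggregate(daily_sales, 3).items()
    if key[:2] == (2019, 7)
}

print(by_year)
print(by_month)
print(july_days)
```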


    DIMENSIONAL DATA MODEL

    The dimensional data model is most often used in data warehousing systems.

    The objective of dimensional modeling is to represent a set of business measurements in a standard framework that is easily understandable by end users.

    The main components of a dimensional model are fact tables and dimension tables.

    A fact table is a table that contains the measures of interest.

    A dimension is a structure, usually composed of one or more hierarchies, that categorizes data.


    Example of dimensional model


    STAR SCHEMA

    The star schema is also called star-join schema, data cube, or multi-dimensional schema.

    It is the simplest style of data warehouse schema.

    The star schema consists of one or more fact tables referencing any number of dimension tables.

    A star schema classifies the attributes of an event into facts (measured numeric/time data) and descriptive dimension attributes (product id, customer name, sale date) that give the facts a context.
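
A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_customer`, `dim_date`), the columns, and the sample rows are assumptions chosen to match the attributes mentioned above; they are not taken from the slide's own diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold the descriptive attributes.
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, sale_date TEXT);

    -- The fact table holds numeric measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        quantity    INTEGER,
        amount      REAL
    );

    INSERT INTO dim_product  VALUES (1, 'Laptop'), (2, 'Mouse');
    INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dim_date     VALUES (1, '2019-07-30'), (2, '2019-07-31');
    INSERT INTO fact_sales   VALUES
        (1, 1, 1, 1, 900.0),
        (2, 1, 1, 2, 40.0),
        (2, 2, 2, 1, 20.0);
    """
)

# A typical star query: join the fact table to a dimension, then aggregate.
revenue_by_product = conn.execute(
    """
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
    """
).fetchall()
print(revenue_by_product)
```

Every dimension is one join away from the fact table, which is what keeps typical star queries simple.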


    SAMPLE STAR SCHEMA


    Advantages of star schema

    The main advantages of star schemas are that they:

    Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.

    Provide highly optimized performance for typical star queries.

    Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables.


    SNOW FLAKE SCHEMA

    The dimension tables are normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

    Each point of the star explodes into more points.

    Extension of the star schema.
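
As a rough sketch of that normalization, the example below splits a product dimension into a product table and a category lookup table, one table per hierarchy level. All table names, columns, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Lookup table for the upper level of the product hierarchy.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);

    -- The product dimension references the category lookup instead of
    -- repeating the category name on every product row.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id  INTEGER REFERENCES dim_category(category_id)
    );

    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );

    INSERT INTO dim_category VALUES (1, 'Computers'), (2, 'Accessories');
    INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Mouse', 2), (3, 'Keyboard', 2);
    INSERT INTO fact_sales   VALUES (1, 900.0), (2, 20.0), (3, 35.0);
    """
)

# Reaching the category level now takes two joins instead of one.
revenue_by_category = conn.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(revenue_by_category)
```

The extra join is the usual trade-off: less redundancy in the dimension data in exchange for more complex queries.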


    SAMPLE SNOWFLAKE SCHEMA


    FACT CONSTELLATION SCHEMA


    Shaped like a constellation of stars.

    More complex than star or snowflake.

    From each star schema it is possible to construct a fact constellation schema by splitting the original star schema into more star schemas, each of which describes facts on another level of the dimension hierarchies.
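
One way to picture this is two fact tables at different levels of the time hierarchy that share a dimension table. The sketch below is an assumption-laden illustration: the tables `fact_sales_daily`, `fact_sales_monthly`, and `dim_product`, and all sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension shared by both fact tables.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);

    -- Detailed facts at day level.
    CREATE TABLE fact_sales_daily (
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_date  TEXT,
        amount     REAL
    );

    -- A second fact table at month level.
    CREATE TABLE fact_sales_monthly (
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_month TEXT,
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Mouse');
    INSERT INTO fact_sales_daily VALUES
        (1, '2019-07-30', 900.0),
        (2, '2019-07-30', 20.0),
        (2, '2019-07-31', 25.0);

    -- Populate the monthly facts by aggregating the daily facts.
    INSERT INTO fact_sales_monthly
    SELECT product_id, substr(sale_date, 1, 7), SUM(amount)
    FROM fact_sales_daily
    GROUP BY product_id, substr(sale_date, 1, 7);
    """
)

monthly = conn.execute(
    """
    SELECT p.product_name, m.sale_month, m.amount
    FROM fact_sales_monthly AS m
    JOIN dim_product AS p ON p.product_id = m.product_id
    ORDER BY p.product_name
    """
).fetchall()
print(monthly)
```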


    PERSONAL PRODUCTIVITY APPLICATIONS

    Useful for manipulating and presenting data on individual PCs.

    Developed for a standalone environment.

    Address applications requiring only small volumes of warehouse data.


    DATA QUERY & REPORTING

    Data access through simple, list-oriented queries, and the generation of basic reports.

    Provide a view of historical data.

    Do not address the enterprise need for in-depth analysis and planning.


    PLANNING & ANALYSIS

    Address essential business requirements such as in-depth analysis and planning.

    Referred to as on-line analytical processing (OLAP) applications.

    Mandate that the organization look not only at past performance but, more importantly, at the future performance of the business.

    The combined analysis of historical data with future projections is critical to the success of today's corporation.


    Advantages of data warehousing

    Provides business users a customer-centric view of the company's heterogeneous data.

    Added value to the company's customers through better access to information.

    Historical information.

    Enhanced data quality.

    Supplements disaster recovery plans.

    One-stop shop.

    Provides savings in billing processes, reduces fraud losses, etc.


    Disadvantages of data warehousing

    Not optimal for unstructured data.

    Data warehouses get outdated relatively quickly.

    Duplicate functionality between data warehouses & operational systems.

    Extremely expensive. Costs:

    Time spent in careful analysis.

    Design & implementation.

    Hardware costs.

    Software costs.

    Ongoing support & maintenance.


    Conclusion

    Data warehousing is necessary to analyze the business needs, integrate data from several sources, and model the data in an appropriate manner so that business information can be presented in the form of dashboards and reports.
