Data-Warehousing [Compatibility Mode] - Copy

  • 8/3/2019 Data-Warehousing [Compatibility Mode] - Copy

    Introduction to Data Warehousing

    The Importance of Data Warehousing

      • Provide a single version of the truth
      • Improve decision making
      • Support key corporate initiatives such as performance management, B2C and B2B e-commerce, and customer relationship management
      • Estimated to be a $113.5 billion market in 2002 for systems, software, services, and in-house expenditures (Palo Alto Management Group)

    A Simple Definition

    A data warehouse is a collection of data created to support decision-making applications.

    Data Warehouse Characteristics

      • Subject oriented -- data are organized around sales, products, etc.
      • Integrated -- data are integrated to provide a comprehensive view
      • Time variant -- historical data are maintained
      • Nonvolatile -- data are not updated by users

    Another Definition

    Data warehousing is the entire process of data extraction, transformation, and loading of data to the warehouse and the access of the data by end users and applications.

    Data Mart

      • A data mart stores data for a limited number of subject areas, such as marketing and sales data. It is used to support specific applications.
      • An independent data mart is created directly from source systems.
      • A dependent data mart is populated from a data warehouse.
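
The difference between the two kinds of mart is where the data comes from. As a rough sketch of a dependent mart, the following uses Python's built-in sqlite3 module to populate a mart table from a warehouse table. The table names, columns, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table covering several subject areas.
CREATE TABLE warehouse_facts (subject_area TEXT, item TEXT, amount REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', 'order 1', 100.0),
    ('marketing', 'campaign 1', 40.0),
    ('finance', 'invoice 1', 70.0);

-- The dependent mart is populated from the warehouse, not from source systems,
-- and keeps only a limited number of subject areas.
CREATE TABLE sales_marketing_mart AS
    SELECT subject_area, item, amount
    FROM warehouse_facts
    WHERE subject_area IN ('sales', 'marketing');
""")

mart_rows = conn.execute(
    "SELECT subject_area, item FROM sales_marketing_mart ORDER BY subject_area"
).fetchall()
print(mart_rows)
```

An independent mart would instead run its own extraction against the source systems, which is why it does not automatically share the warehouse's integrated definitions.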

    Operational Data Store

      • An operational data store consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data.
      • Its purpose is to provide integrated data for operational purposes. It has add, change, and delete functionality.
      • It may be created to avoid a full-blown ERP implementation.

    [Figure: data warehousing architecture, shown as five stages from data sources to users]

      • Data Sources: transaction data (Prod, Mkt, HR, Fin, Acctg); other internal data (ERP, clickstream and Web data); external data (demographic). Vendor labels: IBM, IMS, VSAM, Oracle, Sybase, SAP, Informix, Harte-Hanks
      • ETL Software: staging area and operational data store; Extract, Clean/Scrub, Transform, Load. Vendor labels: Ascential, Sagent, SAS, Firstlogic, Informatica
      • Data Stores: data warehouse, meta data, and data marts (Finance, Marketing, Sales). Vendor labels: Teradata, IBM, Essbase, Microsoft
      • Data Analysis Tools and Applications: SQL; queries, reporting, DSS/EIS, data mining; Web browser. Vendor labels: Cognos, SAS, Micro Strategy, Siebel, Business Objects
      • Users: analysts, managers, executives, operational personnel, customers/suppliers

    Two Data Warehousing Strategies

      • Enterprise-wide warehouse, top down, the Inmon methodology
      • Data mart, bottom up, the Kimball methodology
      • When properly executed, both result in an enterprise-wide data warehouse, but with different architectures

    The Data Mart Strategy

      • The most common approach
      • Begins with a single mart, and architected marts are added over time for more subject areas
      • Relatively inexpensive and easy to implement
      • Can be used as a proof of concept for data warehousing
      • Can perpetuate the silos of information problem
      • Can postpone difficult decisions and activities
      • Requires an overall integration plan

    The Enterprise-wide Strategy

      • A comprehensive warehouse is built initially
      • An initial dependent data mart is built using a subset of the data in the warehouse
      • Additional data marts are built using subsets of the data in the warehouse
      • Like all complex projects, it is expensive, time consuming, and prone to failure
      • When successful, it results in an integrated, scalable warehouse

    Data Sources and Types

      • Primarily from legacy, operational systems
      • Almost exclusively numerical data at the present time
      • External data may be included, often purchased from third-party sources
      • Technology exists for storing unstructured data, which is expected to become more important over time

    Extraction, Transformation, and Loading (ETL) Processes

      • The plumbing work of data warehousing
      • Data are moved from source to target databases
      • A very costly, time-consuming part of data warehousing
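
To make the three ETL steps concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in target database. The source records, field names, and cleaning rules are assumptions made up for this example. Commercial ETL tools do far more, including generating meta data.

```python
import sqlite3

def extract(source_rows):
    """Extract: read raw records from a source system (here, an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: trim names, normalize the region code, convert amounts to float."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "region": row["region"].strip().upper(),
            "amount": float(row["amount"]),
        })
    return cleaned

def load(conn, rows):
    """Load: write the transformed rows into the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (customer, region, amount) "
        "VALUES (:customer, :region, :amount)",
        rows,
    )
    conn.commit()

# Hypothetical source data, as it might arrive from an operational system.
source = [
    {"customer": "  alice smith ", "region": "east", "amount": "120.50"},
    {"customer": "BOB JONES", "region": " West", "amount": "80"},
]

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT customer, region, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the way the process is usually described and lets each step be tested on its own.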

    Data Extraction

      • Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
      • Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of dirty data in the source systems)
      • Increasingly performed by specialized ETL software

    Sample ETL Tools

      • DataStage from Ascential Software
      • SAS System from SAS Institute
      • Power Mart/Power Center from Informatica
      • Sagent Solution from Sagent Software
      • Hummingbird Genio Suite from Hummingbird Communications

    Reasons for Dirty Data

      • Dummy Values
      • Absence of Data
      • Multipurpose Fields
      • Cryptic Data
      • Contradicting Data
      • Inappropriate Use of Address Lines
      • Violation of Business Rules
      • Reused Primary Keys, Non-Unique Identifiers
      • Data Integration Problems
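
Some of these problems can be detected mechanically. The sketch below, in plain Python, checks a handful of records for three of the listed causes: dummy values, absent data, and non-unique identifiers. The set of dummy values, the field names, and the sample records are all hypothetical. A real check would use rules agreed with the business.

```python
from collections import Counter

# Hypothetical placeholder values that source systems sometimes use as dummies.
DUMMY_VALUES = {"000-0000", "N/A", "UNKNOWN", "99999"}

def find_dirty_rows(rows, key_field, required_fields):
    """Return (row_index, problem) tuples for three common data problems."""
    problems = []
    key_counts = Counter(row.get(key_field) for row in rows)
    for index, row in enumerate(rows):
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                problems.append((index, f"missing {field}"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                problems.append((index, f"dummy value in {field}"))
        if key_counts[row.get(key_field)] > 1:
            problems.append((index, f"non-unique {key_field}"))
    return problems

customers = [
    {"id": 1, "phone": "555-0100", "zip": "30602"},
    {"id": 2, "phone": "000-0000", "zip": ""},
    {"id": 2, "phone": "555-0199", "zip": "99999"},
]
report = find_dirty_rows(customers, key_field="id", required_fields=["phone", "zip"])
for index, problem in report:
    print(index, problem)
```

Causes such as cryptic data or violated business rules need domain knowledge and are not covered by this sketch.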

    Data Cleansing

      • Source systems contain dirty data that must be cleansed
      • ETL software contains rudimentary data cleansing capabilities
      • Specialized data cleansing software is often used. Important for performing name and address correction and householding functions
      • Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)

    Data Staging

      • Often used as an interim step between data extraction and later steps
      • Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
      • At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
      • There is usually no end user access to the staging file
      • An operational data store may be used for data staging
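
The cutoff behaviour can be illustrated with a small in-memory model. This is only a sketch: the `StagingArea` class, the source names, and the timestamps are invented for the example, and a real staging area would be a file or database rather than a Python list.

```python
from datetime import datetime

class StagingArea:
    """Holds records from several sources until a cutoff, then releases them."""

    def __init__(self):
        self._records = []

    def accumulate(self, source_name, record, arrived_at):
        # Sources deliver independently, so each record keeps its arrival time.
        self._records.append((arrived_at, source_name, record))

    def release(self, cutoff):
        """Return records that arrived before the cutoff; keep the rest staged."""
        ready = [r for r in self._records if r[0] < cutoff]
        self._records = [r for r in self._records if r[0] >= cutoff]
        ordered = sorted(ready, key=lambda r: r[0])
        return [(source, record) for _, source, record in ordered]

    def pending(self):
        """Number of records still waiting for a later load."""
        return len(self._records)

staging = StagingArea()
staging.accumulate("orders_flat_file", {"order": 1}, datetime(2002, 1, 1, 21, 15))
staging.accumulate("crm_ftp", {"customer": "A"}, datetime(2002, 1, 1, 20, 5))
staging.accumulate("orders_flat_file", {"order": 2}, datetime(2002, 1, 2, 0, 30))

# Hypothetical nightly cutoff at midnight; later arrivals wait for the next load.
batch = staging.release(cutoff=datetime(2002, 1, 2, 0, 0))
print(batch, staging.pending())
```

The released batch is what would be handed to the transformation and loading steps. End users never query the staging area itself.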

    Data Transformation

      • Transforms the data in accordance with the business rules and standards that have been established
      • Examples include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
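
Each of the listed transformation types can be shown on a few records. The following Python sketch is illustrative only: the field names, the status code table, and the sample orders are assumptions, and real business rules would come from the organization's standards.

```python
from datetime import datetime

# Hypothetical code table mapping source-system codes to readable values.
STATUS_CODES = {"A": "Active", "I": "Inactive"}

def transform(rows):
    """Apply one example of each transformation type to the raw rows."""
    seen = set()
    out = []
    for row in rows:
        if row["order_id"] in seen:               # deduplication
            continue
        seen.add(row["order_id"])
        first, last = row["name"].split(" ", 1)   # splitting up a field
        out.append({
            "order_id": row["order_id"],
            "first_name": first,
            "last_name": last,
            # format change: MM/DD/YYYY becomes ISO 8601
            "order_date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
            "status": STATUS_CODES.get(row["status"], "Unknown"),  # replacement of codes
            "total": row["qty"] * row["unit_price"],               # derived value
        })
    return out

def total_by_date(rows):
    """Aggregate: sum of order totals per order date."""
    totals = {}
    for row in rows:
        totals[row["order_date"]] = totals.get(row["order_date"], 0) + row["total"]
    return totals

raw = [
    {"order_id": 1, "name": "Ann Lee", "date": "03/08/2002", "status": "A", "qty": 2, "unit_price": 10},
    {"order_id": 1, "name": "Ann Lee", "date": "03/08/2002", "status": "A", "qty": 2, "unit_price": 10},
    {"order_id": 2, "name": "Raj Patel", "date": "03/08/2002", "status": "I", "qty": 1, "unit_price": 5},
]
clean = transform(raw)
daily = total_by_date(clean)
print(clean)
print(daily)
```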

    Data Loading

      • Data are physically moved to the data warehouse
      • The loading takes place within a load window
      • The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for operational applications

    Meta Data

      • Data about data
      • Needed by both information technology personnel and users
      • IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.
      • Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
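
One way to picture meta data is as a catalog entry that carries both the technical facts IT needs and the business definitions users need. The Python dataclasses below are a hypothetical sketch. The class names, fields, and sample values are invented and do not correspond to any particular meta data product.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    name: str
    business_definition: str   # what users need: the attribute's meaning

@dataclass
class TableMetadata:
    database: str
    table: str
    source_system: str         # what IT needs: where the data came from
    refresh_schedule: str
    columns: list = field(default_factory=list)

    def describe(self, column_name):
        """Look up the business definition of a column, or None if unknown."""
        for column in self.columns:
            if column.name == column_name:
                return column.business_definition
        return None

# Hypothetical catalog entry for one warehouse table.
sales_table = TableMetadata(
    database="warehouse",
    table="sales_fact",
    source_system="order_entry",
    refresh_schedule="nightly",
    columns=[ColumnMetadata("sales_amount", "Invoiced amount in US dollars")],
)
definition = sales_table.describe("sales_amount")
print(definition)
```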

    Database Vendors

      • High end (i.e., terabyte plus) vendors include IBM (DB2) and NCR-Teradata (Teradata)
      • Oracle (8i) and Microsoft (SQL Server 7) are major players for smaller databases

    On-line Analytical Processing (OLAP)

      • A set of functionality that facilitates multidimensional analysis
      • Allows users to analyze data in ways that are natural to them
      • Comes in many varieties -- ROLAP, MOLAP, DOLAP, etc.

    ROLAP

      • Relational OLAP
      • Uses an RDBMS to implement an OLAP environment
      • Typically involves a star schema to provide the multidimensional capabilities
      • OLAP tool manipulates RDBMS star schema data
      • Called slowlap by MOLAP vendors

    MOLAP

      • Multidimensional OLAP
      • Uses an MDDBS (e.g., Essbase) to store and access data
      • Usually requires proprietary (non-SQL) data access tools
      • Provides exceptionally fast response times

    Star Schema

      • Creates non-normalized data structures
      • Easier for users to understand
      • Optimized for OLAP
      • Uses fact (facts or measures in the business) and dimension (establishes the context of the facts) tables

    OLAP Tools

      • Products come from vendors such as Brio, Cognos, Hyperion, and BusinessObjects
      • Typically available as a fat or thin (i.e., browser) client
      • In a web environment, the browser communicates with a web server, which talks to an application server, which connects to backend databases
      • The application server provides query, reporting, and OLAP analysis functionality over the web
      • Java applets or downloaded components augment the thin client
      • A broadcast server may be used to schedule, run, publish, and broadcast reports, alerts, and responses over the LAN, email, or personal digital assistant.

    Star Schema

    [Figure: star schema with the Claim fact table joined to five dimension tables; columns prefixed with # are keys]

      • Claim (fact table): # Physician ID, # Patient ID, # Service Code, # Payer ID, # Claim Number, # Line Item Number, # Claim Date, Date of Services, Amount of Charge, Unit of Services
      • Service: # Service Code, Service Description, # Category Code
      • Time Periods: # Claim Date, Year, Month, Quarter, Week
      • Payer: # Payer ID, Name, Address, Phone Number, EDI Number
      • Patient: # Patient ID, Patient Name, Address, Age, Sex, Insurance ID
      • Physician: # Physician ID, Physician Name, Specialty ID, Credential ID
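
A reduced version of this claims star schema can be tried out with Python's built-in sqlite3 module. The sketch below keeps only the Claim fact table and two of its dimensions (Service and Payer), with a few columns each. The column spellings and every data row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: establish the context of the facts.
CREATE TABLE service (service_code TEXT PRIMARY KEY, service_description TEXT);
CREATE TABLE payer   (payer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per claim line item, with keys into each dimension.
CREATE TABLE claim (
    claim_number     INTEGER,
    line_item_number INTEGER,
    service_code     TEXT REFERENCES service(service_code),
    payer_id         INTEGER REFERENCES payer(payer_id),
    amount_of_charge REAL,
    PRIMARY KEY (claim_number, line_item_number)
);

INSERT INTO service VALUES ('S1', 'Office visit'), ('S2', 'X-ray');
INSERT INTO payer   VALUES (1, 'Payer A'), (2, 'Payer B');
INSERT INTO claim VALUES
    (100, 1, 'S1', 1,  80.0),
    (100, 2, 'S2', 1, 120.0),
    (101, 1, 'S1', 2,  75.0),
    (102, 1, 'S1', 1,  85.0);
""")

# A typical star join: total charges by payer and service.
rows = conn.execute("""
    SELECT p.name, s.service_description, SUM(c.amount_of_charge) AS total_charge
    FROM claim AS c
    JOIN payer   AS p ON p.payer_id = c.payer_id
    JOIN service AS s ON s.service_code = c.service_code
    GROUP BY p.name, s.service_description
    ORDER BY p.name, s.service_description
""").fetchall()
for row in rows:
    print(row)
```

Every analytical query follows the same shape: join the fact table to the dimensions of interest, group by dimension attributes, and aggregate a measure. That regularity is what lets a ROLAP tool generate such SQL automatically.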

    Dimension Table Examples

      • Retail -- store name, zip code, product name, product category, day of week
      • Telecommunications -- call origin, call destination
      • Banking -- customer name, account number, branch, account officer
      • Insurance -- policy type, insured party

    Fact Table Examples

      • Retail -- number of units sold, sales amount
      • Telecommunications -- length of call in minutes, average number of calls
      • Banking -- average monthly balance
      • Insurance -- claims amount
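
The way facts and dimensions work together can be shown with the retail example: measures such as units sold and sales amount are summed, while dimensions such as store, product category, and day of week decide how the rows are grouped. The plain-Python sketch below uses invented store names and numbers. The helper names `roll_up` and `slice_rows` are hypothetical, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical retail fact rows: three dimension values plus two measures.
facts = [
    {"store": "Athens", "category": "Dairy",  "day": "Mon", "units": 10, "sales": 25},
    {"store": "Athens", "category": "Bakery", "day": "Mon", "units": 4,  "sales": 12},
    {"store": "Macon",  "category": "Dairy",  "day": "Tue", "units": 6,  "sales": 15},
    {"store": "Macon",  "category": "Dairy",  "day": "Mon", "units": 3,  "sales": 8},
]

def roll_up(rows, dimensions, measure):
    """Sum one measure over the rows, grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    return [row for row in rows if row[dimension] == value]

sales_by_category = roll_up(facts, ["category"], "sales")
units_by_store_day = roll_up(facts, ["store", "day"], "units")
monday_sales_by_store = roll_up(slice_rows(facts, "day", "Mon"), ["store"], "sales")

print(sales_by_category)
print(units_by_store_day)
print(monday_sales_by_store)
```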

    Warehouse Users

      • Analysts
      • Managers
      • Executives
      • Operational personnel
      • Customers and suppliers

    Warehouse Tools and Applications

      • SQL queries
      • Managed query environments
      • Structured and ad hoc reports
      • DSS/EIS
      • Portals
      • Data mining
      • Packaged applications
      • Custom-built applications

    Owens & Minor

      • Owens & Minor -- data warehousing has supported integration along the supply chain. Winner of the 1999 TDWI Leadership Award
      • the nation's leading distributor of name-brand medical and surgical supplies
      • has transformed its business model by integrating supply chain management, e-business, data warehousing, and Internet technologies
      • as part of this initiative, WISDOM (WebIntelligence Supporting Decisions from Owens & Minor) has been especially valuable

    [Figure: supply chain linking Raw Materials Suppliers, Manufacturer, Owens & Minor, Provider, and Patient, with PRODUCT and INFORMATION flows along the chain; annotated "+ 1,400 manufacturers" and "+ 4,000 Acute Care Facilities"]

    WISDOM

      • a Web-based decision support system that provides information to O&M's employees, suppliers and customers
      • accesses data from a data warehouse that maintains supplier and customer transaction data
      • sold to trading partners as a value-added product
      • WISDOM II provides data about the transactions that suppliers and customers have with all of their trading partners

    Sample Applications

      • Supports reporting and queries for internal personnel
      • Supports an EIS for senior management
      • Suppliers can determine their market share in specific hospitals
      • Hospitals can identify which products are being bought off contract
      • WISDOM II extends data warehousing to trading partners through an outsourcing arrangement

    Questions
