DATA Warehousing Concepts2

  • 8/4/2019 DATA Warehousing Concepts2

    1/22

    4/14/2012 1

    The Goals of a Data Warehouse

    The data warehouse is a place where people can access their data.

    The goals of a data warehouse are as follows:

    Access: warehouse retrievals must be fast.

    The data in a data warehouse is consistent.

    Users must be able to slice and dice the data.

    A warehouse must offer easy-to-use browsing tools.

    The data warehouse is a place where we publish used data.

    The quality of the data in the data warehouse is a driver of business reengineering.

    Two Different Worlds

    On-line transaction processing (OLTP) is profoundly different from dimensional data warehousing (DDW).

    The users, data content, data structures, hardware, software, administration, management, and daily rhythms are all different.

    OLTP design techniques and methods are inappropriate for, and even destructive to, information warehousing.

    Consistency

    Both OLTP and data warehouse systems are greatly concerned with data consistency.

    OLTP consistency is microscopic. The point of transaction processing is to process a very large number of tiny, atomic transactions without losing any of them.

    In a data warehouse, consistency is measured globally. We don't care about individual transactions, but we care enormously that the current load of new data is a full and consistent set of data.

    What is a Transaction

    A serious OLTP system processes thousands or even millions of transactions per day.

    A serious data warehouse often will process only one transaction per day, but this transaction contains millions of records. It is called a production data load.

    What we care about is the consistent state of the system as it was before we started the production data load.

    If we are forced to stop the production data load before it is complete, we will not roll back the inserted records. Instead, we will overwrite the entire system with a snapshot of the system taken before the production data load.
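
The snapshot-and-restore policy described above can be sketched in a few lines of Python. This is only an illustration, not a real loader: the in-memory `warehouse` list, the `production_data_load` function, and the `fail_at` switch that simulates an interrupted load are all assumptions made for the example.

```python
import copy


def production_data_load(warehouse, new_records, fail_at=None):
    """Append new_records to warehouse; restore the pre-load snapshot on failure.

    `fail_at` is a hypothetical knob that simulates an interrupted load.
    Returns True if the load completed, False if the snapshot was restored.
    """
    snapshot = copy.deepcopy(warehouse)  # state of the system before the load
    try:
        for i, record in enumerate(new_records):
            if fail_at is not None and i == fail_at:
                raise RuntimeError("load interrupted")
            warehouse.append(record)
        return True
    except RuntimeError:
        # No record-by-record rollback: overwrite everything with the snapshot.
        warehouse[:] = snapshot
        return False


warehouse = [{"day": 1, "units": 10}]
batch = [{"day": 2, "units": 7}, {"day": 2, "units": 5}]

# The load is interrupted after the first record, so the snapshot is restored.
ok = production_data_load(warehouse, batch, fail_at=1)
```

After the interrupted call, `warehouse` holds exactly what it held before the load started, even though one record had already been inserted.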

    Users and Managers

    The users of an OLTP system turn the wheels of the organization, whereas the users of a data warehouse watch the wheels of the organization.

    Users of an OLTP system almost always deal with one account at a time.

    OLTP users perform the same tasks many, many times.

    Performance is the absolute king of an OLTP system. No optional activity is allowed to slow down an OLTP system.

    Dimensions in Data Analysis

    In the world of data warehousing, a summarizable numerical value that you use to monitor your business is called a measure.

    When looking for numeric information, your first question will be: what measure do you want to see?

    You could look at, let's say, sales units, sales dollars, defects, etc.

    Suppose that you ask to see a report of your company's Units Sold.

    Here's what you get:

    113
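
To see where a single number like this comes from, here is a small Python sketch. The detail rows are invented for illustration (chosen so that they add up to the 113 shown above); only the idea of summing a measure, with and without a dimension applied, comes from the text.

```python
from collections import defaultdict

# Hypothetical detail rows: (state, product, units). Values are illustrative.
sales = [
    ("WA", "Product A", 24),
    ("WA", "Product B", 16),
    ("OR", "Product A", 38),
    ("OR", "Product B", 35),
]

# With no dimension selected, the measure collapses to one grand total.
units_sold = sum(units for _state, _product, units in sales)
print(units_sold)

# Applying one dimension (state) breaks the same measure into a value per member.
units_by_state = defaultdict(int)
for state, _product, units in sales:
    units_by_state[state] += units
print(dict(units_by_state))
```

The grand total answers "how many units?"; the per-state breakdown is the same measure viewed along one dimension.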

    Fact Table

    A fact table is a table in the relational data warehouse that stores the detailed values for measures, or facts.

    Example: a fact table that stores Dollars and Units by State, by Product, and by Month has five columns.

    The first three columns are key columns; the remaining two are measure values.

    State Product Month Units Dollars
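
As a rough sketch, the five-column fact table could be declared like this with Python's built-in `sqlite3` module. The table name `sales_fact`, the column types, and the sample rows are assumptions for illustration; the text specifies only the five columns and which of them are keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_fact (
        state   TEXT    NOT NULL,  -- key column
        product TEXT    NOT NULL,  -- key column
        month   TEXT    NOT NULL,  -- key column
        units   INTEGER NOT NULL,  -- measure
        dollars REAL    NOT NULL,  -- measure
        PRIMARY KEY (state, product, month)
    )
    """
)

# Made-up rows, one per (state, product, month) combination.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("WA", "Muffins", "2012-01", 10, 25.0),
        ("WA", "Bread", "2012-01", 4, 12.0),
        ("OR", "Muffins", "2012-01", 6, 15.0),
    ],
)

# Column names as SQLite reports them, and the two measures summed over all rows.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
total_units, total_dollars = conn.execute(
    "SELECT SUM(units), SUM(dollars) FROM sales_fact"
).fetchone()
```

Text keys keep the sketch readable; integer keys pointing at dimension tables are an equally valid choice for the three key columns.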

    Fact Table

    Each column in the fact table should be either a key or a measure.

    The fact table must contain a column for each measure.

    The fact table must contain rows at the lowest level of detail you might want to retrieve for a measure.

    A fact table almost always uses an integer key for each member rather than a descriptive name.

    The key column for a date dimension might be either an integer key or a date.

    4/14/2012 CHRIS 9

    Dimension Tables

    A dimension table contains one row for each leaf-level member of the dimension.

    Example: a product dimension table with 3 products will have 3 rows.

    In most cases a dimension table also contains a numeric key column that uniquely identifies each member.

    This column of unique values is the primary key of the dimension table, and the foreign key in the fact table references it.
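
The key relationship can be illustrated with `sqlite3` from the Python standard library. The table names, column names, and rows below are hypothetical; the sketch only shows a dimension table with one row per member, a fact table whose foreign key points at the dimension's primary key, and a join that swaps the integer key for a descriptive name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE product_dim (
        prod_id   INTEGER PRIMARY KEY,
        prod_name TEXT NOT NULL
    );
    CREATE TABLE sales_fact (
        prod_id INTEGER NOT NULL REFERENCES product_dim (prod_id),
        units   INTEGER NOT NULL
    );
    INSERT INTO product_dim VALUES (1, 'Product A'), (2, 'Product B'), (3, 'Product C');
    INSERT INTO sales_fact VALUES (1, 5), (1, 7), (3, 2);
    """
)

# Three products in the dimension means three rows.
dimension_rows = conn.execute("SELECT COUNT(*) FROM product_dim").fetchone()[0]

# A fact row that points at a non-existent member is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Joining on the key turns integer keys back into descriptive names.
units_by_product = conn.execute(
    """
    SELECT d.prod_name, SUM(f.units)
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.prod_id = f.prod_id
    GROUP BY d.prod_name
    ORDER BY d.prod_name
    """
).fetchall()
```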

    Dimension Tables

    If the dimension is involved in a balanced hierarchy, it will have an additional column that gives the parent for each member.

    Example: if you have 3 products in a dimension table that each belong to a product subcategory, your table will look like this.

    PROD_ID   Prod_Name         SubCategory
    589       Sweet Muffins     Muffins
    592       Coconut Muffins   Muffins
    1218      Salt Bread        Bread
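
The parent column is what lets a report roll leaf-level members up to the next level of the hierarchy. The Python sketch below uses the three product rows from the table above; the fact rows and their unit counts are invented for illustration.

```python
from collections import defaultdict

# Dimension rows from the table above: PROD_ID -> (Prod_Name, SubCategory).
product_dim = {
    589: ("Sweet Muffins", "Muffins"),
    592: ("Coconut Muffins", "Muffins"),
    1218: ("Salt Bread", "Bread"),
}

# Hypothetical fact rows: (PROD_ID, units). The unit counts are invented.
sales_fact = [(589, 10), (592, 4), (1218, 6), (589, 3)]

# Roll each fact row up to its parent subcategory via the dimension table.
units_by_subcategory = defaultdict(int)
for prod_id, units in sales_fact:
    _name, subcategory = product_dim[prod_id]
    units_by_subcategory[subcategory] += units

print(dict(units_by_subcategory))
```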

    Star Schema

    When each dimension is stored in a single table, the database's organization is called a star schema design.

    When a database's dimensions are stored in a chain of tables, the database's design is called a snowflake design.

    A relational database must perform time-consuming joins each time a report executes, and a star design for a dimension requires fewer joins than a snowflake design.
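
The difference in join count can be made concrete with a small `sqlite3` sketch. The product rows reuse the muffin and bread example; the table names, the `subcat_id` column, and the unit counts are assumptions for illustration. Both queries answer the same question (units by subcategory), but the snowflake version needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (prod_id INTEGER, units INTEGER);
    INSERT INTO sales_fact VALUES (589, 10), (592, 4), (1218, 6);

    -- Star: the whole product dimension lives in one table.
    CREATE TABLE product_star (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT, subcategory TEXT
    );
    INSERT INTO product_star VALUES
        (589, 'Sweet Muffins', 'Muffins'),
        (592, 'Coconut Muffins', 'Muffins'),
        (1218, 'Salt Bread', 'Bread');

    -- Snowflake: the same dimension split into a chain of two tables.
    CREATE TABLE subcategory (subcat_id INTEGER PRIMARY KEY, subcat_name TEXT);
    CREATE TABLE product_snow (
        prod_id INTEGER PRIMARY KEY, prod_name TEXT, subcat_id INTEGER
    );
    INSERT INTO subcategory VALUES (1, 'Muffins'), (2, 'Bread');
    INSERT INTO product_snow VALUES
        (589, 'Sweet Muffins', 1),
        (592, 'Coconut Muffins', 1),
        (1218, 'Salt Bread', 2);
    """
)

star_sql = """
    SELECT p.subcategory, SUM(f.units)
    FROM sales_fact f
    JOIN product_star p ON p.prod_id = f.prod_id
    GROUP BY p.subcategory ORDER BY p.subcategory
"""
snowflake_sql = """
    SELECT s.subcat_name, SUM(f.units)
    FROM sales_fact f
    JOIN product_snow p ON p.prod_id = f.prod_id
    JOIN subcategory s ON s.subcat_id = p.subcat_id
    GROUP BY s.subcat_name ORDER BY s.subcat_name
"""

star_result = conn.execute(star_sql).fetchall()
snowflake_result = conn.execute(snowflake_sql).fetchall()
```

The two result sets are identical; only the number of joins the database has to perform differs.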

    Basic Elements - Data Warehouse

    Source System - An operational system of record whose function is to capture the transactions of the business.

    Data Staging Area - A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.

    Presentation Server - The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.

    Basic Elements - Data Warehouse

    Dimensional Model - A specific discipline for modeling data that is an alternative to entity-relationship (E/R) modeling.

    Business Process - A coherent set of business activities that make sense to the business users of our data warehouses.

    Data Mart - A logical subset of the complete data warehouse.

    Data Warehouse - The queryable source of data in the enterprise.

    Basic Elements - Data Warehouse

    Operational Data Store (ODS) - Has taken on too many definitions to be useful to the data warehouse.

    OLAP (On-line Analytic Processing) - The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.

    Basic Elements - Data Warehouse

    ROLAP (Relational OLAP) - A storage option or set of user interfaces and applications that give a relational database a dimensional flavor.

    MOLAP (Multidimensional OLAP) - A storage option or set of user interfaces, applications, and proprietary database technology that have a strongly dimensional flavor.

    HOLAP (Hybrid OLAP) - A storage option that combines relational and proprietary structures.

    Basic Elements - Data Warehouse

    End User Application - A collection of tools that query, analyze, and present information targeted to support a business need.

    End User Data Access Tool - A client of the data warehouse.

    Ad Hoc Query Tool - A specific kind of end user data access tool that invites the user to form their own queries by directly manipulating relational tables and their joins.

    Basic Elements - Data Warehouse

    Modeling Applications - A sophisticated kind of data warehouse client with analytic capabilities that transform or digest the output from the data warehouse. Modeling applications include:

    Forecasting models

    Behavior scoring models

    Allocation models

    Data mining tools

    Metadata - All the information in the data warehouse environment that is not the actual data itself.

    Basic Processes - Data Warehouse

    Extracting - The first step of getting data into the data warehouse.

    Transformation - Once data is extracted into the data staging area, there are many possible transformation steps, including cleaning the data, correcting misspellings, purging selected fields, creating surrogate keys for each dimension, building aggregates, etc.

    Loading and Indexing - Loading the data into the data warehouse.
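
The three steps above can be sketched as a toy pipeline in plain Python. Everything here is a stand-in: the source rows are invented, the cleaning rule (trim and title-case the product name) is just one possible transformation, and the "warehouse" is an in-memory dict rather than a database.

```python
def extract():
    """Stand-in for reading from a source system (rows are invented)."""
    return [
        {"product": " sweet muffins ", "units": "10"},
        {"product": "Salt Bread", "units": "6"},
        {"product": "Sweet Muffins", "units": "3"},
    ]


def transform(rows):
    """Clean names, cast types, and assign a surrogate key per distinct product."""
    surrogate_keys = {}
    facts = []
    for row in rows:
        name = row["product"].strip().title()  # simple cleaning step
        # Reuse the key if the product was seen before, else hand out the next one.
        key = surrogate_keys.setdefault(name, len(surrogate_keys) + 1)
        facts.append({"product_key": key, "units": int(row["units"])})
    return surrogate_keys, facts


def load(warehouse, surrogate_keys, facts):
    """Append the prepared rows to an in-memory 'warehouse' dict."""
    warehouse["product_dim"].update({k: n for n, k in surrogate_keys.items()})
    warehouse["sales_fact"].extend(facts)


warehouse = {"product_dim": {}, "sales_fact": []}
keys, facts = transform(extract())
load(warehouse, keys, facts)
```

Note how the two differently spelled "Sweet Muffins" rows end up sharing surrogate key 1 once the name has been cleaned.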

    Basic Processes - Data Warehouse

    Quality Assurance Checking - Quality can be checked by running a comprehensive exception report over the entire set of newly loaded data.

    Release/Publishing - The user community must be notified that the new data is ready.

    Updating - Modern data marts may well be updated, sometimes frequently. Updates include changes in labels, changes in hierarchies, changes in status, and changes in corporate ownership.
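
An exception report for the quality assurance step might look something like the following Python sketch. The specific checks (unknown product key, missing units, negative units) and the row layout are assumptions chosen for illustration; a real report would encode whatever rules the business considers exceptional.

```python
def exception_report(rows, valid_product_keys):
    """Return (row_index, reason) pairs for rows that fail basic checks.

    The checks themselves are illustrative assumptions, not a standard.
    """
    exceptions = []
    for i, row in enumerate(rows):
        if row.get("product_key") not in valid_product_keys:
            exceptions.append((i, "unknown product key"))
        if row.get("units") is None:
            exceptions.append((i, "missing units"))
        elif row["units"] < 0:
            exceptions.append((i, "negative units"))
    return exceptions


# Hypothetical newly loaded rows, three of which should be flagged.
new_load = [
    {"product_key": 1, "units": 10},
    {"product_key": 7, "units": 2},
    {"product_key": 2, "units": None},
    {"product_key": 2, "units": -4},
]

report = exception_report(new_load, valid_product_keys={1, 2})

# Only an empty report would let the load move on to release/publishing.
ready_to_publish = not report
```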

    Basic Processes - Data Warehouse

    Querying - Querying is a broad term that encompasses all the activities of requesting data from a data mart.

    Data Feedback/Feeding in Reverse - The data can also flow in the opposite direction, uphill from the traditional flow we have discussed.

    Auditing - At times it is critically important to know where the data came from and what calculations were performed. For this you can create special audit records.
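
One possible shape for such an audit record is sketched below in Python. The field names, the `load_with_audit` helper, and the example source system and calculation string are all hypothetical; the text says only that audit records should capture where the data came from and what calculations were performed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit record: where loaded data came from and how it was computed."""

    target_table: str
    source_system: str
    calculation: str
    row_count: int
    loaded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


audit_log = []


def load_with_audit(rows, target_table, source_system, calculation):
    """Pretend to load rows, and record an audit entry describing the load."""
    record = AuditRecord(target_table, source_system, calculation, len(rows))
    audit_log.append(record)
    return record


rec = load_with_audit(
    rows=[{"units": 10}, {"units": 6}],
    target_table="sales_fact",
    source_system="order_entry",
    calculation="units = SUM(order_line.quantity)",
)
```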

    Basic Processes - Data Warehouse

    Securing - Every data warehouse has an exquisite dilemma: publishing the data as widely as possible, to as many users as possible, with the easiest of user interfaces, while at the same time protecting the data from misuse and snoopers.

    Backing Up and Recovering - Since data warehouse data is a flow of data from the legacy system on through to the data marts and eventually onto the users' desktops, a real question arises about where to take the necessary snapshots.

    Steps in the Design Process

    Choose a business process to model

    Choose the grain of the business process

    Choose the dimensions that will apply for each business process and the attributes/members for each dimension

    Choose the measured facts that will populate each fact table record.
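
The four steps can be captured as a small record that is filled in one decision at a time. The Python sketch below is only one way to write a design down: the `DimensionalDesign` class and its `fact_table_columns` helper are hypothetical, and the example values reuse the State/Product/Month and Units/Dollars fact table from earlier in the deck.

```python
from dataclasses import dataclass


@dataclass
class DimensionalDesign:
    """One record per modeled business process, filled in step by step."""

    business_process: str  # step 1
    grain: str             # step 2
    dimensions: dict       # step 3: dimension name -> list of attributes
    facts: list            # step 4: measured facts per fact table record

    def fact_table_columns(self):
        """Key columns (one per dimension) followed by the measure columns."""
        return [f"{name}_key" for name in self.dimensions] + list(self.facts)


design = DimensionalDesign(
    business_process="retail sales",
    grain="one row per state, product, and month",
    dimensions={
        "state": ["state_name"],
        "product": ["prod_name", "subcategory"],
        "month": ["month_name", "year"],
    },
    facts=["units", "dollars"],
)

print(design.fact_table_columns())
```

Writing the grain down before the dimensions and facts makes it easier to check that every chosen fact is actually measurable at that level of detail.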