DWH Fundamentals (Training Material)

  • 8/17/2019 DWH Fundamentals (Training Material)

    Fundamentals of Data Warehousing

    28/02/2009

    Kambali Chandrakanth Gowd

    Chandrakanthgowd.k@tcs.com

    Introduction:

    A data warehouse is basically a storage area where all of an organization's information or data is stored and managed in a manner that allows all users in the organization to use that data in their decision-making process.

    Before and during the early 1980s, data was evaluated in the form of Management Information System (MIS) reports, and this method had many difficulties. In the late 1980s, data warehousing concepts were introduced to provide an architectural model for the flow of data from operational systems to decision support environments. Data warehouses have since been designed and built as technology entities separate from operational and transactional systems, and they have become the primary repositories for performing business intelligence.

    Four advances in data warehouse technology have allowed it to evolve: offline operational databases, offline data warehouses, real-time data warehouses and integrated data warehouses.

    • Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an offline server, where the processing load of reporting does not affect the operational system’s performance.

    • Offline Data Warehouse - Data warehouses at this stage of evolution are updated on a regular time cycle from the operational systems, and the data is stored in an integrated, reporting-oriented data structure.

    • Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction.

    • Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

    History of Data warehousing:

    The executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need information to formulate business strategies, establish goals, set objectives and monitor results.

    Despite the large amount of data accumulated by enterprises over the past decades, enterprises are caught in the middle of an information crisis: the information needed for strategic decision making is not readily available. Companies are desperate for strategic information to extend market share and improve profitability.

    Strategic information is not for running the day-to-day operations of the business. It is more important for the continued health and survival of the corporation. Critical business decisions depend on the availability of proper strategic information in an enterprise.

    Analysts, executives and managers use strategic information interactively to analyze data and spot business trends.

    Information needed for strategic decision making has to be available in an interactive manner. Users must be able to query online, get results and query some more. The information must be in a format suitable for analysis.

    All past attempts by Information Technology (IT) to provide strategic information have been failures, mainly because IT tried to provide strategic information from operational systems. Operational systems cannot provide strategic information; they provide the information needed to run day-to-day operations.

    Only specially designed decision support systems can provide strategic information. Such systems are not meant to run the core business processes. They are used to watch how the business runs and then make strategic decisions to improve it. Decision support systems are designed to get strategic information out of the database, whereas operational systems are designed to put data into the database.

    Data warehousing is the only viable solution for providing strategic information. Its purpose is not to generate fresh data, but to make use of large volumes of existing data and transform it into forms suitable for providing strategic information. The concept of data warehousing is to take all the data that already exists in the organization, clean and transform it, and then provide useful strategic information.

    Data Warehouse:

    A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management's decisions.

    Subject-oriented Data:

    In operational systems, data is stored by individual applications. Data sets have to provide data for specific applications to perform specific functions efficiently, so the data sets for each application are organized around that specific application.

    In data warehouses, data is stored not by operational application but by business subject (for example, customers, products or sales). Business subjects differ from enterprise to enterprise and are critical for the enterprise.

    Integrated Data:

    All the relevant data from various applications must be pulled together for proper decision making. The data in the data warehouse comes from several operational systems, and the source data resides in different databases, files and data segments.

    Before the data is stored in a data warehouse, inconsistencies are removed and the source data goes through transformation, consolidation and integration.

    Nonvolatile Data:

    The data in the data warehouse is primarily for query and analysis and is not intended to run the day-to-day business. The data in a data warehouse is therefore not as volatile as the data in an operational database: it is loaded and refreshed periodically rather than updated transaction by transaction.

    Time-variant Data:

    All data in the data warehouse is identified with a particular time period.

    The time-variant nature of the data in a data warehouse

    • Allows for analysis of the past

    • Relates information to the present

    • Enables forecasts for the future

    Benefits of Data warehousing:

    • The primary focus of data warehousing environments is optimal analysis and speedy retrieval of data, rather than efficient creation and modification of data.

    • Implementations of data warehouses have been found to provide substantial cost savings for organizations and to have positive effects on an organization’s financial “bottom line.”

    • The data in the data warehouse is consistent.

    • Business users are able to query data directly with less information technology support.

    • Data warehouses enhance the value of operational business applications.

    • Decision makers are able to retrieve highly organized information.

    Operational systems Versus Data warehousing systems:

    | Operational systems | Data warehousing systems |
    |---|---|
    | Generally concerned with current data. | Generally concerned with historical data. |
    | Data is updated regularly according to need. | Data is generally read-only. |
    | Generally process-oriented (focused on specific business processes or tasks). | Generally subject-oriented. |
    | Generally designed to support high-volume transaction processing with minimal back-end reporting. | Generally designed to support high-volume analytical processing and subsequent, often elaborate, report generation. |
    | Generally optimized to perform fast inserts and updates of relatively small volumes of data. | Generally optimized to perform fast retrievals of relatively large volumes of data. |
    | Generally require a non-trivial level of computing skills amongst the end-user community. | Generally appeal to an end-user community with a wide range of computing skills, from novice to expert users. |

    OLTP and OLAP:

    OLTP stands for On-Line Transaction Processing. OLTP is a class of programs that facilitates and manages transaction-oriented applications. OLTP systems are designed for optimal transaction speed, and their main purpose is to control and run fundamental business tasks.

    OLAP stands for On-Line Analytical Processing. OLAP has been growing in popularity due to increasing data volumes and the recognition of the business value of analytics. Until the mid-1990s, performing OLAP analysis was an extremely costly process, mainly restricted to larger organizations. OLAP allows business users to slice and dice data at will. Normally, data in an organization is distributed across multiple data sources that are incompatible with each other. Part of the OLAP implementation process involves extracting data from the various data repositories and making it compatible, which means ensuring that the meaning of the data in one repository matches that in all other repositories. OLAP systems are designed to give an overview analysis of what happened.

    OLAP Characteristics:

    • OLAP facilitates interactive queries and complex analysis for users.

    • It allows users to drill down for greater detail or roll up for aggregations of metrics along a single business dimension or across multiple dimensions.

    • It provides the ability to perform intricate calculations and comparisons.

    • It presents results in a number of meaningful ways, including charts and graphs.
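Drill-down and roll-up can be illustrated with a minimal Python sketch. It assumes hypothetical sales rows carrying year, quarter, month and region attributes plus an amount measure: rolling up aggregates to a coarser level of the Time hierarchy, drilling down returns to a finer one, and grouping by year and region analyzes across two dimensions at once. An OLAP server would perform these aggregations itself; this only shows the idea.

```python
from collections import defaultdict

# Hypothetical sales rows: (year, quarter, month, region, amount)
SALES = [
    (2008, "Q1", "Jan", "North", 100.0),
    (2008, "Q1", "Feb", "North", 150.0),
    (2008, "Q2", "Apr", "South", 200.0),
    (2009, "Q1", "Jan", "South", 120.0),
]

# Position of each attribute in a row; year > quarter > month is the Time hierarchy
ATTRIBUTES = {"year": 0, "quarter": 1, "month": 2, "region": 3}
AMOUNT = 4


def aggregate(rows, attributes):
    """Sum the amount measure grouped by the given attributes."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[ATTRIBUTES[name]] for name in attributes)
        totals[key] += row[AMOUNT]
    return dict(totals)


# Roll up along the Time dimension to one total per year
by_year = aggregate(SALES, ["year"])
# Drill down one level to year and quarter
by_quarter = aggregate(SALES, ["year", "quarter"])
# Analyze across two dimensions at once: Time (year) and a hypothetical Geography (region)
by_year_region = aggregate(SALES, ["year", "region"])
```

Whatever the level, the grand total stays the same; only the granularity of the grouping changes.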

    Types of OLAP models:

    There are different types of OLAP models:

    • Multidimensional OLAP (MOLAP)

    • Relational OLAP (ROLAP)

    • Hybrid OLAP (HOLAP)

    Multidimensional OLAP (MOLAP):

    In the MOLAP model, data for analysis is stored in specialized multidimensional databases. MOLAP is the fastest option for data retrieval, because cubes are built for fast retrieval, but it requires the most disk space; disk space is less of a concern these days as storage and processing costs fall. MOLAP can handle only moderate volumes of data: because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. Data analysis is easy irrespective of the number of dimensions.
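The point that MOLAP performs all calculations when the cube is built can be sketched in Python: the hypothetical `build_cube` function below precomputes the total for every combination of dimensions, including the grand total, so later queries become dictionary lookups. Dimension names and rows are made up for illustration, and a real MOLAP engine stores its cube very differently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows with an "amount" measure
DIMENSIONS = ("product", "region", "year")
FACTS = [
    {"product": "Pen", "region": "North", "year": 2008, "amount": 10.0},
    {"product": "Pen", "region": "South", "year": 2009, "amount": 5.0},
    {"product": "Ink", "region": "North", "year": 2009, "amount": 7.0},
]


def build_cube(facts, dimensions):
    """Precompute SUM(amount) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            totals = defaultdict(float)
            for fact in facts:
                key = tuple(fact[d] for d in group)
                totals[key] += fact["amount"]
            cube[group] = dict(totals)
    return cube


cube = build_cube(FACTS, DIMENSIONS)
# Queries are now lookups into precomputed aggregates
grand_total = cube[()][()]
pen_total = cube[("product",)][("Pen",)]
```

With n dimensions there are 2^n groupings, which illustrates why building everything up front limits how much data fits in the cube.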

    Relational OLAP (ROLAP):

    In the ROLAP model, data is stored as rows and columns in relational form and presented to users in the form of business dimensions. ROLAP can handle very large amounts of data, but data retrieval is slow and there are limitations on complex data analysis functions. ROLAP is therefore suited to data warehousing implementations where data volume matters more than query speed.

    Hybrid OLAP (HOLAP):

    HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. HOLAP leverages cube technology for faster performance and can "drill through" from the cube into the underlying relational data for detail.

    OLTP versus OLAP:

    | | OLTP system | OLAP system |
    |---|---|---|
    | Source of data | OLTPs are the original source of the data (operational data). | OLAP data comes from the various OLTP databases (consolidated data). |
    | Purpose of data | To control and run fundamental business tasks. | To help with planning, problem solving and decision support. |
    | Inserts and updates | Short and fast inserts and updates initiated by end users. | Periodic long-running batch jobs refresh the data. |
    | Queries | Relatively standardized and simple queries returning relatively few records. | Often complex queries involving aggregations. |
    | Processing speed | Typically very fast. | Depends on the amount of data involved; batch data refreshes and complex queries may take many hours. Query speed can be improved by creating indexes. |
    | Database design | Highly normalized with many tables. | Typically de-normalized with fewer tables; uses star and/or snowflake schemas. |
    | Backup and recovery | Back up religiously: operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. | Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method. |

    Data warehouse Architecture:

    Data warehouses and their architectures vary depending upon the specifics of an organization's situation. A common architecture is:

    [Figure: Data Warehouse Architecture]

    Operational systems, ERP, CRM and flat files are the different types of data sources. ETL (Extraction, Transformation and Loading) is the process of pulling data out of the source systems and placing it into a data warehouse.

    Extraction:

    Data from the different source systems is extracted and converted into one consolidated data warehouse format that is ready for transformation processing.
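A minimal Python sketch of consolidating extracts from two differently shaped sources into one common record layout, ready for transformation. The source layouts (a CSV export from an operational system and a list of dictionaries from a hypothetical CRM feed) and the target field names are assumptions made for illustration.

```python
import csv
import io

# Hypothetical CSV export from an operational system
ORDERS_CSV = """order_no,cust,amt
1001,C1,250.00
1002,C2,99.50
"""

# Hypothetical CRM feed that uses different field names
CRM_RECORDS = [{"OrderId": "2001", "CustomerCode": "C1", "Total": "40"}]


def extract_csv(text):
    """Map CSV rows to the common layout (source, order_id, customer_id, amount)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"source": "ops", "order_id": row["order_no"],
               "customer_id": row["cust"], "amount": row["amt"]}


def extract_crm(records):
    """Map CRM records to the same common layout."""
    for rec in records:
        yield {"source": "crm", "order_id": rec["OrderId"],
               "customer_id": rec["CustomerCode"], "amount": rec["Total"]}


# One consolidated staging list; values stay raw strings for the Transformation step
staging = list(extract_csv(ORDERS_CSV)) + list(extract_crm(CRM_RECORDS))
```

Keeping values as raw strings here leaves type conversion and cleaning to the transformation stage.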

    Transformation:

    Transforming the data may involve the following tasks:

    • Applying business rules (for example, calculating new measures and dimensions)

    • Cleaning (for example, mapping "NULL" to "0", or "Male" to "M" and "Female" to "F")

    • Filtering (for example, selecting only certain columns to load)

    • Splitting a column into multiple columns, and vice versa

    • Joining together data from multiple sources (for example, lookup or merge)

    • Transposing rows and columns

    • Applying any kind of simple or complex data validation (for example, if the first three columns in a row are empty, rejecting the row from processing)
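Several of the transformation tasks listed above can be sketched in Python. The column names (`customer_id`, `full_name`, `city`, `gender`, `bonus`) and the exact rules are hypothetical; the sketch shows cleaning (NULL to 0, Male/Female to M/F), splitting one column into two, and the validation rule from the last bullet (reject a row whose first three columns are empty).

```python
# Hypothetical column order of the incoming rows
COLUMNS = ("customer_id", "full_name", "city", "gender", "bonus")

GENDER_CODES = {"Male": "M", "Female": "F"}


def transform(row):
    """Return the cleaned row as a dict, or None if validation rejects it."""
    # Validation: reject the row if its first three columns are empty
    if all(row[i] in (None, "") for i in range(3)):
        return None
    record = dict(zip(COLUMNS, row))
    # Cleaning: map NULL (None) to 0 and Male/Female to M/F
    if record["bonus"] is None:
        record["bonus"] = 0
    record["gender"] = GENDER_CODES.get(record["gender"], record["gender"])
    # Splitting: one full_name column into first_name and last_name
    first, _, last = (record.pop("full_name") or "").partition(" ")
    record["first_name"], record["last_name"] = first, last
    return record


rows = [
    ("C1", "Asha Rao", "Pune", "Female", None),
    ("", None, "", "Male", 100),
]
clean = [r for r in (transform(row) for row in rows) if r is not None]
```

Returning None for rejected rows lets the caller route them to an error file instead of silently dropping them.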

    Loading:

    The data is loaded into the data warehouse database. As in a traditional OLTP (on-line transaction processing) system, metadata and raw data are present, along with an additional type of data: summary data. Summaries are very valuable in data warehouses because they pre-compute long operations in advance. End users directly access data derived from several source systems through the data warehouse.
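Summary data can be sketched with Python's built-in `sqlite3` module: detail rows are loaded into a fact table, then a summary table pre-computes monthly totals so later queries read a few rows instead of re-scanning the detail. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detail-level fact table
conn.execute("CREATE TABLE sales_fact (sale_date TEXT, product TEXT, amount REAL)")

# Load detail rows (normally the output of the Transformation step)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("2009-01-05", "Pen", 10.0), ("2009-01-20", "Pen", 5.0),
     ("2009-02-02", "Ink", 7.5)],
)

# Pre-compute a summary table once, at load time
conn.execute("""
    CREATE TABLE sales_monthly_summary AS
    SELECT substr(sale_date, 1, 7) AS sale_month,
           product,
           SUM(amount) AS total_amount
    FROM sales_fact
    GROUP BY sale_month, product
""")

summary = conn.execute(
    "SELECT sale_month, product, total_amount FROM sales_monthly_summary "
    "ORDER BY sale_month, product"
).fetchall()
```

The trade-off is that the summary must be rebuilt or refreshed whenever new detail rows are loaded.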

    OLAP (Online Analytical Processing) is used extensively by organizations to discover valuable business trends in data marts and data warehouses. OLAP provides a historical view of data; although useful on its own, OLAP analysis becomes truly powerful when combined with predictive analysis from data mining.

    Data Mining:

    Data mining, the extraction of hidden predictive information from large databases, is the process of analyzing data from different perspectives and summarizing it into useful information. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. Data mining allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

    The main difference between an operational database architecture and a data warehouse architecture is that, in the data warehouse, the relational model is usually de-normalized into the dimension and fact tables typical of data warehouse database design.

    ER and Dimensional Modeling:

    Entity-relationship (ER) modeling is a logical design technique that seeks to remove redundancy in data. ER modeling is a powerful technique for designing transaction processing systems in relational environments. By helping to automate the normalization of physical data structures, ER modeling has greatly contributed to the phenomenal success of getting large amounts of data into relational databases. However, ER models do not contribute to the user’s ability to query the data. ER modeling is very useful for the transaction capture and data administration phases of constructing a data warehouse, but it should be avoided for end-user delivery.

    To understand dimensional modeling, let’s define some of the terms commonly used:

    Attribute:

    A unique level within a dimension.

    For example, Month is an attribute in the Time Dimension.

    Fact Table:

    A fact table is a table that stores facts that measure the business, such as sales, cost of goods or profit. Fact tables also contain foreign keys to the dimension tables. These foreign keys relate each row of data in the fact table to its corresponding dimensions and levels.

    Dimension Table:

    A dimension table is a table that stores attributes that describe aspects of a dimension. For example, a time table stores the various aspects of time such as year, quarter, month and day. A foreign key of a fact table references the primary key of a dimension table in a many-to-one relationship.

    Primary Key:

    Each row in a dimension table is identified by a unique value of an attribute designated as the primary key of the dimension.

    Surrogate Key:

    A surrogate key is a system-generated sequence number that has no built-in meaning.
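A surrogate key generator can be sketched in Python: each new natural (source-system) key receives the next integer in a sequence, and a natural key seen again reuses its existing surrogate. The class name and the natural-key formats are hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hypothetical helper: assigns meaningless sequential integers to natural keys."""

    def __init__(self, start=1):
        self._sequence = count(start)
        self._lookup = {}

    def key_for(self, natural_key):
        # Reuse the surrogate if the natural key was seen before
        if natural_key not in self._lookup:
            self._lookup[natural_key] = next(self._sequence)
        return self._lookup[natural_key]


keys = SurrogateKeyGenerator()
sk_a = keys.key_for("CRM-C001")
sk_b = keys.key_for("ERP-7788")
sk_a_again = keys.key_for("CRM-C001")
```

Under a Type 2 slowly changing dimension the same natural key deliberately receives a new surrogate for each version, so this one-to-one lookup only covers the simple case.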

    Dimensional modeling (DM) is the name of a logical design technique often used for data warehouses. DM is the only viable technique for databases that are designed to support end-user queries in a data warehouse. Data warehouses are typically developed using dimensional models rather than the traditional entity-relationship models associated with conventional relational databases.

    The Strengths of Dimensional modeling:

    The dimensional model has a number of important data warehouse advantages that the ER model lacks.

    • The dimensional model is a predictable, standard framework. Report writers, query tools and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable and processing more efficient.

    • The dimensional model withstands unexpected changes in user behavior.

    • The dimensional model is gracefully extensible to accommodate unexpected new data elements and new design decisions.

    • There is a body of standard approaches for handling slowly changing dimensions in the dimensional model.

    • There is a growing body of administrative utilities and software processes that manage and use aggregates in the dimensional model.

    ER modeling Versus Dimensional modeling:

    • ER modeling focuses on individual events, whereas dimensional modeling focuses on how managers view the business.

    • An ER model is split according to entities; a dimensional model is split according to dimensions and facts.

    • An ER model has a complex group of entities linked with each other, whereas a dimensional model has a logically grouped set of star schemas.

    • In an ER model, all attributes of an entity, textual as well as numeric, belong to the entity table. In a dimensional model, a dimension entity has mostly textual attributes and a fact entity has mostly numeric attributes.

    • An ER model is highly normalized, whereas a dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.

    Slowly Changing Dimensions (SCD):

    In dimensional modeling, most dimensions are generally constant, but they do change over time. The key of the source record does not change, but the description and other attributes change slowly over time. For example, a customer remains the same customer, but the customer’s demographic details might change several times during the year. In dimensional modeling, slowly changing dimensions record these types of changes; "changing dimensions" means the variation in dimensional attributes over time. Slowly changing dimensions can be categorized into three types:

    • Type 1 SCD (Overwriting History)

    • Type 2 SCD (Preserving History)

    • Type 3 SCD (Preserving a Version of History)

    Type 1 SCD:

    A “Type 1” change overwrites an existing dimensional attribute with new information. It updates only the attribute and doesn’t insert any new record.

    For example, if a customer’s address changes, the new address overwrites the old address, so the old address is lost forever.
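A Type 1 change can be sketched with Python's `sqlite3` module as a plain UPDATE: the attribute is overwritten in place and no row is added, so the old address cannot be recovered. The `customer_dim` table, its columns and the `scd_type1` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer dimension
conn.execute("""CREATE TABLE customer_dim (
    customer_sk INTEGER PRIMARY KEY,   -- surrogate key
    customer_id TEXT,                  -- natural key from the source
    address TEXT)""")
conn.execute("INSERT INTO customer_dim VALUES (1, 'C1', '12 Old Road')")


def scd_type1(conn, customer_id, new_address):
    """Type 1: overwrite the attribute; history is lost."""
    conn.execute("UPDATE customer_dim SET address = ? WHERE customer_id = ?",
                 (new_address, customer_id))


scd_type1(conn, "C1", "7 New Street")
rows = conn.execute("SELECT * FROM customer_dim").fetchall()
```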

    Type 2 SCD:

    A “Type 2” change writes a new record with the new attribute information and preserves the record of the old dimensional data. The new record is inserted with a new surrogate key. For example, if a customer’s address changes, a new record with the new address is added, so both the old and the new address are present, each under its own surrogate key.
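A Type 2 change can be sketched the same way. The sketch assumes the common, but here hypothetical, convention of `effective_date`, `end_date` and `is_current` columns on the dimension: the current row is closed and a new row with a new surrogate key is inserted, so both addresses remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer dimension with versioning columns
conn.execute("""CREATE TABLE customer_dim (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- new key per version
    customer_id TEXT,
    address TEXT,
    effective_date TEXT,
    end_date TEXT,          -- NULL while the row is current
    is_current INTEGER)""")
conn.execute("""INSERT INTO customer_dim
    (customer_id, address, effective_date, end_date, is_current)
    VALUES ('C1', '12 Old Road', '2008-01-01', NULL, 1)""")


def scd_type2(conn, customer_id, new_address, change_date):
    """Type 2: expire the current row, then insert a new version."""
    conn.execute("""UPDATE customer_dim SET end_date = ?, is_current = 0
                    WHERE customer_id = ? AND is_current = 1""",
                 (change_date, customer_id))
    conn.execute("""INSERT INTO customer_dim
        (customer_id, address, effective_date, end_date, is_current)
        VALUES (?, ?, ?, NULL, 1)""", (customer_id, new_address, change_date))


scd_type2(conn, "C1", "7 New Street", "2009-02-28")
rows = conn.execute("SELECT customer_sk, address, is_current FROM customer_dim "
                    "ORDER BY customer_sk").fetchall()
```

Fact rows loaded after the change reference the new surrogate key, while older fact rows keep pointing at the old version.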

    Type 3 SCD:

    A “Type 3” change places a value for the change in the original dimensional record, instead of creating a new dimensional record to hold the attribute change. For example, if a customer’s address changes, the old address, the new address and the effective date of the change are captured in the same record.

    Type 3 cannot keep the full history when an attribute changes more than once. Type 3 is rarely used in practice.
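A Type 3 change keeps the extra columns on the original row. The sketch assumes hypothetical `previous_address` and `address_changed_on` columns; a second change pushes the first address out, which is the limitation noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical customer dimension with one "previous value" column
conn.execute("""CREATE TABLE customer_dim (
    customer_sk INTEGER PRIMARY KEY,
    customer_id TEXT,
    address TEXT,              -- current value
    previous_address TEXT,     -- only one prior value is kept
    address_changed_on TEXT)""")
conn.execute("INSERT INTO customer_dim VALUES (1, 'C1', '12 Old Road', NULL, NULL)")


def scd_type3(conn, customer_id, new_address, change_date):
    """Type 3: shift the current value into previous_address, then overwrite."""
    # SQLite evaluates every SET expression against the row before the update
    conn.execute("""UPDATE customer_dim
                    SET previous_address = address,
                        address = ?,
                        address_changed_on = ?
                    WHERE customer_id = ?""",
                 (new_address, change_date, customer_id))


scd_type3(conn, "C1", "7 New Street", "2009-02-28")
first = conn.execute("SELECT address, previous_address FROM customer_dim").fetchone()
scd_type3(conn, "C1", "3 Third Lane", "2009-06-30")
second = conn.execute("SELECT address, previous_address FROM customer_dim").fetchone()
```

After the second change, "12 Old Road" is gone from the table.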

    Dimensional Model Schemas:

    A data warehouse environment usually transforms the relational data model into a special architecture. Many schema models have been designed for data warehousing, but the most commonly used are:

    • Star schema

    • Snowflake schema

    • Fact constellation schema

    Which schema model to use for a data warehouse should be determined by analysis of the project requirements, the available tools and the project team's preferences.

    Star schema:

    The arrangement of the collection of fact and dimension tables in the dimensional data model resembles a star formation, with the fact table placed in the middle surrounded by the dimension tables. Usually the fact tables in a star schema are in third normal form (3NF), whereas the dimension tables are de-normalized.
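A minimal star schema can be sketched with `sqlite3`: one fact table in the middle holding measures and foreign keys, surrounded by de-normalized date and product dimensions, and queried with one join per dimension. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: sales_fact surrounded by date_dim and product_dim
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE product_dim (   -- de-normalized: category stored inline
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE sales_fact (
    date_key INTEGER REFERENCES date_dim (date_key),
    product_key INTEGER REFERENCES product_dim (product_key),
    quantity INTEGER, amount REAL);

INSERT INTO date_dim VALUES (1, '2009-02-27', '2009-02', 2009),
                            (2, '2009-03-02', '2009-03', 2009);
INSERT INTO product_dim VALUES (10, 'Pen', 'Stationery'), (11, 'Ink', 'Stationery');
INSERT INTO sales_fact VALUES (1, 10, 2, 20.0), (1, 11, 1, 7.5), (2, 10, 1, 10.0);
""")

# Star join: the fact table joins directly to each dimension
monthly = conn.execute("""
    SELECT d.month, p.category, SUM(f.amount)
    FROM sales_fact AS f
    JOIN date_dim AS d ON f.date_key = d.date_key
    JOIN product_dim AS p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month
""").fetchall()
```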

    Snowflake schema:

    “Snowflaking” is a method of normalizing the dimension tables in a star schema. Snowflake schemas normalize dimensions to eliminate redundancy: the dimension data is grouped into multiple tables instead of one large table, so the snowflake schema is more complex than the star schema.

    The following figure shows a snowflake schema with two dimensions, each having three levels. A snowflake schema can have any number of dimensions, and each dimension can have any number of levels.
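Snowflaking a product dimension can be sketched as follows: category attributes move into their own table, referenced by a foreign key from the product dimension, so a sales-by-category query needs one extra join. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical snowflaked product dimension: product_dim -> category_dim
conn.executescript("""
CREATE TABLE category_dim (   -- normalized out of the product dimension
    category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_key INTEGER REFERENCES category_dim (category_key));
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim (product_key), amount REAL);

INSERT INTO category_dim VALUES (1, 'Stationery'), (2, 'Furniture');
INSERT INTO product_dim VALUES (10, 'Pen', 1), (11, 'Ink', 1), (12, 'Desk', 2);
INSERT INTO sales_fact VALUES (10, 20.0), (11, 7.5), (12, 150.0);
""")

# Fact -> product -> category: the extra join is the cost of snowflaking
by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON f.product_key = p.product_key
    JOIN category_dim AS c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The category name "Stationery" is now stored once rather than on every product row, which is the redundancy snowflaking removes.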

    Fact constellation schema:

    For each star schema or snowflake schema, it is possible to construct a fact constellation schema. This schema is more complex than the star or snowflake schema because it contains multiple fact tables, which allows dimension tables to be shared among many fact tables. This solution is very flexible, but it may be hard to manage and support.

    The main disadvantage of the fact constellation schema is its more complicated design, because many variants of aggregation must be considered.

    In a fact constellation schema, different fact tables are explicitly assigned to the dimensions that are relevant to the given facts. This may be useful when some facts are associated with a given dimension level and other facts with a deeper dimension level.
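A fact constellation can be sketched with two fact tables, hypothetical `sales_fact` and `shipping_fact`, sharing one date dimension. Each fact table is aggregated separately to the shared dimension before the results are combined; joining the two fact tables row by row would multiply rows and double count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical constellation: two fact tables share date_dim
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE sales_fact (date_key INTEGER REFERENCES date_dim, amount REAL);
CREATE TABLE shipping_fact (date_key INTEGER REFERENCES date_dim, cost REAL);

INSERT INTO date_dim VALUES (1, '2009-01'), (2, '2009-02');
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 80.0);
INSERT INTO shipping_fact VALUES (1, 12.0), (2, 5.0), (2, 4.0);
""")

# Summarize each fact table to the shared dimension, then join the summaries
rows = conn.execute("""
    WITH s AS (SELECT date_key, SUM(amount) AS sales
               FROM sales_fact GROUP BY date_key),
         h AS (SELECT date_key, SUM(cost) AS shipping
               FROM shipping_fact GROUP BY date_key)
    SELECT d.month, s.sales, h.shipping
    FROM date_dim AS d
    JOIN s ON s.date_key = d.date_key
    JOIN h ON h.date_key = d.date_key
    ORDER BY d.month
""").fetchall()
```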

    Data Mart:

    A data mart is a collection of related data from internal and external sources, transformed, integrated and stored for the purpose of providing strategic information to a specific set of users in an enterprise.

    The data mart contains only a small amount of historical information and is granular only to the point that suits the needs of the department. The data mart is typically housed in multidimensional technology, which is great for flexibility of analysis but is not optimal for large amounts of data. Data found in data marts is highly indexed.

    There are two kinds of data marts:

    • Dependent data mart

    • Independent data mart

    All dependent data marts have the data warehouse as their source. Dependent data marts are architecturally and structurally sound.

    An independent data mart is one whose source is the legacy applications environment. Each independent data mart is fed uniquely and separately by the legacy applications environment. Independent data marts are unstable and architecturally unsound. The problem with independent data marts is that their deficiencies do not manifest themselves until the organization has built multiple independent data marts.

    Operational Data Store (ODS):

    An Operational Data Store (ODS) is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data.

    An operational data store is basically a database used as an interim area for a data warehouse. It works with a data warehouse, but unlike a data warehouse, an operational data store does not contain static data. Instead, it contains data that is constantly updated through the course of business operations.

    Data Warehouse Methodologies:

    The two major design methodologies of data warehousing come from Ralph Kimball and Bill Inmon. Both Inmon and Kimball view data warehousing as separate from OLTP and legacy applications.

    Inmon believes in creating a data warehouse on a subject-by-subject-area basis. For example, development of the data warehouse can start with data from the online store, and other subject areas can be added as the need arises: point-of-sale (POS) data can be added later if management decides it is necessary.

    In this approach, data marts are created from the data warehouse's subject areas.

    [Figure: Inmon's Data Warehouse Design Methodology]

    Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is the combination of these data marts, tied together by conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at the local departmental level.

    [Figure: Kimball's Data Warehousing Design Methodology]

    The Life cycle of a Data Warehouse project:

    There are different phases involved in a data warehousing project life cycle:

    • Requirement Gathering

    • Physical Environment Setup

    • Data Modeling

    • ETL

    • OLAP Cube Design

    • Front End Development

    • Performance Tuning

    • Quality Assurance

    • Rollout To Production

    • Production Maintenance

    • Incremental Enhancements

    Requirement Gathering:

    The main objective of this phase is to identify the objects necessary for the reporting and analysis requirements. During this phase, business managers play a vital role and there is direct discussion with the end users. The various data sources (operational systems, ERP, CRM, flat files, etc.) are identified.

    The deliverables in this phase are:

    • A list of reports/cubes to be delivered to the end users by the end of the current phase.

    • An updated project plan that clearly identifies resource loads and milestone delivery dates.

    Physical Environment Setup:

    After the requirements gathering phase is completed, the physical environment has to be set up by installing the database and maintaining the physical servers.

    The usual set of servers includes three instances:

    • Development Instance

    • Test Instance

    • Production Instance

    Development Instance: In this instance, developers work on the database and develop objects, then move that code for testing.

    Test Instance: In this instance, testers test the objects developed by the developers.

    Production Instance: After testing, the objects are moved into the production instance.

    Along with the above instances, there are separate database servers for ETL, OLAP and reporting tools. The network administrators and database administrators play the key role in setting up the servers, and they submit a detailed document about the servers to the project managers.

    The deliverables in this phase are:

    • Hardware/software setup document for all of the environments, including hardware specifications and scripts/settings for the software.

    Data Modeling:

    A data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements. A data model represents the nature of the data, the business rules governing the data and how it will be organized in the database. There are three levels of data modeling:

    • Conceptual Data Model

    • Logical Data Model

    • Physical Data Model

    Conceptual Data Model: At this level, the data modeler attempts to identify the highest-level relationships among the different entities.

    Logical Data Model: At this level, the data modeler attempts to describe the data in detail, without regard to how it will be physically implemented in the database. In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step.

    The steps for designing the logical data model are as follows:

    • Identify all entities.

    • Specify primary keys for all entities.

    • Find the relationships between different entities.

    • Find all attributes for each entity.

    • Resolve many-to-many relationships.

    • Normalize.

    Physical Data Model: At this level, the data modeler specifies how the logical data model will be realized in the database schema.

    The steps for physical data model design are as follows:

    • Convert entities into tables.

    • Convert relationships into foreign keys.

    • Convert attributes into columns.

    • Modify the physical data model based on physical constraints/requirements.

    The deliverables in this phase are:

    • Identification of data sources

    • Logical data model

    • Physical data model

    ETL (Extraction, Transformation and Loading):

    The ETL phase typically takes a long time to develop, because it takes time to get the source data, understand the necessary columns, understand the business rules, and understand the logical and physical data models.

    The deliverables in this phase are:

    • Data mapping document

    • ETL script/ETL package in the ETL tool

    OLAP Cube Design:

    OLAP databases provide aggregated summary information quickly, using a schema that is easily understood by end users. The cube consists of two primary concepts: measures and dimensions. The measures are the numeric values that provide summaries at various levels of aggregation. The dimensions are the ways in which the numeric values are summarized. Within the cube, measures are organized into measure groups. A measure group is associated with a single fact or event that is tracked by the OLAP database. The measures can be summarized by various dimensions, some of which are common across the measure groups. Data warehousing is an iterative process; it is difficult to get all the requirements at once.

    The deliverables in this phase are:

    • Documentation specifying the OLAP cube dimensions and measures

    • Actual OLAP cube/report

    Front End Development:

    Front end development is an important part of a data warehousing initiative. If the reports do not bring any value to the end users, then the effort to build the OLAP cube is wasted. The trend is to have reports viewed through a standard web browser such as Internet Explorer, since it is not a good idea to install report viewing software on each and every end-user machine. So it is very important to think about the end reports and the timely delivery of reports to the end users.

    The deliverables in this phase are:

    • Front end deployment documentation

    Performance Tuning:

    There are three major areas where a data warehousing system can benefit from performance tuning:

    • ETL

    • Query Processing

    • Report Delivery

    ETL: Since loading data is very time consuming, it is best to put that activity in a nightly load job. The ETL process often needs further tuning, because jobs frequently do not start on time due to factors beyond the control of the data warehousing team.

    Query Processing: Query performance is a big issue in cases where the reports are run directly against a relational database (especially in a ROLAP environment). Hence it is worthwhile for the data warehousing team to invest some time in tuning the queries.
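One common query-tuning step, adding an index on a fact-table key used in filters, can be checked with SQLite's `EXPLAIN QUERY PLAN`, sketched below with Python's `sqlite3` module. The table, index and query are hypothetical, and the exact wording of the plan text differs between SQLite versions, so the sketch only looks for the phrase `USING INDEX`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: 365 days x 10 products
conn.execute("CREATE TABLE sales_fact (date_key INTEGER, product_key INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(d, p, 1.0) for d in range(1, 366) for p in range(1, 11)])

QUERY = "SELECT SUM(amount) FROM sales_fact WHERE date_key = ?"


def plan(conn):
    """Return the plan text; the detail string is the last column of each row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (59,)).fetchall()
    return " ".join(row[-1] for row in rows)


before = plan(conn)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_sales_fact_date ON sales_fact (date_key)")
after = plan(conn)   # the planner can now search via the index
total = conn.execute(QUERY, (59,)).fetchone()[0]
```

The result of the query is the same either way; only the access path changes, which is what query tuning aims at.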

    Report Delivery: End users can experience delays in receiving their reports due to factors other than query performance, for example network traffic, server setup and the reporting tool used. It is important for the data warehouse team to look into these areas for performance tuning.

    The deliverables in this phase are:

    • Performance tuning document - goals and results

    Quality Assurance (QA):

    After the data warehouse team completes the development work, the QA team from the client side starts testing.

    The deliverables in this phase are:

    • QA test plan

    • QA verification that the data warehousing system is ready to go to production

    Rollout to Production:

    Once the QA team gives the thumbs up (sign-off document), it is time for the data warehouse system to go live.

    The deliverables in this phase are:

    • Delivery of the data warehousing system to the end users

    Production Maintenance:

    Once the data warehouse goes into production, it needs to be maintained. Tasks like taking backups at regular intervals and crisis management become very important and need to be planned well in advance.

    The deliverables in this phase are:

    • Consistent availability of the data warehousing system to the end users

    Incremental Enhancements:

    Once the data warehousing system goes live, there is often a need for incremental enhancements. It may seem simplest to make the changes directly in the production environment, but doing so on a live (production) system is highly risky. Instead, make the changes in the development environment and then roll them out to the production system.

    The deliverables in this phase are:

    • Change management documentation

    • Actual change to the data warehousing system
