Business Intelligence - Data Warehouse Implementation

Post on 07-Apr-2018


8/6/2019 Business Intelligence - Data Warehouse Implementation

http://slidepdf.com/reader/full/business-intelligence-data-warehouse-implementation 1/157

Business Intelligence & Data Warehousing

ANAND.T, Business Intelligence, Citicards, Tata Consultancy Services Ltd.

Lecture I

Basics and Concepts

Motivation

Aims of information technology:
To help workers in their everyday business activity and improve their productivity – clerical data processing tasks
To help knowledge workers (executives, managers, analysts) make faster and better decisions – decision support systems

Two types of applications:
Operational applications
Analytical applications

The Architecture of Data

[Figure: layers of data. Labels: Operational data, Summary data, Database schema, Metadata, Business rules. Annotations: who, what, when, where, ...; summaries by who, what, when, where, ...; physical layout of data; logical model; what has been learned from data]

Business Intelligence

“Business Intelligence is a technology based on customer and profit oriented models that reduces operating costs and provide increased profitability by improving productivity, sales, service and helps to make decision making capabilities at no time.”

BI Cycle

[Figure: the Business Intelligence cycle, with the stages Analysis, Insight, Action, Measurement]

Uses of Business Intelligence

Operational Efficiency
ERP Reporting
KPI Tracking
Product Profitability
Risk Management
Balanced Scorecard
Activity Based Costing
Global Sourcing
Logistics

Uses of Business Intelligence

Customer Interaction
Sales Analysis
Sales Forecasting
Segmentation
Cross-selling
CRM Analytics
Campaign Planning
Customer Profitability

[Figure: sources of business intelligence]
Market Research: Telephone Surveys, Online Surveys, Focus Groups, Mystery Shopping, Custom Panels, Online Focus Groups, One-on-ones
Environmental Scanning: AC Nielsen Reports, Association Stats, Government Reports, Media Monitoring, Economic Reports, Syndicated Studies
Data Mining: Predictive Modelling, Segmentation, Mining Customer Records, POS Systems, CRM
Competitive Intelligence: Library Sciences, Google, Internal Scanning, News Scanning Services, Ad Scanning/Tracking, Mystery Shopping, Website

BI Tools

These tools will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, inventory and distribution analysis.

Evolution

60’s: Batch reports
    hard to find and analyze information
    inflexible and expensive, reprogram every new request
70’s: Terminal-based DSS and EIS (executive information systems)
    still inflexible, not integrated with desktop tools
80’s: Desktop data access and analysis tools
    query tools, spreadsheets, GUIs
    easier to use, but only access operational databases
90’s: Data warehousing with integrated OLAP engines and tools

Data Warehousing Market

Hardware: servers, storage, clients
Warehouse DBMS
Tools
Market growing from $2B in 1995 to $8B in 1998 (Meta Group)
Systems integration & Consulting
Already deployed in many industries: manufacturing, retail, financial, insurance, transportation, telecom, utilities, healthcare.

What is a Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” --- W. H. Inmon
Collection of data that is used primarily in organizational decision making
A decision support database that is maintained separately from the organization’s operational database

How Many Matches?

How Many Matches Now?

Data Warehouse - Subject Oriented

Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model.
    E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
Operational DB and applications may be organized differently
    E.g. based on type of insurance: auto, life, medical, fire, ...

Data Warehouse – Integrated

Lack of consistency in encoding, naming conventions, …, among different data sources
Heterogeneous data sources
When data is moved to the warehouse, it is converted to a single consistent representation.
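
The conversion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular tool: the source names (app_a, app_b, app_c), their gender codes, and the function to_warehouse_gender are all hypothetical.

```python
# Hypothetical source systems, each with its own encoding of the same attribute.
GENDER_MAPS = {
    "app_a": {"m": "M", "f": "F"},
    "app_b": {"1": "M", "0": "F"},
    "app_c": {"male": "M", "female": "F"},
}


def to_warehouse_gender(source: str, raw_value: str) -> str:
    """Convert a source-specific gender code to the single warehouse encoding."""
    mapping = GENDER_MAPS[source]
    key = raw_value.strip().lower()
    if key not in mapping:
        # Unknown codes are rejected rather than guessed.
        raise ValueError(f"unknown gender code {raw_value!r} from {source}")
    return mapping[key]
```

Whatever encoding a source uses, the warehouse stores only "M" or "F", so queries never need to know where a row came from.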

Data Warehouse - Non-Volatile

Operational data is regularly accessed and manipulated a record at a time, and update is done to data in the operational environment.
Warehouse data is loaded and accessed. Update of data does not occur in the data warehouse environment.

Data Warehouse - Time Variance

The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational data: current value data.
Data warehouse data: nothing more than a sophisticated series of snapshots, each taken at some moment in time.
The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
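
The difference in key structure can be sketched with two in-memory stores. This is only an illustration: the account identifier, the dates, and the function record_balance are hypothetical.

```python
from datetime import date

# Operational store: the key has no time element, so only the current value survives.
operational_balance = {}
# Warehouse store: the key always includes a time element, so each snapshot is kept.
warehouse_balance = {}


def record_balance(account_id, as_of, balance):
    """Record a balance in both stores to show how their keys differ."""
    operational_balance[account_id] = balance          # overwrites the old value
    warehouse_balance[(account_id, as_of)] = balance   # adds a new snapshot


record_balance("A-1", date(2019, 1, 31), 100.0)
record_balance("A-1", date(2019, 2, 28), 250.0)
```

After the two calls, the operational store holds one value for "A-1", while the warehouse store holds one entry per snapshot date.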

Why Separate Data Warehouse?

Performance
    Special data organization, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP
    Complex OLAP queries would degrade performance for operational transactions
    Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis

Why Separate Data Warehouse?

Function
    Missing data: Decision support requires historical data which operational DBs do not typically maintain
    Data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources
    Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.

Advantages of Warehousing

High query performance
Queries not visible outside warehouse
Local processing at sources unaffected
Can operate when sources unavailable
Can query data not stored in a DBMS
Extra information at warehouse
    Modify, summarize (store aggregates)
    Add historical information

Advantages of Mediator Systems

No need to copy data
    less storage
    no need to purchase data
More up-to-date data
Query needs can be unknown
Only query interface needed at sources
May be less draining on sources

Requirements for Data Warehousing

Load performance
Load processing
Data quality management
Query performance
Terabyte scalability
Mass user scalability
Networked data warehouse
Warehouse administration
Integrated dimensional analysis
Advanced query functionality

The Architecture of Data Warehousing

[Figure: data warehousing architecture. Labels: Operational databases, External data sources; Extract, Transform, Load, Refresh; Data Warehouse; Metadata repository; Data marts; OLAP server; OLAP, Data mining, Reports]

Typical data warehouse – Three Tier architecture

[Figure: three-tier data warehouse architecture, marked First Tier, Second Tier, Third Tier. Labels: Operational data source 1, Operational data source 2, Operational data source n; Operational data store (ODS); Load Manager; Warehouse Manager; DBMS; Meta-data; Detailed data; Lightly summarized data; Highly summarized data; Archive/backup data; Query Manager; End-user access tools; Data Mart; Summarized data (Relational database); Summarized data (Multi-dimension database)]

Data Sources

Data sources are often the operational systems, providing the lowest level of data.

Data sources are designed for operational use, not for decision support, and the data reflect this fact.

Multiple data sources are often from different systems, run on a wide range of hardware, and much of the software is built in-house or highly customized.

Multiple data sources introduce a large number of issues -- semantic conflicts.

Creating and Maintaining a Warehouse

A data warehouse needs several tools that automate or support tasks such as:
Data extraction from different external data sources, operational databases, files of standard applications (e.g. Excel, COBOL applications), and other documents (Word, WWW).
Data cleaning (finding and resolving inconsistency in the source data)
Integration and transformation of data (between different data formats, languages, etc.)

Creating and Maintaining a Warehouse

Data loading (loading the data into the data warehouse)
Data replication (replicating source database into the data warehouse)
Data refreshment
Data archiving
Checking for data quality
Analyzing metadata

Physical Structure of Data Warehouse

There are three basic architectures for constructing a data warehouse:
Centralized
Distributed/Federated
Tiered
The data warehouse is distributed for: load balancing, scalability and higher availability

Physical Structure of Data Warehouse

[Figure: centralized architecture. Labels: Source, Source; Central Data Warehouse; Client, Client, Client]

Physical Structure of Data Warehouse

[Figure: federated architecture. Labels: Source, Source; Logical Data Warehouse; Local Data Marts (Marketing, Financial, Distribution); End Users]

Physical Structure of Data Warehouse

[Figure: tiered architecture. Labels: Source, Source; Physical Data Warehouse; Local Data Marts; Workstations (highly summarized data)]

Physical Structure of Data Warehouse

Federated architecture
    The logical data warehouse is only virtual
Tiered architecture
    The central data warehouse is physical
    There exist local data marts on different tiers which store copies or summarization of the previous tier.

Related Concepts

Decision Support System
Business Modeling
OLTP/OLAP
Data Modeling
ETL
Reporting
Data Mining

Decision Support System (DSS)

One of the powerful tools of BI
Information technology to help knowledge workers (executives, managers, analysts) make faster and better decisions:
    What were the sales volumes by region and by product category in the last year?
    How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
    Will a 10% discount increase sales volume sufficiently?

Business Modeling

Depicts the overall picture of a business
Sub-categories
Business Process Modeling
    Business processes are visually represented as diagrams of simple boxes with arrow graphics and text labels
Process Flow Modeling
    Describes the various processes that happen in an organization and the relationships between them
Data Flow Modeling
    Focuses on the flow of data between various business processes

Business Modeling Tools

Data Processing Models

There are two basic data processing models:
OLTP – Online Transaction Processing
    Describes processing at operational sites
    The main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency.
OLAP – Online Analytical Processing
    Describes processing at warehouse
    The main aim of OLAP is efficient multidimensional processing of large data volumes.

OLTP vs. OLAP

                    OLTP                            OLAP
Users               Clerk, IT professional          Knowledge worker
Function            Day-to-day operations           Decision support
DB design           Application-oriented            Subject-oriented
Data                Current, up-to-date             Historical, summarized
                    Detailed, flat relational       Multidimensional
                    Isolated                        Integrated, consolidated
Usage               Repetitive                      Ad-hoc
Access              Read/write,                     Lots of scans
                    index/hash on primary key
Unit of work        Short, simple transaction       Complex query
# Records accessed  Tens                            Millions
# Users             Thousands                       Hundreds
DB size             100MB-GB                        100GB-TB
Metric              Transaction throughput          Query throughput, response
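
The contrast between the two units of work can be sketched with Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "South", 40.0), (3, "North", 60.0)],
)

# OLTP-style unit of work: a short transaction that touches one record,
# located through the primary key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (45.0, 2))

# OLAP-style unit of work: a query that scans many records and summarizes them.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
)
```

The update reads and writes a single row, while the aggregate has to read every row in the table, which is why the two workloads favor different physical designs.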

OLAP Multidimensional Databases

Data Modeling

A data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
Visually represents
    Nature of data
    Business rules governing the data
    Organization in database

Data Modeling

Types of data modeling
Conceptual Data Modeling
Enterprise Data Modeling
Logical Data Modeling
Physical Data Modeling
Relational Data Modeling
Dimensional Data Modeling

ETL

ETL stands for Extraction, Transformation, Loading
Steps involved
    Mapping the data between source systems and target database (data warehouse or data mart)
    Cleansing of source data in staging area
    Transforming cleansed source data and then loading into the target system
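
The three steps can be sketched end to end as below. This is a minimal illustration, not a description of any ETL product: the source rows, the column names, the sales table, and all function names are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system.
SOURCE_ROWS = [
    {"cust_nm": " Alice ", "amt": "120.50"},
    {"cust_nm": "Bob", "amt": ""},        # missing amount: rejected during cleansing
    {"cust_nm": "carol", "amt": "80"},
]

# Step 1: mapping between source column names and target column names.
COLUMN_MAP = {"cust_nm": "customer_name", "amt": "amount"}


def map_columns(row):
    return {COLUMN_MAP[name]: value for name, value in row.items()}


def cleanse(staged_rows):
    """Step 2: keep only staged rows whose required fields are present."""
    return [
        row for row in staged_rows
        if row["customer_name"].strip() and row["amount"].strip()
    ]


def transform(row):
    """Step 3a: convert a cleansed row to the target representation."""
    return (row["customer_name"].strip().title(), float(row["amount"]))


def run_etl(conn, source_rows):
    staging = [map_columns(row) for row in source_rows]
    clean = cleanse(staging)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_name TEXT, amount REAL)"
    )
    # Step 3b: load all transformed rows in one transaction.
    with conn:
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?)", [transform(row) for row in clean]
        )
    return len(clean)
```

Calling run_etl(sqlite3.connect(":memory:"), SOURCE_ROWS) loads the two complete rows and leaves the incomplete one behind in staging.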

ETL Tools

Reporting

Business Intelligence reporting tools provide different views of data by pivoting or rotating the data across several dimensions.
Most OLAP tools now support reporting.
Excel sheets and flat files are the standard reporting mediums.

Data Mining

Data mining is a set of processes related to analyzing and discovering useful, actionable knowledge buried deep beneath large volumes of data stores or data sets
This knowledge discovery involves finding patterns or behaviors within the data that lead to some profitable business action
Data Mining Life Cycle
    Business problem analysis
    Knowledge discovery
    Implementation
    Results analysis

Typical Data Warehouse

Lecture II

Design and Implementation

Database design methodology for data warehouses

There are many approaches that offer alternative routes to the creation of a data warehouse
Typical approach – decompose the design of the data warehouse into manageable parts – data marts. At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse.
The methodology specifies the steps required for the design of a data mart; however, the methodology also ties together separate data marts so that over time they merge together into a coherent overall data warehouse.

Step 1: Choosing the process

The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
The best choice for the first data mart tends to be the one that is related to ‘sales’

Step 2: Choosing the grain

Choosing the grain means deciding exactly what a fact table record represents. For example, the entity ‘Sales’ may represent the facts about each property sale. Therefore, the grain of the ‘Property_Sales’ fact table is an individual property sale.
Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.
The grain decision for the fact table also determines the grain of each of the dimension tables. For example, if the grain for ‘Property_Sales’ is an individual property sale, then the grain of the ‘Client’ dimension is the detail of the client who bought a particular property.

Step 3: Identifying and conforming the dimensions

Dimensions set the context for formulating queries about the facts in the fact table.
We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain.
If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a subset of the other (this is the only way that two DMs can share one or more dimensions in the same application).
When a dimension is used in more than one DM, the dimension is referred to as being conformed.

Step 4: Choosing the facts

The grain of the fact table determines which facts can be used in the data mart – all facts must be expressed at the level implied by the grain.
In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale (the facts should be numeric and additive).
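
The Property_Sales example can be sketched as a small star schema using Python's built-in sqlite3 module. Only the table names follow the slides; the columns (sale_price, commission, city), the keys, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (client_key INTEGER PRIMARY KEY, client_name TEXT);
CREATE TABLE property (property_key INTEGER PRIMARY KEY, city TEXT);

-- Grain: one row per individual property sale.
CREATE TABLE property_sales (
    sale_id      INTEGER PRIMARY KEY,
    client_key   INTEGER REFERENCES client(client_key),
    property_key INTEGER REFERENCES property(property_key),
    sale_price   REAL,   -- numeric, additive fact
    commission   REAL    -- numeric, additive fact
);

INSERT INTO client VALUES (1, 'Ann'), (2, 'Raj');
INSERT INTO property VALUES (10, 'Chennai'), (11, 'Mumbai');
INSERT INTO property_sales VALUES
    (100, 1, 10, 200000, 4000),
    (101, 2, 10, 150000, 3000),
    (102, 1, 11, 300000, 6000);
""")

# Because every fact refers to a single sale, it can be summed along any dimension.
sales_by_city = dict(conn.execute("""
    SELECT p.city, SUM(s.sale_price)
    FROM property_sales AS s
    JOIN property AS p USING (property_key)
    GROUP BY p.city
"""))
```

A fact recorded at a different level, for example a monthly branch total, would not fit this table, since it could not be summed together with per-sale rows without double counting.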

Step 5: Storing pre-calculations in the fact table

Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
Common example: a profit or loss statement
These types of facts are useful since they are additive quantities, from which we can derive valuable information.
This is particularly true for a value that is fundamental to an enterprise, or if there is any chance of a user calculating the value incorrectly.
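
A pre-calculated fact can be sketched as below, assuming a hypothetical fact record with revenue and cost. The class SaleFact and the function make_fact are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleFact:
    revenue: float
    cost: float
    profit: float  # pre-calculated once, at load time


def make_fact(revenue: float, cost: float) -> SaleFact:
    """Compute profit in one place so that every query sees the same definition."""
    return SaleFact(revenue=revenue, cost=cost, profit=revenue - cost)


facts = [make_fact(500.0, 320.0), make_fact(200.0, 250.0)]

# Users sum the stored, additive value instead of re-deriving it in each query.
total_profit = sum(fact.profit for fact in facts)
```

Storing the value costs one extra column but removes the risk of users applying slightly different formulas.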

Step 6: Rounding out the dimension tables

In this step we return to the dimension tables and add as many text descriptions to the dimensions as possible.
The text descriptions should be as intuitive and understandable to the users as possible

Step 7: Choosing the duration of the data warehouse

The duration measures how far back in time the fact table goes.
For some companies (e.g. insurance companies) there may be a legal requirement to retain data extending back five or more years.
Very large fact tables raise at least two very significant data warehouse design issues:
    The older the data, the more likely there will be problems in reading and interpreting the old files
    It is mandatory that the old versions of the important dimensions be used, not the most current versions (we will discuss this issue later on)

Step 8: Tracking slowly changing dimensions

The changing dimension problem means that the proper description of the old client and the old branch must be used with the old data warehouse schema
Usually, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time
There are different types of changes in dimensions:
    A dimension attribute is overwritten
    A dimension attribute causes a new dimension record to be created, etc.
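
The two kinds of change listed above, often called Type 1 and Type 2 in dimensional modeling, can be sketched as follows. The ClientRow structure, its validity dates, and the function names are assumptions chosen for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Optional


@dataclass
class ClientRow:
    client_key: int                   # generalized (surrogate) key, one per snapshot
    client_id: str                    # key of the client in the source system
    city: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current snapshot


_next_key = count(1)
client_dim = []


def current_row(client_id):
    return next(
        row for row in client_dim
        if row.client_id == client_id and row.valid_to is None
    )


def add_client(client_id, city, as_of):
    row = ClientRow(next(_next_key), client_id, city, as_of)
    client_dim.append(row)
    return row


def overwrite_city(client_id, city):
    """The attribute is overwritten: history is lost."""
    current_row(client_id).city = city


def change_city_with_history(client_id, city, as_of):
    """A new dimension record is created: the old snapshot is closed, not changed."""
    current_row(client_id).valid_to = as_of
    return add_client(client_id, city, as_of)


add_client("C1", "Pune", date(2017, 1, 1))
change_city_with_history("C1", "Delhi", date(2018, 6, 1))
```

Old fact rows keep pointing at client_key 1 and so still see "Pune", while new fact rows use client_key 2.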

Step 9: Deciding the query priorities and the query modes

In this step we consider physical design issues.
The presence of pre-stored summaries and aggregates
Indices
Materialized views
Security issue
Backup issue
Archive issue

Database design methodology for data warehouses - summary

At the end of this methodology, we have a design for a data mart that supports the requirements of a particular business process and allows the easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.
A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.

Implementing a Warehouse

Implementing a Warehouse

Designing and rolling out a data warehouse is a complex process, consisting of the following activities:
Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers (ROLAP vs MOLAP), and tools
Integrate the servers, storage, and client tools
Design the warehouse schema and views

Implementing a Warehouse

Define the physical warehouse organization, data placement, partitioning, and access method
Connect the sources using gateways, ODBC drivers, or other wrappers
Design and implement scripts for data extraction, cleaning, transformation, load, and refresh

Implementing a Warehouse

Monitoring: Sending data from sources
Integrating: Loading, cleansing, ...
Processing: Query processing, indexing, ...
Managing: Metadata, Design, ...

Monitoring

Data Extraction
Data extraction from external sources is usually implemented via gateways and standard interfaces (such as Information Builders EDA/SQL, ODBC, JDBC, Oracle Open Connect, Sybase Enterprise Connect, Informix Enterprise Gateway, etc.)

Monitoring Techniques

Detect changes to an information source that are of interest to the warehouse:
    define triggers in a full-functionality DBMS
    examine the updates in the log file
    write programs for legacy systems
    polling (queries to source)
    screen scraping
Propagate the change in a generic form to the integrator
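
The polling technique can be sketched as a comparison of two successive query results. This is one simple way to detect changes, assuming each poll returns the source rows as a dictionary keyed by primary key; the function diff_snapshots and the (operation, key, row) record format are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two {key: row} snapshots and return generic change records."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes
```

The monitor would keep the last snapshot, poll the source again, and hand the resulting change records to the integrator. Polling misses intermediate values that change more than once between polls, which is one reason triggers or log inspection are preferred when the source supports them.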

Integration

Integrator
Receive changes from the monitors
    make the data conform to the conceptual schema used by the warehouse
Integrate the changes into the warehouse
    merge the data with existing data already present
    resolve possible update anomalies
Data Cleaning
Data Loading

Data Cleaning

Data cleaning is important to the warehouse – there is a high probability of errors and anomalies in the data:
    inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries and violation of integrity constraints.
    optional fields in data entry are significant sources of inconsistent data.

Data Cleaning Techniques

Data migration: allows simple data transformation rules to be specified, e.g. “replace the string gender by sex” (Warehouse Manager from Prism is an example of this tool)
Data scrubbing: uses domain-specific knowledge to scrub data (e.g. postal addresses) (Integrity and Trillium fall in this category)
Data auditing: discovers rules and relationships by scanning data (detect outliers). Such tools may be considered as variants of data mining tools
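
Two of these techniques can be sketched in a few lines. This is a simplified illustration and does not reflect how the named products work: migrate applies a rename rule such as gender to sex, and audit_outliers uses a plain standard-deviation test as a stand-in for rule discovery. Both function names and the threshold are assumptions.

```python
from statistics import mean, stdev


def migrate(record, renames):
    """Data migration: apply simple rename rules such as gender -> sex."""
    return {renames.get(field, field): value for field, value in record.items()}


def audit_outliers(values, threshold=2.0):
    """Data auditing: flag values far from the mean, measured in standard deviations."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [value for value in values if abs(value - mu) / sigma > threshold]
```

For example, migrate({"gender": "F", "age": 30}, {"gender": "sex"}) renames the field and leaves the rest of the record unchanged.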

Data Loading

After extracting, cleaning and transforming, data must be loaded into the warehouse.
Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, etc.
Typically, batch load utilities are used for loading. A load utility must allow the administrator to monitor status, to cancel, suspend, and resume a load, and to restart after failure with no loss of data integrity


Data Loading Issues

The load utilities for data warehouses have to deal with very large data volumes
Sequential loads can take a very long time.
Full load can be treated as a single long batch transaction that builds up a new database. Using checkpoints ensures that if a failure occurs during the load, the process can restart from the last checkpoint
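The checkpoint idea can be illustrated with a small Python sketch. Everything here is hypothetical: the `load_with_checkpoints` function, the in-memory `warehouse` list and `checkpoint` dict, and the `fail_at` switch used to simulate a crash stand in for what a real load utility would keep in durable storage.

```python
def load_with_checkpoints(rows, warehouse, checkpoint, batch_size=2, fail_at=None):
    """Load rows in batches, recording progress after each committed batch."""
    start = checkpoint.get("next", 0)  # resume from the last checkpoint
    for begin in range(start, len(rows), batch_size):
        if fail_at is not None and begin >= fail_at:
            raise RuntimeError("simulated failure during load")
        batch = rows[begin:begin + batch_size]
        warehouse.extend(batch)
        checkpoint["next"] = begin + len(batch)  # checkpoint after the batch

rows = list(range(7))          # stand-in for extracted, cleaned records
warehouse, checkpoint = [], {}
try:
    load_with_checkpoints(rows, warehouse, checkpoint, fail_at=4)
except RuntimeError:
    pass                       # first attempt fails after two batches
loaded_before_restart = list(warehouse)
load_with_checkpoints(rows, warehouse, checkpoint)  # restart from the checkpoint
```

On restart only the rows after the last checkpoint are loaded, so nothing is duplicated and nothing is lost.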


Data Refresh

Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse
when to refresh:
  periodically (daily or weekly)
  immediately (deferred refresh and immediate refresh)
  determined by usage, types of data source, etc.


Data Refresh

how to refresh
  data shipping
  transaction shipping

Most commercial DBMS provide replication servers that support incremental techniques for propagating updates from a primary database to one or more replicas. Such replication servers can be used to incrementally refresh a warehouse when sources change


Data Shipping

Data shipping (e.g. Oracle Replication Server): a table in the warehouse is treated as a remote snapshot of a table in the source database. An after-row trigger is used to update a snapshot log table and propagate the updated data to the warehouse
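A row-level trigger feeding a snapshot log can be sketched with Python's built-in `sqlite3` module. This is an analogy in SQLite, not Oracle's replication mechanism: the table names, the `log_sale` trigger and the `refresh` helper are hypothetical, and only inserts are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_sale (prodId TEXT, amt INTEGER);
CREATE TABLE snapshot_log (prodId TEXT, amt INTEGER);
CREATE TABLE warehouse_sale (prodId TEXT, amt INTEGER);
-- Row-level trigger: every insert on the source is recorded in the log.
CREATE TRIGGER log_sale AFTER INSERT ON source_sale
FOR EACH ROW BEGIN
    INSERT INTO snapshot_log VALUES (NEW.prodId, NEW.amt);
END;
""")

def refresh(conn):
    """Ship the logged rows to the warehouse snapshot, then clear the log."""
    conn.execute("INSERT INTO warehouse_sale SELECT prodId, amt FROM snapshot_log")
    conn.execute("DELETE FROM snapshot_log")

conn.execute("INSERT INTO source_sale VALUES ('p1', 12)")
conn.execute("INSERT INTO source_sale VALUES ('p2', 11)")
refresh(conn)
shipped = conn.execute(
    "SELECT prodId, amt FROM warehouse_sale ORDER BY prodId"
).fetchall()
pending = conn.execute("SELECT count(*) FROM snapshot_log").fetchone()[0]
```

The warehouse copy only ever receives the rows recorded in the log, which is what makes the refresh incremental.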


Transaction Shipping

Transaction shipping (e.g. Sybase Replication Server, Microsoft SQL Server): the regular transaction log is used. The transaction log is checked to detect updates on replicated tables, and those log records are transferred to a replication server, which packages up the corresponding transactions to update the replicas


Derived Data

Derived Warehouse Data
  indexes
  aggregates
  materialized views

When to update derived data?
The most difficult problem is how to refresh the derived data. The problem of constructing algorithms for incrementally updating derived data has been the subject of much research!


Materialized Views

Define new warehouse relations using SQL expressions

sale
prodId  clientid  date  amt
p1      c1        1     12
p2      c1        1     11
p1      c3        1     50
p2      c2        1     8
p1      c1        2     44
p1      c2        2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb (join of sale and product)
prodId  name  price  clientid  date  amt
p1      bolt  10     c1        1     12
p2      nut   5      c1        1     11
p1      bolt  10     c3        1     50
p2      nut   5      c2        1     8
p1      bolt  10     c1        2     44
p1      bolt  10     c2        2     4
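The join of sale and product can be stored as a relation of its own. The sketch below uses Python's built-in `sqlite3` module; since SQLite has no materialized views, `CREATE TABLE ... AS SELECT` is used as a stand-in, and that substitution is an assumption of this example rather than something taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (prodId TEXT, clientId TEXT, date INTEGER, amt INTEGER);
CREATE TABLE product (id TEXT, name TEXT, price INTEGER);
INSERT INTO sale VALUES ('p1','c1',1,12), ('p2','c1',1,11), ('p1','c3',1,50),
                        ('p2','c2',1,8), ('p1','c1',2,44), ('p1','c2',2,4);
INSERT INTO product VALUES ('p1','bolt',10), ('p2','nut',5);
-- SQLite has no materialized views, so the join result is stored as a table.
CREATE TABLE joinTb AS
    SELECT s.prodId, p.name, p.price, s.clientId, s.date, s.amt
    FROM sale AS s JOIN product AS p ON s.prodId = p.id;
""")
join_rows = conn.execute(
    "SELECT * FROM joinTb ORDER BY date, clientId, prodId"
).fetchall()
```

Queries can now read `joinTb` directly instead of repeating the join, at the cost of keeping it in step with `sale` and `product`.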


Processing

Index Structures
What to Materialize?
Algorithms


Index Structures

Indexing principle:
  mapping key values to records for associative direct access
Most popular indexing technique in relational databases: B+-trees
For multi-dimensional data, a large number of indexing techniques have been developed, e.g. R-trees


Index Structures

Index structures applied in warehouses
  inverted lists
  bit map indexes
  join indexes
  text indexes


What to Materialize?

Store in warehouse results useful for common queries
Example:

day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

materialize (sum over days)
         c1   c2   c3
p1       56    4   50
p2       11    8    -

sum over products
         c1   c2   c3
sum      67   12   50

sum over clients
p1       110
p2        19

total sale: 129


View and Materialized Views

View
  derived relation defined in terms of base (stored) relations
Materialized views
  a view can be materialized by storing the tuples of the view in the database
  index structures can be built on the materialized view


View and Materialized Views

Maintenance is an issue for materialized views
  recomputation
  incremental updating
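The two maintenance strategies can be contrasted on a small aggregate view (total amount per product). This is a minimal sketch with hypothetical helper names (`recompute`, `apply_insert`), and it covers insertions only; deletions and updates need more care.

```python
from collections import defaultdict

def recompute(sales):
    """Recomputation: rebuild the aggregate view from all base tuples."""
    view = defaultdict(int)
    for prod_id, amt in sales:
        view[prod_id] += amt
    return dict(view)

def apply_insert(view, prod_id, amt):
    """Incremental updating: adjust only the group touched by the new tuple."""
    view[prod_id] = view.get(prod_id, 0) + amt

sales = [("p1", 12), ("p2", 11), ("p1", 50), ("p2", 8)]
view = recompute(sales)            # full scan of the base relation
new_sale = ("p1", 44)
sales.append(new_sale)
apply_insert(view, *new_sale)      # touches one group, no scan
```

Both routes end in the same view; the incremental one does work proportional to the change rather than to the size of the base relation.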


Managing


Metadata Repository

Administrative metadata
  source databases and their contents
  gateway descriptions
  warehouse schema, view and derived data definitions
  dimensions and hierarchies
  pre-defined queries and reports
  data mart locations and contents


Metadata Repository

Administrative metadata
  data partitions
  data extraction, cleansing, transformation rules, defaults
  data refresh and purge rules
  user profiles, user groups
  security: user authorization, access control


Metadata Repository

Business
  business terms & definitions
  data ownership, charging
Operational
  data layout
  data currency (e.g., active, archived, purged)
  use statistics, error reports, audit trails


Importance of managing metadata

The integration of meta-data, that is "data about data"
Meta-data is used for a variety of purposes and the management of it is a critical issue in achieving a fully integrated data warehouse
The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse
The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data
The meta-data associated with data management describes the data as it is stored in the warehouse
The meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of queries


State of Commercial Practice

Products and Vendors [Datamation, May 15, 1996; R.C. Barquin, H.A. Edelstein: Planning and Designing the Data Warehouse. Prentice Hall. 1997]

Connectivity to sources
  Apertus
  CA-Ingres Gateway
  Information Builders EDA/SQL
  IBM Data Joiner
  Informix Enterprise Gateway
  Microsoft ODBC
  Oracle Open Connect
  Platinum Infohub
  SAS Connect
  Software AG Entire
  Sybase Enterprise Connect
  Trinzic InfoHub

Data extract, clean, transform, refresh
  CA-Ingres Replicator
  Carleton Passport
  Evolutionary Tech Inc. ETI-Extract
  Harte-Hanks Trillium
  IBM Data Joiner, Data Propagator
  Oracle 7
  Platinum InfoRefiner, InfoPump
  Praxis OmniReplicator
  Prism Warehouse Manager
  Redbrick TMU
  SAS Access
  Software AG Sourcepoint
  Sybase Replication Server
  Trinzic InfoPump


State of Commercial Practice

Multidimensional Database Engines
  Arbor Essbase
  Comshare Commander OLAP
  Oracle IRI Express
  SAS System

Warehouse Data Servers
  CA-Ingres
  IBM DB2
  Information Builders Focus
  Informix
  Oracle
  Praxis Model 204
  Redbrick
  Software AG ADABAS
  Sybase MPP
  Tandem
  Teradata

ROLAP Servers
  HP Intelligent Warehouse
  Information Advantage Axsys
  Informix Metacube
  MicroStrategy DSS Server


State of Commercial Practice

Query/Reporting Environments
  Brio/Query
  Business Objects
  Cognos Impromptu
  CA Visual Express
  IBM DataGuide
  Information Builders Focus Six
  Informix ViewPoint
  Platinum Forest & Trees
  SAS Access
  Software AG Esperant

Multidimensional Analysis
  Andyne Pablo
  Arbor Essbase Analysis Server
  Business Objects
  Cognos PowerPlay
  Dimensional Insight Cross Target
  Holistic Systems HOLOS
  Information Advantage Decision Suite
  IQ Software IQ/Vision
  Kenan System Acumate
  Lotus 123
  Microsoft Excel
  Microstrategy DSS
  Pilot Lightship
  Platinum Forest & Trees
  Prodea Beacon
  SAS OLAP ++
  Stanford Technology Group Metacube


State of Commercial Practice

Metadata Management
  HP Intelligent Warehouse
  IBM Data Guide
  Platinum Repository
  Prism Directory Manager

System Management
  CA Unicenter
  HP OpenView
  IBM DataHub, NetView
  Information Builders Site Analyzer
  Prism Warehouse Manager
  SAS CPE
  Tivoli
  Software AG Source Point
  Redbrick Enterprise Control and Coordination

Process Management
  AT&T TOPEND
  HP Intelligent Warehouse
  IBM FlowMark
  Platinum Repository
  Prism Warehouse Manager
  Software AG Source Point

Systems integration and consulting


Research

Data cleaning
  focus on data inconsistencies, not schema differences
  data mining techniques
Physical Design
  design of summary tables, partitions, indexes
  tradeoffs in use of different indexes
Query processing
  selecting appropriate summary tables
  dynamic optimization with feedback
  acid test for query optimization: cost estimation, use of transformations, search strategies
  partitioning query processing between OLAP server and backend server.


Research

Warehouse Management
  detecting runaway queries
  resource management
  incremental refresh techniques
  computing summary tables during load
  failure recovery during load and refresh
  process management: scheduling queries, load and refresh
  use of workflow technology for process management


References

www.toug.org/files/tougpr200302_4.ppt
www-db.stanford.edu/~hector/cs245/Notes12.ppt
www.epa.gov/storet/conf/Wilson_Data_Warehouse.ppt
www.learndatamodeling.com
www.learnbi.com
www.datawarehousing.ittoolbox.com
www.datawarehousing.com


Thank You

QUESTIONS?


APPENDIX A
Data warehouse Schemas

Star schema


sale (fact table): orderId, date, custId, prodId, storeId, qty, amt
customer: custId, name, address, city
product: prodId, name, price
store: storeId, city

A single object (fact table) in the middle connected to a number of dimension tables

Star schema


customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

product
prodId  name  price
p1      bolt  10
p2      nut   5

store
storeId  city
c1       nyc
c2       sfo
c3       la

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
o105     3/8/97  111     p1      c3       5    50


Terms

Basic notion: a measure (e.g. sales, qty, etc.)
Given a collection of numeric measures
Each measure depends on a set of dimensions (e.g. sales volume as a function of product, time, and location)

Terms


The relation that relates the dimensions to the measure of interest is called the fact table (e.g. sale)
Information about dimensions can be represented as a collection of relations, called the dimension tables (product, customer, store)
Each dimension can have a set of associated attributes

Example of Star Schema

Sales Fact Table
  dimension keys: Date, Product, Store, Customer
  measurements: unit_sales, dollar_sales, schilling_sales
Date: Date, Month, Year
Customer: CustId, CustName, CustCity, CustCountry
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Store: StoreID, City, State, Country, Region


Dimension Hierarchies

For each dimension, the set of associated attributes can be structured as a hierarchy

store -> sType
store -> city -> region
customer -> city -> state -> country


Dimension Hierarchies

store
storeId  cityId  tId  mgr
s5       sfo     t1   joe
s7       sfo     t2   fred
s9       la      t1   nancy

city
cityId  pop  regId
sfo     1M   north
la      5M   south

region
regId  name
north  cold region
south  warm region

sType
tId  size   location
t1   small  downtown
t2   large  suburbs

Snowflake Schema


A refinement of star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables

Example of Snowflake Schema

Sales Fact Table
  dimension keys: Date, Product, Store, Customer
  measurements: unit_sales, dollar_sales, schilling_sales
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Cust: CustId, CustName, CustCity, CustCountry
Date: Date, Month
Month: Month, Year
Year: Year
Store: StoreID, City
City: City, State
State: State, Country
Country: Country, Region


Fact constellations

Fact constellations: Multiple fact tables share dimension tables


APPENDIX B
Data Modeling & OLAP

Multidimensional Data Model

Fact relation
sale
Product  Client  Amt
p1       c1      12
p2       c1      11
p1       c3      50
p2       c2      8

Two-dimensional cube
      c1   c2   c3
p1    12    -   50
p2    11    8    -

Sales of products may be represented in one dimension (as a fact relation) or in two dimensions, e.g. clients and products
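The step from fact relation to two-dimensional cube can be written out as a short Python sketch. The nested-dictionary layout is just one convenient representation chosen for the example; a MOLAP server would typically use arrays.

```python
# Fact relation: one tuple (product, client, amount) per sale.
fact = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

# Two-dimensional cube: cube[product][client] -> amount.
cube = {}
for product, client, amt in fact:
    cube.setdefault(product, {})[client] = amt

# Cells with no sale are simply absent, e.g. (p1, c2).
p1_c2 = cube["p1"].get("c2")
```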

Multidimensional Data Model

Fact relation
sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

3-dimensional cube
day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

Multidimensional Data Model and Aggregates

Add up amounts for day 1
In SQL:
  SELECT sum(Amt)
  FROM SALE
  WHERE Date = 1

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

result: 81

Multidimensional Data Model and Aggregates

Add up amounts by day
In SQL:
  SELECT Date, sum(Amt)
  FROM SALE
  GROUP BY Date

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

result
Date  sum
1     81
2     48

Multidimensional Data Model and Aggregates

Add up amounts by client, product
In SQL:
  SELECT client, product, sum(amt)
  FROM SALE
  GROUP BY client, product

Multidimensional Data Model and Aggregates

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

result
Product  Client  Sum
p1       c1      56
p1       c2      4
p1       c3      50
p2       c1      11
p2       c2      8
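The three aggregate queries on the sale relation can be checked against an in-memory SQLite database. This is a sketch using Python's built-in `sqlite3` module; the `ORDER BY` clauses are added here only to make the row order predictable and are not part of the queries on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (product TEXT, client TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by client, product.
by_client_product = conn.execute(
    "SELECT client, product, sum(amt) FROM sale "
    "GROUP BY client, product ORDER BY client, product"
).fetchall()
```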

Multidimensional Data Model and Aggregates

In the multidimensional data model, summarizing information (aggregates) is usually stored together with the measure values

       c1   c2   c3   Sum
p1     56    4   50   110
p2     11    8    -    19
Sum    67   12   50   129

Aggregates


Operators: sum, count, max, min, median, ave
“Having” clause
Using dimension hierarchy
  average by region (within store)
  maximum by month (within date)

Cube Aggregation

Example: computing sums

day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

sum over days
         c1   c2   c3
p1       56    4   50
p2       11    8    -

sum over products
         c1   c2   c3
sum      67   12   50

sum over clients
         sum
p1       110
p2        19

total: 129

Cube Operators


day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

sum over days
         c1   c2   c3
p1       56    4   50
p2       11    8    -

sum over products
         c1   c2   c3
sum      67   12   50

sum over clients
         sum
p1       110
p2        19

total: 129

sale(c1,*,*) = 67
sale(c2,p2,*) = 8
sale(*,*,*) = 129

Cube


* (all days)  c1   c2   c3    *
p1            56    4   50  110
p2            11    8    -   19
*             67   12   50  129

day 2         c1   c2   c3    *
p1            44    4    -   48
p2             -    -    -    -
*             44    4    -   48

day 1         c1   c2   c3    *
p1            12    -   50   62
p2            11    8    -   19
*             23    8   50   81

sale(*,p2,*) = 19

Aggregation Using Hierarchies

day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

Hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)

day 1 aggregated by region
      region A  region B
p1    12        50
p2    11        8

Aggregation Using Hierarchies

Hierarchy: client -> city -> region

Sales by client and product (one date of sale)
client  city       Video  Camera  CD
c1      Chennai    10     3       21
c2      Chennai    12     5       9
c3      Bangalore  11     7       7
c4      Bangalore  12     11      15

aggregation with respect to city
      Video  Camera  CD
CH    22     8       30
BN    23     18      22

A Sample Data Cube


[Figure: a three-dimensional data cube with dimensions Date (1Q, 2Q, 3Q, 4Q), Product (CD, video, camera) and Country (USA, Canada, Mexico), with a sum along each dimension]

OLAP Servers


Relational OLAP (ROLAP):
  Extended relational DBMS that maps operations on multidimensional data to standard relational operations
  Store all information, including fact tables, as relations

Multidimensional OLAP (MOLAP):
  Special purpose server that directly implements multidimensional data and operations
  store multidimensional datasets as arrays

OLAP Servers


Hybrid OLAP (HOLAP):
  Give users/system administrators freedom to select different partitions.

OLAP Queries


Roll up: summarize data along a dimension hierarchy
  If we are given total sales volume per city, we can aggregate on the Location dimension to obtain sales per state

OLAP Queries


Hierarchy: client -> city -> region

Sales by client and product (one date of sale)
client  city       Video  Camera  CD
c1      Chennai    10     3       21
c2      Chennai    12     5       9
c3      Bangalore  11     7       7
c4      Bangalore  12     11      15

aggregation with respect to city
      Video  Camera  CD
CH    22     8       30
BN    23     18      22
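Rolling up from client to city can be sketched in Python as follows. The per-client figures are read off the (partly garbled) figure on this slide and should be treated as an assumption; they are consistent with the city totals shown (CH 22, 8, 30 and BN 23, 18, 22). The `roll_up` helper is a hypothetical name.

```python
from collections import defaultdict

# Client -> city step of the hierarchy client -> city -> region.
city_of = {"c1": "Chennai", "c2": "Chennai", "c3": "Bangalore", "c4": "Bangalore"}

# Sales per client and product (values assumed from the slide's figure).
sales = {
    "c1": {"Video": 10, "Camera": 3, "CD": 21},
    "c2": {"Video": 12, "Camera": 5, "CD": 9},
    "c3": {"Video": 11, "Camera": 7, "CD": 7},
    "c4": {"Video": 12, "Camera": 11, "CD": 15},
}

def roll_up(sales, parent_of):
    """Aggregate the lower level (client) into its parent level (city)."""
    totals = defaultdict(lambda: defaultdict(int))
    for member, by_product in sales.items():
        for product, amt in by_product.items():
            totals[parent_of[member]][product] += amt
    return {parent: dict(products) for parent, products in totals.items()}

by_city = roll_up(sales, city_of)
```

Drill-down is the reverse direction: it needs the detailed `sales` data, since the city totals alone cannot be split back into clients.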

OLAP Queries


Roll down, drill down: go from higher level summary to lower level summary or detailed data
  For a particular product category, find the detailed sales data for each salesperson by date
  Given total sales by state, we can ask for sales per city, or just sales by city for a selected state

OLAP Queries


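day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

sum over days
         c1   c2   c3
p1       56    4   50
p2       11    8    -

sum over products
         c1   c2   c3
sum      67   12   50

sum over clients
         sum
p1       110
p2        19

total: 129

rollup: from the detailed cubes towards the totals
drill-down: from the totals back towards the detailed cubes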


OLAP Queries


Pivoting can be combined with aggregation

sale
prodId  clientid  date  amt
p1      c1        1     12
p2      c1        1     11
p1      c3        1     50
p2      c2        1     8
p1      c1        2     44
p1      c2        2     4

day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

product by client
       c1   c2   c3   Sum
p1     56    4   50   110
p2     11    8    -    19
Sum    67   12   50   129

date by client
       c1   c2   c3   Sum
1      23    8   50    81
2      44    4    -    48
Sum    67   12   50   129

OLAP Queries


Ranking: selection of first n elements (e.g. select 5 best purchased products in July)
Others: stored procedures, selection, etc.
Time functions
  e.g., time average

Cube Operation


SELECT date, product, customer, SUM (amount)
FROM SALES
CUBE BY date, product, customer

Need to compute the following group-bys:
(date, product, customer),
(date, product), (date, customer), (product, customer),
(date), (product), (customer),
() (the grand total)
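The set of group-bys behind a cube operation can be enumerated mechanically: one group-by per subset of the dimensions. The sketch below is a naive Python illustration (one scan of the data per subset) using the sale data from the earlier slides; the `cube` function name is hypothetical, and real systems share work between cuboids instead of rescanning.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("date", "product", "customer")
sales = [
    {"date": 1, "product": "p1", "customer": "c1", "amount": 12},
    {"date": 1, "product": "p2", "customer": "c1", "amount": 11},
    {"date": 1, "product": "p1", "customer": "c3", "amount": 50},
    {"date": 1, "product": "p2", "customer": "c2", "amount": 8},
    {"date": 2, "product": "p1", "customer": "c1", "amount": 44},
    {"date": 2, "product": "p1", "customer": "c2", "amount": 4},
]

def cube(rows, dims):
    """Return {group-by attributes: {group key: SUM(amount)}} for every subset."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row["amount"]
            result[group] = dict(totals)
    return result

cuboids = cube(sales, DIMS)
grand_total = cuboids[()][()]  # the empty group-by holds the single total
```

For n dimensions this yields 2^n group-bys, which is why deciding what to materialize matters.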

Cuboid Lattice


Data cube can be viewed as a lattice of cuboids
The bottom-most cuboid is the base cube.
The top-most cuboid contains only one cell.

(all)
(A) (B) (C) (D)
(A,B) (A,C) (A,D) (B,C) (B,D) (C,D)
(A,B,C) (A,B,D) (A,C,D) (B,C,D)
(A,B,C,D)

Cuboid Lattice


city, product, date
city, product    city, date    product, date
city    product    date
all

day 1    c1   c2   c3
p1       12    -   50
p2       11    8    -

day 2    c1   c2   c3
p1       44    4    -
p2        -    -    -

sum over days
         c1   c2   c3
p1       56    4   50
p2       11    8    -

sum over products
         c1   c2   c3
sum      67   12   50

total: 129

use greedy algorithm to decide what to materialize

Efficient Data Cube Computation

Materialization of data cube
  Materialize every (cuboid), none, or some.
  Algorithms for selection of which cuboids to materialize:
    size, sharing, and access frequency:
      Type/frequency of queries
      Query response time
      Storage cost
      Update cost

Dimension Hierarchies


Client hierarchy: city -> state -> region

cities
city  state  region
c1    CA     East
c2    NY     East
c3    SF     West

Dimension Hierarchies Computation

city, product, date
city, product    city, date    product, date
city    product    date
all

roll-up along client hierarchy
state, product, date
state, product    state, date
state

Cube Computation - Array Based Algorithm

A MOLAP approach:
  the base cuboid is stored as a multidimensional array.
  read in a number of cells to compute partial cuboids

Cube computations


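[Figure: a three-dimensional array with axes A, B and C; the cuboids computed from it are {ABC}, {AB}, {AC}, {BC}, {A}, {B}, {C}, { } (ALL)]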


APPENDIX C
Index Structures

Inverted Lists


age index: 18, 19, 20, 21, 22, 23, 25, 26

inverted lists (shown for index entries 20 and 23)
r4, r18, r34, r35
r5, r19, r37, r40

data records
rId  name   age
r4   joe    20
r18  fred   20
r19  sally  21
r34  nancy  20
r35  tom    20
r36  pat    25
r5   dave   21
r41  jeff   26

Inverted Lists


Query: Get people with age = 20 and name = “fred”

List for age = 20: r4, r18, r34, r35

List for name = “fred”: r18, r52

Answer is intersection: r18
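The query above can be reproduced with two inverted lists built in Python. This is a minimal sketch: the record `r52` appears only in the list for name = "fred" on the slide, so its age (30 here) is an invented placeholder, and `build_inverted_index` is a hypothetical helper.

```python
from collections import defaultdict

# rId -> (name, age); r52's age is an assumption, the slide gives only its name list.
records = {
    "r4": ("joe", 20), "r18": ("fred", 20), "r19": ("sally", 21),
    "r34": ("nancy", 20), "r35": ("tom", 20), "r36": ("pat", 25),
    "r5": ("dave", 21), "r41": ("jeff", 26), "r52": ("fred", 30),
}

def build_inverted_index(records, position):
    """Map each value of one field to the list of record ids holding it."""
    index = defaultdict(list)
    for rid, fields in records.items():
        index[fields[position]].append(rid)
    return index

name_index = build_inverted_index(records, 0)
age_index = build_inverted_index(records, 1)

# age = 20 AND name = "fred": intersect the two inverted lists.
answer = sorted(set(age_index[20]) & set(name_index["fred"]))
```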

Bitmap Indexes


Bitmap index: An indexing technique that has attracted attention in multi-dimensional database implementation

table
Customer  City     Car
c1        Detroit  Ford
c2        Chicago  Honda
c3        Detroit  Honda
c4        Poznan   Ford
c5        Paris    BMW
c6        Paris    Nissan

Bitmap Indexes


The index consists of bitmaps:

Index on City:

rec  Chicago  Detroit  Paris  Poznan
1    0        1        0      0
2    1        0        0      0
3    0        1        0      0
4    0        0        0      1
5    0        0        1      0
6    0        0        1      0

Each column of 0s and 1s is one bitmap

Bitmap Indexes


Index on Car:

rec  BMW  Ford  Honda  Nissan
1    0    1     0      0
2    0    0     1      0
3    0    0     1      0
4    0    1     0      0
5    1    0     0      0
6    0    0     0      1

Each column of 0s and 1s is one bitmap

Bitmap Indexes


Index on a particular column
Index consists of a number of bit vectors (bitmaps)
Each value in the indexed column has a bit vector (bitmap)
The length of the bit vector is the number of records in the base table
The i-th bit is set if the i-th row of the base table has the value for the indexed column

Bitmap Index


age index: 18, 19, 20, 21, 22, 23, 25, 26

bitmaps (shown for index entries 20 and 23)
1101100000
0010001011

data records
id  name   age
1   joe    20
2   fred   20
3   sally  21
4   nancy  20
5   tom    20
6   pat    25
7   dave   21
8   jeff   26

Using Bitmap indexes


Query: Get people with age = 20 and name = “fred”

List for age = 20: 1101100000

List for name = “fred”: 0100000001

Answer is intersection: 0100000000

Good if domain cardinality is small
Bit vectors can be compressed

Using Bitmap indexes


They allow the use of efficient bit operations to answer some queries
  “how many customers from Detroit have car ‘Ford’”
    perform a bit-wise AND of two bitmaps: answer is 1 (customer c1)
  “how many customers have a car ‘Honda’”
    count 1’s in the bitmap: answer is 2
Compression: bit vectors are usually sparse for large databases, so they are compressed, which brings the need for decompression

Bitmap Index – Summary


With efficient hardware support for bitmap operations (AND, OR, XOR, NOT), a bitmap index offers better access methods for certain queries
  e.g., selection on two attributes
Some commercial products have implemented bitmap indexes
Works poorly for high cardinality domains since the number of bitmaps increases
Difficult to maintain: needs reorganization when relation sizes change (new bitmaps)

Join

“Combine” SALE, PRODUCT relations
In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb
prodId  name  price  storeId  date  amt
p1      bolt  10     c1       1     12
p2      nut   5      c1       1     11
p1      bolt  10     c3       1     50
p2      nut   5      c2       1     8
p1      bolt  10     c1       2     44
p1      bolt  10     c2       2     4

Join Indexes


product
id  name  price  jIndex
p1  bolt  10     r1,r3,r5,r6
p2  nut   5      r2,r4

sale
rId  prodId  storeId  date  amt
r1   p1      c1       1     12
r2   p2      c1       1     11
r3   p1      c3       1     50
r4   p2      c2       1     8
r5   p1      c1       2     44
r6   p1      c2       2     4

The jIndex column is the join index

Join Indexes


Traditional indexes map the value to a list of record ids. Join indexes map the tuples in the join result of two relations to the source tables.

In data warehouse cases, join indexes relate the values of the dimensions of a star schema to rows in the fact table.

For a warehouse with a Sales fact table and dimension city, a join index on city maintains for each distinct city a list of RIDs of the tuples recording the sales in the city

Join indexes can span multiple dimensions
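A join index between the product dimension and the sale fact table can be built in a few lines of Python. This is a minimal sketch over the rows from the previous slide; `build_join_index` is a hypothetical helper and the tuple layout is an assumption of the example.

```python
from collections import defaultdict

# Fact rows: (rId, prodId, storeId, date, amt).
sale = [
    ("r1", "p1", "c1", 1, 12), ("r2", "p2", "c1", 1, 11), ("r3", "p1", "c3", 1, 50),
    ("r4", "p2", "c2", 1, 8), ("r5", "p1", "c1", 2, 44), ("r6", "p1", "c2", 2, 4),
]

def build_join_index(fact_rows, key_position):
    """Map each dimension key to the RIDs of the fact rows that reference it."""
    index = defaultdict(list)
    for row in fact_rows:
        index[row[key_position]].append(row[0])
    return dict(index)

join_index = build_join_index(sale, 1)  # join index on prodId

# Using the index: total amount for product p1 (bolt) without scanning all sales.
by_rid = {row[0]: row for row in sale}
bolt_total = sum(by_rid[rid][4] for rid in join_index["p1"])
```

Passing a different `key_position` (for example the store column) gives a join index for another dimension.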