8/6/2019 Business Intelligence - Data Warehouse Implementation
http://slidepdf.com/reader/full/business-intelligence-data-warehouse-implementation 1/157
Business Intelligence & Data Warehousing
ANAND.T, Business Intelligence, Citicards, Tata Consultancy Services Ltd.
Lecture I
Basics and Concepts
Motivation
Aims of information technology:
  To help workers in their everyday business activity and improve their productivity – clerical data processing tasks
  To help knowledge workers (executives, managers, analysts) make faster and better decisions – decision support systems
Two types of applications:
  Operational applications
  Analytical applications
The Architecture of Data
[Figure: layers of data, each with its description]
Business rules: what has been learned from data
Metadata: logical model
Database schema: physical layout of data
Summary data: summaries by who, what, when, where, ...
Operational data: who, what, when, where, ...
Business Intelligence
“Business Intelligence is a technology based on customer and profit oriented models that reduces operating costs and provide increased profitability by improving productivity, sales, service and helps to make decision making capabilities at no time.”
BI Cycle
[Figure: the Business Intelligence cycle: Analysis, Insight, Action, Measurement]
Uses of Business Intelligence
Operational Efficiency
  ERP Reporting
  KPI Tracking
  Product Profitability
  Risk Management
  Balanced Scorecard
  Activity Based Costing
  Global Sourcing
  Logistics
Uses of Business Intelligence
Customer Interaction
  Sales Analysis
  Sales Forecasting
  Segmentation
  Cross-selling
  CRM Analytics
  Campaign Planning
  Customer Profitability
[Figure: diagram of business intelligence sources; labels follow in extraction order]
Market Research
Telephone Surveys
Online Surveys
Focus Groups
Mystery Shopping
Custom Panels
Online Focus Groups
One-on-ones
Environmental Scanning
AC Nielsen Reports
Association Stats
Government Reports
Media Monitoring
Economic Reports
Syndicated Studies
Data Mining
Predictive Modelling
Segmentation
Mining Customer Records
POS Systems
CRM
Library Sciences
Competitive Intelligence
Internal Scanning
News Scanning Services
Ad Scanning/Tracking
Mystery Shopping
Website
BI Tools
These tools will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, inventory and distribution analysis.
Evolution
60’s: Batch reports
  hard to find and analyze information
  inflexible and expensive, reprogram every new request
70’s: Terminal-based DSS and EIS (executive information systems)
  still inflexible, not integrated with desktop tools
80’s: Desktop data access and analysis tools
  query tools, spreadsheets, GUIs
  easier to use, but only access operational databases
90’s: Data warehousing with integrated OLAP engines and tools
Data Warehousing Market
Hardware: servers, storage, clients
Warehouse DBMS
Tools
Market growing from $2B in 1995 to $8B in 1998 (Meta Group)
Systems integration & Consulting
Already deployed in many industries: manufacturing, retail, financial, insurance, transportation, telecom, utilities, healthcare.
What is a Data Warehouse
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
Collection of data that is used primarily in organizational decision making
A decision support database that is maintained separately from the organization’s operational database
How Many Matches?
How Many Matches Now?
Data Warehouse - Subject Oriented
Subject oriented: oriented to the major subject areas of the corporation that have been defined in the data model.
  E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.
Operational DB and applications may be organized differently
  E.g. based on type of insurance: auto, life, medical, fire, ...
Data Warehouse – Integrated
Different data sources lack consistency in encoding, naming conventions, …
Heterogeneous data sources
When data is moved to the warehouse, it is converted to a single consistent representation.
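The conversion step can be sketched as follows. This is a minimal illustration only: the source system names (`billing`, `claims`) and their gender codes are hypothetical assumptions, chosen to show two sources that encode the same attribute differently.

```python
# Hypothetical per-source encodings mapped to one warehouse convention.
GENDER_MAPS = {
    "billing": {"m": "M", "f": "F"},
    "claims": {"1": "M", "0": "F"},
}

def to_warehouse_gender(source: str, raw_value: str) -> str:
    """Convert a source-specific gender code to the warehouse encoding."""
    mapping = GENDER_MAPS[source]
    # Unknown codes are flagged rather than guessed.
    return mapping.get(raw_value.strip().lower(), "UNKNOWN")
```

Flagging unmapped codes as `UNKNOWN` is one possible design choice; a real load might instead reject the record or route it to a data quality queue.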
Data Warehouse - Non-Volatile
Operational data is regularly accessed and manipulated a record at a time, and updates are made to data in the operational environment.
Warehouse data is loaded and accessed. Update of data does not occur in the data warehouse environment.
Data Warehouse - Time Variance
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational data: current value data.
Data warehouse data: nothing more than a sophisticated series of snapshots, each taken at some moment in time.
The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
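The difference in key structure can be sketched with two in-memory stores. The account id, balances, and dates below are made-up illustration values, and plain dictionaries stand in for the operational and warehouse databases.

```python
from datetime import date

# Operational store: keyed by account only, holds the current value.
operational = {"ACC-1": 120.0}

# Warehouse: the key always includes a time element (the snapshot date).
warehouse = {}

def take_snapshot(snapshot_date: date) -> None:
    """Copy current operational values into the warehouse under a dated key."""
    for account_id, balance in operational.items():
        warehouse[(account_id, snapshot_date)] = balance

take_snapshot(date(2019, 1, 31))
operational["ACC-1"] = 95.0          # operational update overwrites in place
take_snapshot(date(2019, 2, 28))
```

After the second snapshot the operational store holds only the current balance, while the warehouse still holds both dated values.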
Why Separate Data Warehouse?
Performance
  Special data organization, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP
  Complex OLAP queries would degrade performance for operational transactions
  Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis
Why Separate Data Warehouse?
Function
  Missing data: Decision support requires historical data which operational DBs do not typically maintain
  Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources
  Data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled.
Advantages of Warehousing
High query performance
Queries not visible outside warehouse
Local processing at sources unaffected
Can operate when sources unavailable
Can query data not stored in a DBMS
Extra information at warehouse
  Modify, summarize (store aggregates)
  Add historical information
Advantages of Mediator Systems
No need to copy data
  less storage
  no need to purchase data
More up-to-date data
Query needs can be unknown
Only query interface needed at sources
May be less draining on sources
Requirements for Data Warehousing
Load performance
Load processing
Data quality management
Query performance
Terabyte scalability
Mass user scalability
Networked data warehouse
Warehouse administration
Integrated dimensional analysis
Advanced query functionality
The Architecture of Data Warehousing
[Figure: data warehousing architecture. Labels: Operational databases; External data sources; Extract, Transform, Load, Refresh; Data Warehouse; Metadata repository; Data marts; OLAP server; o/p; OLAP; Data mining; Reports]
Typical data warehouse – Three Tier architecture
[Figure: three-tier data warehouse architecture. Labels: Operational data source 1, Operational data source 2, ..., Operational data source n; Operational data store (ODS); Load Manager; Warehouse Manager; DBMS; Meta-data; Highly summarized data; Lightly summarized data; Detailed data; Archive/backup data; Query Manager; End-user access tools; Summarized data (Relational database); Summarized data (Multi-dimension database); Data Mart; tiers marked (First Tier), (Second Tier), (Third Tier)]
Data Sources
Data sources are often the operational systems, providing the lowest level of data.
Data sources are designed for operational use, not for decision support, and the data reflect this fact.
Multiple data sources are often from different systems, run on a wide range of hardware, and much of the software is built in-house or highly customized.
Multiple data sources introduce a large number of issues, such as semantic conflicts.
Creating and Maintaining a Warehouse
A data warehouse needs several tools that automate or support tasks such as:
  Data extraction from different external data sources, operational databases, files of standard applications (e.g. Excel, COBOL applications), and other documents (Word, WWW).
  Data cleaning (finding and resolving inconsistency in the source data)
  Integration and transformation of data (between different data formats, languages, etc.)
Creating and Maintaining a Warehouse
  Data loading (loading the data into the data warehouse)
  Data replication (replicating source database into the data warehouse)
  Data refreshment
  Data archiving
  Checking for data quality
  Analyzing metadata
Physical Structure of Data Warehouse
There are three basic architectures for constructing a data warehouse:
  Centralized
  Distributed/Federated
  Tiered
The data warehouse is distributed for: load balancing, scalability and higher availability
Physical Structure of Data Warehouse
[Figure: Centralized architecture. Labels: Source, Source; Central Data Warehouse; Client, Client, Client]
Physical Structure of Data Warehouse
[Figure: Federated architecture. Labels: Source, Source; Local Data Marts; Marketing, Financial, Distribution; Logical Data Warehouse; End Users]
Physical Structure of Data Warehouse
[Figure: Tiered architecture. Labels: Source, Source; Physical Data Warehouse; Local Data Marts; Workstations (highly summarized data)]
Physical Structure of Data Warehouse
Federated architecture
  The logical data warehouse is only virtual
Tiered architecture
  The central data warehouse is physical
  There exist local data marts on different tiers which store copies or summarization of the previous tier.
Related Concepts
Decision Support System
Business Modeling
OLTP/OLAP
Data Modeling
ETL
Reporting
Data Mining
Decision Support System (DSS)
One of the powerful tools of BI
Information technology to help knowledge workers (executives, managers, analysts) make faster and better decisions:
  What were the sales volumes by region and by product category in the last year?
  How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
  Will a 10% discount increase sales volume sufficiently?
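The first question above (sales volumes by region and product category for one year) can be sketched as an aggregate query. The `sales` table, its columns, and the sample rows are hypothetical assumptions, and an in-memory SQLite database stands in for the decision support database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, category TEXT, year INTEGER, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Laptops", 2018, 10),
        ("North", "Laptops", 2018, 5),
        ("South", "Phones", 2018, 7),
        ("South", "Phones", 2017, 99),  # excluded by the year filter
    ],
)

def sales_by_region_and_category(year: int):
    """Total sales volume per (region, category) for one year."""
    return conn.execute(
        "SELECT region, category, SUM(volume) FROM sales "
        "WHERE year = ? GROUP BY region, category ORDER BY region, category",
        (year,),
    ).fetchall()
```

Calling `sales_by_region_and_category(2018)` returns one row per region and category with the summed volume.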
Business Modeling
Depicts the overall picture of a business
Sub-categories
  Business Process Modeling
    Business processes are visually represented as diagrams of simple boxes with arrow graphics and text labels
  Process Flow Modeling
    Describes the various processes that happen in an organization and the relationships between them
  Data Flow Modeling
    Focuses on the flow of data between various business processes
Business Modeling Tools
Data Processing Models
There are two basic data processing models:
OLTP – Online Transaction Processing
  Describes processing at operational sites
  The main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency.
OLAP – Online Analytical Processing
  Describes processing at warehouse
  The main aim of OLAP is efficient multidimensional processing of large data volumes.
OLTP vs. OLAP
                     OLTP                          OLAP
Users                Clerk, IT professional        Knowledge worker
Function             Day To Day Operations         Decision Support
DB Design            Application-oriented          Subject-oriented
Data                 Current, Up-to-date           Historical, Summarized
                     Detailed, Flat Relational     Multidimensional
                     Isolated                      Integrated, Consolidated
Usage                Repetitive                    Ad-hoc
Access               Read/Write                    Lots Of Scans
                     Index/Hash On Prim. Key
Unit Of Work         Short, Simple Transaction     Complex Query
# Records Accessed   Tens                          Millions
# Users              Thousands                     Hundreds
DB Size              100MB-GB                      100GB-TB
Metric               Transaction Throughput        Query Throughput, Response
OLAP Multidimensional Databases
Data Modeling
A data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
Visually represents
  Nature of data
  Business rules governing the data
  Organization in database
Data Modeling
Types of data modeling
  Conceptual Data Modeling
  Enterprise Data Modeling
  Logical Data Modeling
  Physical Data Modeling
  Relational Data Modeling
  Dimensional Data Modeling
ETL
ETL stands for Extraction, Transformation, Loading
Steps involved
  Mapping the data between source systems and target database (data warehouse or data mart)
  Cleansing of source data in staging area
  Transforming cleansed source data and then loading into the target system
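The three steps above can be sketched end to end. Everything named here is a hypothetical assumption made for illustration: the source column names, the `COLUMN_MAP` mapping, the `sales_fact` target table, and the cleansing rule (reject rows with any empty field). An in-memory SQLite database stands in for the target system.

```python
import sqlite3

# Hypothetical source rows, as extracted from an operational system.
source_rows = [
    {"cust_id": "17", "cust_name": "  alice  ", "amt": "10.50"},
    {"cust_id": "18", "cust_name": "BOB", "amt": ""},  # missing amount
]

# Mapping step: source column -> target column.
COLUMN_MAP = {"cust_id": "customer_id", "cust_name": "customer_name", "amt": "amount"}

def cleanse(row):
    """Staging-area cleansing: trim whitespace and reject incomplete rows."""
    cleaned = {k: v.strip() for k, v in row.items()}
    return cleaned if all(cleaned.values()) else None

def transform(row):
    """Rename columns per the mapping and convert types for the target."""
    out = {COLUMN_MAP[k]: v for k, v in row.items()}
    out["customer_id"] = int(out["customer_id"])
    out["customer_name"] = out["customer_name"].title()
    out["amount"] = float(out["amount"])
    return out

def run_etl(rows, conn):
    """Cleanse, transform, and load rows; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer_id INTEGER, customer_name TEXT, amount REAL)"
    )
    staged = [c for c in map(cleanse, rows) if c is not None]
    conn.executemany(
        "INSERT INTO sales_fact VALUES (:customer_id, :customer_name, :amount)",
        [transform(r) for r in staged],
    )
    return len(staged)

target = sqlite3.connect(":memory:")
loaded = run_etl(source_rows, target)
```

Only the complete row reaches the target; the row with the missing amount is dropped during cleansing, before any transformation is attempted.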
ETL Tools
Reporting
Business Intelligence reporting tools provide different views of data by pivoting or rotating the data across several dimensions.
Most OLAP tools now support reporting.
Excel sheets and flat files are common reporting media.
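Pivoting can be sketched without any reporting tool. The sample rows (region, quarter, sales) and the `pivot` helper are hypothetical; the point is only that the same records yield different views depending on which dimension is placed on the rows.

```python
from collections import defaultdict

# Hypothetical report rows: (region, quarter, sales).
rows = [
    ("North", "Q1", 100), ("North", "Q2", 120),
    ("South", "Q1", 80), ("South", "Q2", 90),
]

def pivot(records, row_dim, col_dim, value):
    """Sum `value` into a nested dict keyed by row dimension, then column dimension."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_dim]][rec[col_dim]] += rec[value]
    return {r: dict(cols) for r, cols in table.items()}

by_region = pivot(rows, row_dim=0, col_dim=1, value=2)   # regions as rows
by_quarter = pivot(rows, row_dim=1, col_dim=0, value=2)  # rotated: quarters as rows
```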
Data Mining
Data Mining is a set of processes related to analyzing and discovering useful, actionable knowledge buried deep beneath large volumes of data stores or data sets
This knowledge discovery involves finding patterns or behaviors within the data that lead to some profitable business action
Data Mining Life Cycle
  Business problem Analysis
  Knowledge Discovery
  Implementation
  Results Analysis
Typical Data Warehouse
Lecture II
Design and Implementation
Database design methodology for data warehouses
There are many approaches that offer alternative routes to the creation of a data warehouse
Typical approach: decompose the design of the data warehouse into manageable parts (data marts). At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse.
The methodology specifies the steps required for the design of a data mart; however, the methodology also ties together separate data marts so that over time they merge together into a coherent overall data warehouse.
Step 1: Choosing the process
The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
The best choice for the first data mart tends to be the one that is related to ‘sales’
Step 2: Choosing the grain
Choosing the grain means deciding exactly what a fact table record represents. For example, the entity ‘Sales’ may represent the facts about each property sale. Therefore, the grain of the ‘Property_Sales’ fact table is an individual property sale.
Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.
The grain decision for the fact table also determines the grain of each of the dimension tables. For example, if the grain for the ‘Property_Sales’ is an individual property sale, then the grain of the ‘Client’ dimension is the detail of the client who bought a particular property.
Step 3: Identifying and conforming the dimensions
Dimensions set the context for formulating queries about the facts in the fact table.
We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain.
If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a subset of the other (this is the only way that two data marts can share one or more dimensions in the same application).
When a dimension is used in more than one data mart, the dimension is referred to as being conformed.
Step 4: Choosing the facts
The grain of the fact table determines which facts can be used in the data mart – all facts must be expressed at the level implied by the grain.
In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale (the facts should be numeric and additive).
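Steps 2 to 4 can be sketched as a small star schema. The `Property_Sales` fact and `Client` dimension come from the example in the text; the remaining table names, column names, and sample rows are hypothetical assumptions, and an in-memory SQLite database stands in for the data mart.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE client_dim   (client_key INTEGER PRIMARY KEY, client_name TEXT, city TEXT);
CREATE TABLE property_dim (property_key INTEGER PRIMARY KEY, property_type TEXT);
-- Grain: one row per individual property sale.
CREATE TABLE property_sales_fact (
    client_key   INTEGER REFERENCES client_dim(client_key),
    property_key INTEGER REFERENCES property_dim(property_key),
    sale_date    TEXT,
    sale_price   REAL    -- numeric, additive fact at the grain of one sale
);
INSERT INTO client_dim VALUES (1, 'A. Smith', 'Glasgow'), (2, 'B. Jones', 'London');
INSERT INTO property_dim VALUES (10, 'Flat'), (11, 'House');
INSERT INTO property_sales_fact VALUES
    (1, 10, '2019-01-15', 150000.0),
    (2, 11, '2019-02-01', 300000.0),
    (2, 10, '2019-03-20', 175000.0);
""")

def total_sales_by_property_type():
    """Sum the additive fact along one dimension attribute."""
    return db.execute(
        "SELECT p.property_type, SUM(f.sale_price) "
        "FROM property_sales_fact f JOIN property_dim p USING (property_key) "
        "GROUP BY p.property_type ORDER BY p.property_type"
    ).fetchall()
```

Because `sale_price` is additive and recorded at the grain of one sale, it can be summed along any dimension without double counting.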
Step 5: Storing pre-calculations in the fact table
Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
Common example: a profit or loss statement
These types of facts are useful since they are additive quantities, from which we can derive valuable information.
This is particularly true for a value that is fundamental to an enterprise, or if there is any chance of a user calculating the value incorrectly.
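A pre-calculated fact can be sketched as a value computed once at load time and stored with the fact row. The field names (`revenue`, `cost`, `profit`) and the sample figures are hypothetical.

```python
def add_profit(fact_row: dict) -> dict:
    """Return a copy of the fact row with a pre-calculated, additive `profit` fact."""
    return {**fact_row, "profit": fact_row["revenue"] - fact_row["cost"]}

facts = [add_profit(r) for r in (
    {"sale_id": 1, "revenue": 500.0, "cost": 350.0},
    {"sale_id": 2, "revenue": 200.0, "cost": 260.0},   # a loss
)]

# Users sum the stored value instead of re-deriving it in each query.
total_profit = sum(r["profit"] for r in facts)
```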
Step 6: Rounding out the dimension tables
In this step we return to the dimension tables and add as many text descriptions to the dimensions as possible.
The text descriptions should be as intuitive and understandable to the users as possible
Step 7: Choosing the duration of the data warehouse
The duration measures how far back in time the fact table goes.
For some companies (e.g. insurance companies) there may be a legal requirement to retain data extending back five or more years.
Very large fact tables raise at least two very significant data warehouse design issues:
  The older the data, the more likely there will be problems in reading and interpreting the old files
  It is mandatory that the old versions of the important dimensions be used, not the most current versions (we will discuss this issue later on)
Step 8: Tracking slowly changing dimensions
The changing dimension problem means that the proper description of the old client and the old branch must be used with the old data warehouse schema
Usually, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time
There are different types of changes in dimensions:
  A dimension attribute is overwritten
  A dimension attribute causes a new dimension record to be created, etc.
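The two kinds of change listed above are commonly called Type 1 (overwrite) and Type 2 (new record) in dimensional modeling. A minimal sketch follows, assuming a hypothetical `client_dim` held as a list of dictionaries, with `client_key` playing the role of the generalized key and `client_id` the natural key from the source system.

```python
from datetime import date

# Dimension rows carry a generalized (surrogate) key distinct from the natural client id.
client_dim = [
    {"client_key": 1, "client_id": "C42", "city": "Leeds",
     "valid_from": date(2018, 1, 1), "current": True},
]

def overwrite_attribute(client_id, city):
    """Overwrite in place: the old attribute value is lost."""
    for row in client_dim:
        if row["client_id"] == client_id and row["current"]:
            row["city"] = city

def add_new_version(client_id, city, change_date):
    """Close the current row and add a new dimension record with a new key."""
    for row in client_dim:
        if row["client_id"] == client_id and row["current"]:
            row["current"] = False
    new_key = max(r["client_key"] for r in client_dim) + 1
    client_dim.append({"client_key": new_key, "client_id": client_id, "city": city,
                       "valid_from": change_date, "current": True})
    return new_key
```

With the second approach, old fact rows keep pointing at the old `client_key`, so historical facts are still described by the client attributes that were valid at the time.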
Step 9: Deciding the query priorities and the query modes
In this step we consider physical design issues.
  The presence of pre-stored summaries and aggregates
  Indices
  Materialized views
  Security issue
  Backup issue
  Archive issue
Database design methodology for data warehouses - summary
At the end of this methodology, we have a design for a data mart that supports the requirements of a particular business process and allows the easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.
A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.
Implementing a Warehouse
Implementing a Warehouse
Designing and rolling out a data warehouse is a complex process, consisting of the following activities:
  Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers (ROLAP vs MOLAP), and tools
  Integrate the servers, storage, and client tools
  Design the warehouse schema and views
Implementing a Warehouse
  Define the physical warehouse organization, data placement, partitioning, and access method
  Connect the sources using gateways, ODBC drivers, or other wrappers
  Design and implement scripts for data extraction, cleaning, transformation, load, and refresh
Implementing a Warehouse
Monitoring: Sending data from sources
Integrating: Loading, cleansing, ...
Processing: Query processing, indexing, ...
Managing: Metadata, Design, ...
Monitoring
Data Extraction
  Data extraction from external sources is usually implemented via gateways and standard interfaces (such as Information Builders EDA/SQL, ODBC, JDBC, Oracle Open Connect, Sybase Enterprise Connect, Informix Enterprise Gateway, etc.)
Monitoring Techniques
Detect changes to an information source that are of interest to the warehouse:
  define triggers in a full-functionality DBMS
  examine the updates in the log file
  write programs for legacy systems
  polling (queries to source)
  screen scraping
Propagate the change in a generic form to the integrator
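Of the techniques above, polling is the simplest to sketch: query the source periodically, compare the result with the previous poll, and emit the differences in a generic form. The `(operation, key, row)` tuple format below is an assumption made for illustration, not a format defined in the slides.

```python
def detect_changes(previous: dict, current: dict) -> list:
    """Compare two polled snapshots (key -> row) and emit generic change records."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes
```

Polling compares whole snapshots, so its cost grows with source size; triggers or log inspection avoid that at the price of tighter coupling to the source DBMS.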
Integration
Integrator
  Receive changes from the monitors
    make the data conform to the conceptual schema used by the warehouse
  Integrate the changes into the warehouse
    merge the data with existing data already present
    resolve possible update anomalies
Data Cleaning
Data Loading
Data Cleaning
Data cleaning is important to the warehouse – there is a high probability of errors and anomalies in the data:
  inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries and violation of integrity constraints.
  optional fields in data entry are significant sources of inconsistent data.
Data Cleaning Techniques
Data migration: allows simple data transformation rules to be specified, e.g. "replace the string gender by sex" (Warehouse Manager from Prism is an example of this tool)
Data scrubbing: uses domain-specific knowledge to scrub data (e.g. postal addresses) (Integrity and Trillium fall in this category)
Data auditing: discovers rules and relationships by scanning data (detect outliers). Such tools may be considered as variants of data mining tools
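Two of these techniques can be sketched in a few lines. Both functions are illustrative assumptions and do not reproduce how the named commercial tools work: `migrate` applies a field-renaming rule like the gender/sex example, and `audit_outliers` flags values far from the median, using the median absolute deviation with an arbitrary threshold.

```python
import statistics

def migrate(record: dict, renames: dict) -> dict:
    """Data migration: apply simple field-renaming rules."""
    return {renames.get(field, field): value for field, value in record.items()}

def audit_outliers(values, threshold=3.0):
    """Data auditing: flag values far from the median, measured in units of the
    median absolute deviation (MAD). The threshold is an arbitrary choice."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []
    return [v for v in values if abs(v - median) / mad > threshold]
```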
Data Loading
After extracting, cleaning and transforming, data must be loaded into the warehouse.
Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, etc.
Typically, batch load utilities are used for loading. A load utility must allow the administrator to monitor status, to cancel, suspend, and resume a load, and to restart after failure with no loss of data integrity
Data Loading Issues
The load utilities for data warehouses have to deal with very large data volumes
Sequential loads can take a very long time.
Full load can be treated as a single long batch transaction that builds up a new database. Using checkpoints ensures that if a failure occurs during the load, the process can restart from the last checkpoint
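Restarting from the last checkpoint can be sketched as follows. The `checkpoint` dictionary, the batch size, and the `fail_at` parameter (which simulates a crash) are hypothetical; in a real load utility the batch and its checkpoint would be committed together in one transaction.

```python
def load_with_checkpoints(rows, target, checkpoint, batch_size=2, fail_at=None):
    """Append rows to `target` in batches, recording progress in `checkpoint`.

    `fail_at` simulates a crash before any batch starting at or after that row index.
    """
    start = checkpoint.get("next_row", 0)
    for i in range(start, len(rows), batch_size):
        if fail_at is not None and i >= fail_at:
            raise RuntimeError("simulated failure during load")
        target.extend(rows[i:i + batch_size])
        checkpoint["next_row"] = i + batch_size   # recorded together with the batch
```

Calling the function again with the same `checkpoint` after a failure resumes at the first unloaded batch, so no row is loaded twice.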
8/6/2019 Business Intelligence - Data Warehouse Implementation
http://slidepdf.com/reader/full/business-intelligence-data-warehouse-implementation 76/157
Data RefreshRefreshing a warehouse means propagating updateson source data to the data stored in the warehousewhen to refresh:
periodically (daily or weekly)
immediately (defered refresh and immediaterefresh) determined by usage, types of datasource,etc.
Data Refresh
How to refresh:
  data shipping
  transaction shipping
Most commercial DBMSs provide replication servers that support incremental techniques for propagating updates from a primary database to one or more replicas. Such replication servers can be used to incrementally refresh a warehouse when sources change.
Data Shipping
Data shipping (e.g. Oracle Replication Server): a table in the warehouse is treated as a remote snapshot of a table in the source database. An after-row trigger is used to update a snapshot log table and propagate the updated data to the warehouse.
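The trigger-plus-log mechanism can be illustrated with SQLite from Python's standard library. This is only a sketch of the idea, not Oracle's replication API: the table names product and product_log, the trigger name, and the in-memory snapshot dictionary standing in for the warehouse table are all assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE product_log (id TEXT, name TEXT, price INTEGER);
    -- After-row trigger: record every updated source row in the log.
    CREATE TRIGGER product_after_update AFTER UPDATE ON product
    FOR EACH ROW BEGIN
        INSERT INTO product_log VALUES (NEW.id, NEW.name, NEW.price);
    END;
    INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
""")

# Initial snapshot of the source table (stands in for the warehouse copy).
snapshot = {row[0]: row for row in db.execute("SELECT * FROM product")}

db.execute("UPDATE product SET price = 12 WHERE id = 'p1'")

# Propagation step: apply logged rows to the snapshot, then clear the log.
for row in db.execute("SELECT id, name, price FROM product_log").fetchall():
    snapshot[row[0]] = row
db.execute("DELETE FROM product_log")
log_rows_left = db.execute("SELECT COUNT(*) FROM product_log").fetchone()[0]
```

Only the changed row travels to the snapshot, which is what makes the refresh incremental.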
Transaction Shipping
Transaction shipping (e.g. Sybase Replication Server, Microsoft SQL Server): the regular transaction log is used. The transaction log is checked to detect updates on replicated tables, and those log records are transferred to a replication server, which packages up the corresponding transactions to update the replicas.
Derived Data
Derived warehouse data:
  indexes
  aggregates
  materialized views
When to update derived data?
The most difficult problem is how to refresh the derived data. The problem of constructing algorithms for incrementally updating derived data has been the subject of much research.
Materialized Views
Define new warehouse relations using SQL expressions

sale
prodId  clientid  date  amt
p1      c1        1     12
p2      c1        1     11
p1      c3        1     50
p2      c2        1     8
p1      c1        2     44
p1      c2        2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb (join of sale and product)
prodId  name  price  clientid  date  amt
p1      bolt  10     c1        1     12
p2      nut   5      c1        1     11
p1      bolt  10     c3        1     50
p2      nut   5      c2        1     8
p1      bolt  10     c1        2     44
p1      bolt  10     c2        2     4
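As a rough illustration, the joinTb relation can be materialized with SQLite from Python. SQLite has no CREATE MATERIALIZED VIEW statement, so the sketch stores the join result in an ordinary table; the join condition prodId = id is assumed from the sample data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sale (prodId TEXT, clientid TEXT, date INTEGER, amt INTEGER);
    CREATE TABLE product (id TEXT, name TEXT, price INTEGER);
    INSERT INTO sale VALUES
        ('p1','c1',1,12), ('p2','c1',1,11), ('p1','c3',1,50),
        ('p2','c2',1,8),  ('p1','c1',2,44), ('p1','c2',2,4);
    INSERT INTO product VALUES ('p1','bolt',10), ('p2','nut',5);
    -- Store the join result as a table so later queries need not recompute it.
    CREATE TABLE joinTb AS
        SELECT s.prodId, p.name, p.price, s.clientid, s.date, s.amt
        FROM sale AS s JOIN product AS p ON s.prodId = p.id;
""")
join_rows = db.execute(
    "SELECT * FROM joinTb ORDER BY date, clientid, prodId"
).fetchall()
```

Because joinTb is a stored copy, it goes stale when sale or product changes, which is the maintenance problem discussed on the following slides.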
Processing
Index Structures
What to Materialize?
Algorithms
Index Structures
Indexing principle: mapping key values to records for associative direct access
Most popular indexing technique in relational databases: B+-trees
For multi-dimensional data, a large number of indexing techniques have been developed, e.g. R-trees
Index Structures
Index structures applied in warehouses:
  inverted lists
  bit map indexes
  join indexes
  text indexes
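What to Materialize?
Store in the warehouse results useful for common queries
Example (total sale, materialized at several levels of aggregation):

day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

summed over days
        c1  c2  c3
p1      56  4   50
p2      11  8

summed over days and products
        c1  c2  c3
sum     67  12  50

summed over days and clients
p1      110
p2      19

total: 129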
View and Materialized Views
View
  derived relation defined in terms of base (stored) relations
Materialized views
  a view can be materialized by storing the tuples of the view in the database
  index structures can be built on the materialized view
View and Materialized Views
Maintenance is an issue for materialized views:
  recomputation
  incremental updating
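The two maintenance strategies can be contrasted with a small sketch. The view here is assumed to be "total amount per product", and the function names recompute and apply_insert are made up for the example; only insertions are handled.

```python
from collections import defaultdict

def recompute(sales):
    """Recomputation: rebuild the view (total amt per product) from all base rows."""
    view = defaultdict(int)
    for prod, amt in sales:
        view[prod] += amt
    return dict(view)

def apply_insert(view, prod, amt):
    """Incremental updating: adjust only the affected view row for one new sale."""
    view[prod] = view.get(prod, 0) + amt

sales = [("p1", 12), ("p2", 11), ("p1", 50), ("p2", 8)]
view = recompute(sales)

new_sale = ("p1", 44)
sales.append(new_sale)
apply_insert(view, *new_sale)  # touches one row instead of rescanning sales
```

For a sum, an insert or delete can be applied incrementally; aggregates such as min or max may require recomputation when the current extreme value is deleted.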
Managing
Metadata Repository
Administrative metadata
  source databases and their contents
  gateway descriptions
  warehouse schema, view and derived data definitions
  dimensions and hierarchies
  pre-defined queries and reports
  data mart locations and contents
Metadata Repository
Administrative metadata
  data partitions
  data extraction, cleansing, transformation rules, defaults
  data refresh and purge rules
  user profiles, user groups
  security: user authorization, access control
Metadata Repository
Business
  business terms and definitions
  data ownership, charging
Operational
  data layout
  data currency (e.g., active, archived, purged)
  use statistics, error reports, audit trails
Importance of managing metadata
The integration of meta-data, that is "data about data"
Meta-data is used for a variety of purposes, and managing it is a critical issue in achieving a fully integrated data warehouse
The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse
The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data
The meta-data associated with data management describes the data as it is stored in the warehouse
The meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of queries
State of Commercial Practice
Products and Vendors [Datamation, May 15, 1996; R.C. Barquin, H.A. Edelstein: Planning and Designing the Data Warehouse. Prentice Hall, 1997]
Connectivity to sources
  Apertus; CA-Ingres Gateway
  Information Builders EDA/SQL; IBM Data Joiner
  Informix Enterprise Gateway; Microsoft ODBC
  Oracle Open Connect; Platinum InfoHub
  SAS Connect; Software AG Entire
  Sybase Enterprise Connect; Trinzic InfoHub
Data extract, clean, transform, refresh
  CA-Ingres Replicator; Carleton Passport
  Evolutionary Tech Inc. ETI-Extract; Harte-Hanks Trillium
  IBM Data Joiner, Data Propagator; Oracle 7
  Platinum InfoRefiner, InfoPump; Praxis OmniReplicator
  Prism Warehouse Manager; Redbrick TMU
  SAS Access; Software AG Sourcepoint
  Sybase Replication Server; Trinzic InfoPump
State of Commercial Practice
Multidimensional Database Engines
  Arbor Essbase; Comshare Commander OLAP
  Oracle IRI Express; SAS System
Warehouse Data Servers
  CA-Ingres; IBM DB2
  Information Builders Focus; Informix
  Oracle; Praxis Model 204
  Redbrick; Software AG ADABAS
  Sybase MPP; Tandem
  Teradata
ROLAP Servers
  HP Intelligent Warehouse; Information Advantage Axsys
  Informix Metacube; MicroStrategy DSS Server
State of Commercial Practice
Query/Reporting Environments
  Brio/Query; Business Objects
  Cognos Impromptu; CA Visual Express
  IBM DataGuide; Information Builders Focus Six
  Informix ViewPoint; Platinum Forest & Trees
  SAS Access; Software AG Esperant
Multidimensional Analysis
  Andyne Pablo; Arbor Essbase Analysis Server
  Business Objects; Cognos PowerPlay
  Dimensional Insight Cross Target; Holistic Systems HOLOS
  Information Advantage Decision Suite; IQ Software IQ/Vision
  Kenan System Acumate; Lotus 123
  Microsoft Excel; Microstrategy DSS
  Pilot Lightship; Platinum Forest & Trees
  Prodea Beacon; SAS OLAP ++
  Stanford Technology Group Metacube
State of Commercial Practice
Metadata Management
  HP Intelligent Warehouse; IBM Data Guide
  Platinum Repository; Prism Directory Manager
System Management
  CA Unicenter; HP OpenView
  IBM DataHub, NetView; Information Builder Site Analyzer
  Prism Warehouse Manager; SAS CPE
  Tivoli; Software AG Source Point
  Redbrick Enterprise Control and Coordination
Process Management
  AT&T TOPEND; HP Intelligent Warehouse
  IBM FlowMark; Platinum Repository
  Prism Warehouse Manager; Software AG Source Point
Systems integration and consulting
Research
Data cleaning
  focus on data inconsistencies, not schema differences
  data mining techniques
Physical design
  design of summary tables, partitions, indexes
  tradeoffs in use of different indexes
Query processing
  selecting appropriate summary tables
  dynamic optimization with feedback
  acid test for query optimization: cost estimation, use of transformations, search strategies
  partitioning query processing between OLAP server and backend server
Research
Warehouse management
  detecting runaway queries
  resource management
  incremental refresh techniques
  computing summary tables during load
  failure recovery during load and refresh
  process management: scheduling queries, load and refresh
  use of workflow technology for process management
Thank You
QUESTIONS?
APPENDIX A
Data warehouse Schemas
Star schema
sale (fact table): orderId, date, custId, prodId, storeId, qty, amt
customer: custId, name, address, city
product: prodId, name, price
store: storeId, city
A single object (fact table) in the middle connected to a number of dimension tables
Star schema
customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

product
prodId  name  price
p1      bolt  10
p2      nut   5

store
storeId  city
c1       nyc
c2       sfo
c3       la

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
o105     3/8/97  111     p1      c3       5    50
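A short sketch of how such a star schema is typically queried, using SQLite from Python with the sample rows above. The column types, the key constraints, and the choice of query (total amount by store city and product name) are assumptions added for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
    CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE sale (
        orderId TEXT PRIMARY KEY, date TEXT,
        custId  INTEGER REFERENCES customer(custId),
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        qty INTEGER, amt INTEGER);
    INSERT INTO customer VALUES (53,'joe','10 main','sfo'), (81,'fred','12 main','sfo'),
                                (111,'sally','80 willow','la');
    INSERT INTO product VALUES ('p1','bolt',10), ('p2','nut',5);
    INSERT INTO store VALUES ('c1','nyc'), ('c2','sfo'), ('c3','la');
    INSERT INTO sale VALUES ('o100','1/7/97',53,'p1','c1',1,12),
                            ('o102','2/7/97',53,'p2','c1',2,11),
                            ('o105','3/8/97',111,'p1','c3',5,50);
""")

# Star join: the fact table joined to two dimensions, grouped by their attributes.
by_city_product = db.execute("""
    SELECT st.city, p.name, SUM(s.amt)
    FROM sale AS s
    JOIN store AS st ON s.storeId = st.storeId
    JOIN product AS p ON s.prodId = p.prodId
    GROUP BY st.city, p.name
    ORDER BY st.city, p.name
""").fetchall()
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple.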
Terms
Basic notion: a measure (e.g. sales, qty, etc.)
Given a collection of numeric measures
Each measure depends on a set of dimensions (e.g. sales volume as a function of product, time, and location)
Terms
The relation that relates the dimensions to the measure of interest is called the fact table (e.g. sale)
Information about dimensions can be represented as a collection of relations, called the dimension tables (product, customer, store)
Each dimension can have a set of associated attributes
Example of Star Schema
Sales Fact Table: Date, Product, Store, Customer (dimension keys); unit_sales, dollar_sales, schilling_sales (measurements)
Date dimension: Date, Month, Year
Customer dimension: CustId, CustName, CustCity, CustCountry
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Store dimension: StoreID, City, State, Country, Region
Dimension Hierarchies
For each dimension, the set of associated attributes can be structured as a hierarchy
store: store -> sType; store -> city -> region
customer: customer -> city -> state -> country
Dimension Hierarchies

store
storeId  cityId  tId  mgr
s5       sfo     t1   joe
s7       sfo     t2   fred
s9       la      t1   nancy

city
cityId  pop  regId
sfo     1M   north
la      5M   south

region
regId  name
north  cold region
south  warm region

sType
tId  size   location
t1   small  downtown
t2   large  suburbs
Snowflake Schema
A refinement of the star schema where the dimensional hierarchy is represented explicitly by normalizing the dimension tables
Example of Snowflake Schema
Sales Fact Table: Date, Product, Store, Customer (dimension keys); unit_sales, dollar_sales, schilling_sales (measurements)
Product: ProductNo, ProdName, ProdDesc, Category, QOH
Cust: CustId, CustName, CustCity, CustCountry
Date: Date, Month -> Month: Month, Year -> Year: Year
Store: StoreID, City -> City: City, State -> State: State, Country -> Country: Country, Region
Fact constellations
Fact constellations: multiple fact tables share dimension tables
APPENDIX B
Data Modeling & OLAP
Multidimensional Data Model

Fact relation:
sale
Product  Client  Amt
p1       c1      12
p2       c1      11
p1       c3      50
p2       c2      8

Two-dimensional cube:
      c1  c2  c3
p1    12      50
p2    11  8

Sales of products may be represented in one dimension (as a fact relation) or in two dimensions, e.g. clients and products
Multidimensional Data Model

Fact relation:
sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

3-dimensional cube:
day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2
Multidimensional Data Model and Aggregates
Add up amounts for day 1
In SQL: SELECT sum(Amt) FROM SALE WHERE Date = 1
sale: the fact relation (Product, Client, Date, Amt) with the six rows shown earlier
result: 81
Multidimensional Data Model and Aggregates
Add up amounts by day
In SQL: SELECT Date, sum(Amt) FROM SALE GROUP BY Date
sale: the fact relation (Product, Client, Date, Amt) with the six rows shown earlier
result:
Date  sum
1     81
2     48
Multidimensional Data Model and Aggregates
Add up amounts by client, product
In SQL: SELECT client, product, sum(amt) FROM SALE GROUP BY client, product
Multidimensional Data Model and Aggregates
sale: the fact relation (Product, Client, Date, Amt) with the six rows shown earlier
result:
Product  Client  Sum
p1       c1      56
p1       c2      4
p1       c3      50
p2       c1      11
p2       c2      8
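The three aggregate queries from these slides can be run against the sample fact relation with SQLite from Python. The lower-case column names and the ORDER BY clauses (added only to make the output order predictable) are choices made for this sketch.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sale (product TEXT, client TEXT, date INTEGER, amt INTEGER)")
db.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),  ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])

# Add up amounts for day 1.
day1_total = db.execute("SELECT SUM(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = db.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by client, product.
by_client_product = db.execute(
    "SELECT client, product, SUM(amt) FROM sale "
    "GROUP BY client, product ORDER BY product, client"
).fetchall()
```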
Multidimensional Data Model and Aggregates
In the multidimensional data model, summarizing information (aggregates) is usually stored together with the measure values

      c1  c2  c3  Sum
p1    56  4   50  110
p2    11  8       19
Sum   67  12  50  129
Aggregates
Operators: sum, count, max, min, median, ave
"Having" clause
Using dimension hierarchy
  average by region (within store)
  maximum by month (within date)
Cube Aggregation
Example: computing sums

day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

summed over days
        c1  c2  c3
p1      56  4   50
p2      11  8

summed over days and products
        c1  c2  c3
sum     67  12  50

summed over days and clients
        sum
p1      110
p2      19

total: 129
Cube Operators
Cells of the aggregated cube are addressed with "*" standing for "all values" of a dimension (same tables as in the cube aggregation example):

summed over days
        c1  c2  c3
p1      56  4   50
p2      11  8

        c1  c2  c3
sum     67  12  50

        sum
p1      110
p2      19

sale(c1,*,*) = 67
sale(*,*,*) = 129
sale(c2,p2,*) = 8
Cube
The cube extended with "*" (all) along each dimension:

day 1   c1  c2  c3  *
p1      12      50  62
p2      11  8       19
*       23  8   50  81

day 2   c1  c2  c3  *
p1      44  4       48
p2
*       44  4       48

*       c1  c2  c3  *
p1      56  4   50  110
p2      11  8       19
*       67  12  50  129

sale(*,p2,*) = 19
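One way to picture the "*" notation is to compute every cell of the extended cube directly from the fact rows. The sketch below assumes the cell order (client, product, day) used in the sale(...) expressions; the dictionary representation is an illustration, not how an OLAP server stores a cube.

```python
from collections import defaultdict
from itertools import product as cartesian

# Fact rows as (client, product, day, amt), taken from the sample sale relation.
sales = [("c1", "p1", 1, 12), ("c1", "p2", 1, 11), ("c3", "p1", 1, 50),
         ("c2", "p2", 1, 8),  ("c1", "p1", 2, 44), ("c2", "p1", 2, 4)]

cube = defaultdict(int)
for client, prod, day, amt in sales:
    # Each fact contributes to 2^3 cells: every dimension is either kept
    # or replaced by "*" (all values).
    for cell in cartesian((client, "*"), (prod, "*"), (day, "*")):
        cube[cell] += amt

all_sales = cube[("*", "*", "*")]
c1_sales = cube[("c1", "*", "*")]
p2_sales = cube[("*", "p2", "*")]
```

Materializing all cells this way costs space exponential in the number of dimensions, which motivates the question of what to materialize.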
Aggregation Using Hierarchies
Hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)

day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

Day 1 aggregated by region:
      region A  region B
p1    12        50
p2    11        8
Aggregation Using Hierarchies
Hierarchy: client -> city -> region
Cube axes: client (c1 to c4), product (Video, Camera, CD), date of sale

Sales by client (c1, c2 in Chennai; c3, c4 in Bangalore):
      Video  Camera  CD
c1    10     3       21
c2    12     5       9
c3    11     7       7
c4    12     11      15

Aggregation with respect to city:
      Video  Camera  CD
CH    22     8       30
BN    23     18      22
A Sample Data Cube
[Figure: a three-dimensional data cube. Axes: Date (1Q, 2Q, 3Q, 4Q, sum), Product (CD, video, camera, sum), Country (USA, Canada, Mexico, sum)]
OLAP Servers
Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations
  stores all information, including fact tables, as relations
Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations
  stores multidimensional datasets as arrays
OLAP Servers
Hybrid OLAP (HOLAP): gives users/system administrators the freedom to select which partitions of the data are stored relationally (as in ROLAP) and which as arrays (as in MOLAP)
OLAP Queries
Roll up: summarize data along a dimension hierarchy
  if we are given total sales volume per city, we can aggregate on the Location dimension to obtain sales per state
OLAP Queries
Roll-up example: aggregation with respect to city (hierarchy: client -> city -> region)

Sales by client (c1, c2 in Chennai; c3, c4 in Bangalore):
      Video  Camera  CD
c1    10     3       21
c2    12     5       9
c3    11     7       7
c4    12     11      15

Sales by city:
      Video  Camera  CD
CH    22     8       30
BN    23     18      22
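The roll-up in this figure can be reproduced with a few lines of Python. The per-client values and the client-to-city assignment below are read off the figure (the assignment is inferred from the city totals), and the roll_up helper is a name made up for this sketch.

```python
from collections import defaultdict

# Sales by client and product, as read from the figure.
sales = {
    "c1": {"Video": 10, "Camera": 3,  "CD": 21},
    "c2": {"Video": 12, "Camera": 5,  "CD": 9},
    "c3": {"Video": 11, "Camera": 7,  "CD": 7},
    "c4": {"Video": 12, "Camera": 11, "CD": 15},
}
# One level of the client -> city -> region hierarchy (inferred from the totals).
client_city = {"c1": "CH", "c2": "CH", "c3": "BN", "c4": "BN"}

def roll_up(cells, parent_of):
    """Sum each measure over all members that share the same parent."""
    totals = defaultdict(lambda: defaultdict(int))
    for member, measures in cells.items():
        for name, value in measures.items():
            totals[parent_of[member]][name] += value
    return {parent: dict(measures) for parent, measures in totals.items()}

by_city = roll_up(sales, client_city)
```

Applying roll_up again with a city-to-region mapping would move one more level up the hierarchy; drill-down is the reverse navigation and needs the more detailed data to be available.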
OLAP Queries
Roll down, drill down: go from higher level summary to lower level summary or detailed data
  for a particular product category, find the detailed sales data for each salesperson by date
  given total sales by state, we can ask for sales per city, or just sales by city for a selected state
OLAP Queries
Rollup moves from the detailed tables toward the totals; drill-down moves in the opposite direction:

day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

summed over days
        c1  c2  c3
p1      56  4   50
p2      11  8

        c1  c2  c3
sum     67  12  50

        sum
p1      110
p2      19

total: 129
OLAP Queries
Pivoting can be combined with aggregation

sale
prodId  clientid  date  amt
p1      c1        1     12
p2      c1        1     11
p1      c3        1     50
p2      c2        1     8
p1      c1        2     44
p1      c2        2     4

day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

Pivot on product and client:
      c1  c2  c3  Sum
p1    56  4   50  110
p2    11  8       19
Sum   67  12  50  129

Pivot on date and client:
      c1  c2  c3  Sum
1     23  8   50  81
2     44  4       48
Sum   67  12  50  129
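A small sketch of pivoting combined with aggregation, using the sample sale rows. The pivot function and the FIELDS lookup table are hypothetical helpers written for this illustration; the "Sum" margins mirror the totals in the tables above.

```python
from collections import defaultdict

sale = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8),  ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]
FIELDS = {"prodId": 0, "clientid": 1, "date": 2}  # column positions; amt is index 3

def pivot(rows, row_dim, col_dim):
    """Cross-tabulate sum(amt) by two dimensions, adding 'Sum' margins."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        r, c, amt = row[FIELDS[row_dim]], row[FIELDS[col_dim]], row[3]
        for row_key in (r, "Sum"):
            for col_key in (c, "Sum"):
                table[row_key][col_key] += amt
    return {row_key: dict(cols) for row_key, cols in table.items()}

by_product = pivot(sale, "prodId", "clientid")
by_date = pivot(sale, "date", "clientid")
```

Changing the two dimension arguments rotates the presentation without touching the underlying fact rows.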
OLAP Queries
Ranking: selection of the first n elements (e.g. select the 5 best-purchased products in July)
Others: stored procedures, selection, etc.
Time functions
  e.g., time average
Cube Operation
SELECT date, product, customer, SUM(amount)
FROM SALES
CUBE BY date, product, customer

Needs to compute the following group-bys:
(date, product, customer),
(date, product), (date, customer), (product, customer),
(date), (product), (customer),
() (the grand total)
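The list of group-bys is simply every subset of the cube dimensions, which a short Python sketch can enumerate. The function name cube_group_bys is an assumption for this example.

```python
from itertools import combinations

def cube_group_bys(dims):
    """List every subset of the dimensions, from the full group-by down to ()."""
    return [combo
            for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

group_bys = cube_group_bys(("date", "product", "customer"))
```

For n dimensions there are 2^n group-bys, so three dimensions give eight, including the empty group-by that yields the grand total.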
Cuboid Lattice
A data cube can be viewed as a lattice of cuboids
The bottom-most cuboid is the base cube.
The top-most cuboid contains only one cell.

(all)
(A) (B) (C) (D)
(A,B) (A,C) (A,D) (B,C) (B,D) (C,D)
(A,B,C) (A,B,D) (A,C,D) (B,C,D)
(A,B,C,D)
Cuboid Lattice
city, product, date
city, product    city, date    product, date
city    product    date
all

Example cuboids:

(city, product, date)
day 1   c1  c2  c3
p1      12      50
p2      11  8

day 2   c1  c2  c3
p1      44  4
p2

(city, product)
        c1  c2  c3
p1      56  4   50
p2      11  8

(city)
        c1  c2  c3
sum     67  12  50

(all): 129

Use a greedy algorithm to decide what to materialize
Efficient Data Cube Computation
Materialization of data cube
  materialize every cuboid, none, or some
Algorithms for selecting which cuboids to materialize consider:
  size, sharing, and access frequency
  type/frequency of queries
  query response time
  storage cost
  update cost
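The slides do not say which greedy algorithm is meant. The sketch below shows one common benefit-based greedy heuristic under simple assumptions: a query on a cuboid costs the size of the smallest materialized cuboid that can answer it, all cuboids are queried equally often, and the cuboid sizes are hypothetical numbers chosen for the example.

```python
def greedy_select(sizes, k):
    """Pick k cuboids to materialize in addition to the base cuboid.

    sizes maps each cuboid (a frozenset of dimension names) to its row count.
    A query on cuboid v costs the size of the smallest materialized cuboid
    whose dimensions include those of v.
    """
    base = max(sizes, key=len)  # the cuboid with all dimensions
    chosen = [base]

    def cost(view):
        return min(sizes[m] for m in chosen if view <= m)

    for _ in range(k):
        best, best_benefit = None, 0
        for cand in sizes:
            if cand in chosen:
                continue
            # Benefit: total cost saved over all cuboids that cand can answer.
            benefit = sum(max(0, cost(v) - sizes[cand]) for v in sizes if v <= cand)
            if benefit > best_benefit:
                best, best_benefit = cand, benefit
        if best is None:
            break
        chosen.append(best)
    return chosen[1:]

def fs(*dims):
    return frozenset(dims)

# Hypothetical row counts for the eight cuboids of a 3-dimensional cube.
sizes = {
    fs("city", "product", "date"): 100,
    fs("city", "product"): 50, fs("city", "date"): 90, fs("product", "date"): 30,
    fs("city"): 10, fs("product"): 5, fs("date"): 20,
    fs(): 1,
}
selected = greedy_select(sizes, k=2)
```

With these sizes the heuristic first picks (product, date), because it cheaply answers four cuboids, and then (city). Weighting the benefit by query frequency or adding update cost would follow the other criteria listed on the slide.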
Dimension Hierarchies
Client hierarchy: city -> state -> region

cities
city  state  region
c1    CA     East
c2    NY     East
c3    SF     West
Dimension Hierarchies Computation
The cuboid lattice extended by rolling up along the client hierarchy (city -> state):

city, product, date
city, product    city, date    product, date
city    product    date
all

state, product, date
state, product    state, date
state
Cube Computation - Array Based Algorithm
A MOLAP approach:
  the base cuboid is stored as a multidimensional array
  read in a number of cells to compute partial cuboids
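A toy version of the array-based idea, assuming the base cuboid fits in memory as a dense nested list indexed [product][client][day] and filled with the sample sale data. Real array-based algorithms read the array in chunks; chunking is omitted here.

```python
# Base cuboid as a dense 3-D array indexed [product][client][day].
base = [
    [[12, 44], [0, 4], [50, 0]],  # p1: clients c1, c2, c3; days 1, 2
    [[11, 0],  [8, 0], [0, 0]],   # p2
]

by_product_client = [[0] * 3 for _ in range(2)]
by_client = [0] * 3
total = 0

# One pass over the cells updates several partial cuboids at the same time.
for p, plane in enumerate(base):
    for c, days in enumerate(plane):
        for amt in days:
            by_product_client[p][c] += amt
            by_client[c] += amt
            total += amt
```

Sharing a single scan across several cuboids is the main saving over computing each group-by separately.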
Cube computations
[Figure: a three-dimensional array with axes A, B and C]
Cuboids: {ABC} {AB} {AC} {BC} {A} {B} {C} { } (ALL)
APPENDIX C
Index Structures
Inverted Lists
age index: 18, 19, 20, 21, 22, 23, 25, 26

inverted lists:
  age 20: r4, r18, r34, r35
  another list shown in the figure: r5, r19, r37, r40

data records:
rId  name   age
r4   joe    20
r18  fred   20
r19  sally  21
r34  nancy  20
r35  tom    20
r36  pat    25
r5   dave   21
r41  jeff   26
Inverted Lists
Query: get people with age = 20 and name = "fred"
List for age = 20: r4, r18, r34, r35
List for name = "fred": r18, r52
Answer is the intersection: r18
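The query above can be sketched with two inverted indexes built in Python. The record r52 appears only in the list for "fred" on the slide, so its age (30 here) is a made-up value; build_inverted_index is a hypothetical helper.

```python
from collections import defaultdict

def build_inverted_index(records, field):
    """Map each value of the field to the list of record ids holding it."""
    index = defaultdict(list)
    for rid, rec in records.items():
        index[rec[field]].append(rid)
    return index

records = {
    "r4":  {"name": "joe",   "age": 20},
    "r18": {"name": "fred",  "age": 20},
    "r19": {"name": "sally", "age": 21},
    "r34": {"name": "nancy", "age": 20},
    "r35": {"name": "tom",   "age": 20},
    "r52": {"name": "fred",  "age": 30},  # age is an assumption, not from the slides
}
age_index = build_inverted_index(records, "age")
name_index = build_inverted_index(records, "name")

# age = 20 AND name = "fred": intersect the two inverted lists.
answer = sorted(set(age_index[20]) & set(name_index["fred"]))
```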
Bitmap Indexes
Bitmap index: an indexing technique that has attracted attention in multi-dimensional database implementation

table
Customer  City     Car
c1        Detroit  Ford
c2        Chicago  Honda
c3        Detroit  Honda
c4        Poznan   Ford
c5        Paris    BMW
c6        Paris    Nissan
Bitmap Indexes
The index consists of bitmaps (one column per value):
Index on City:
rec  Chicago  Detroit  Paris  Poznan
1    0        1        0      0
2    1        0        0      0
3    0        1        0      0
4    0        0        0      1
5    0        0        1      0
6    0        0        1      0
Bitmap Indexes
Index on Car (one bitmap per column):
rec  BMW  Ford  Honda  Nissan
1    0    1     0      0
2    0    0     1      0
3    0    0     1      0
4    0    1     0      0
5    1    0     0      0
6    0    0     0      1
Bitmap Indexes
Index on a particular column
The index consists of a number of bit vectors (bitmaps)
Each value in the indexed column has a bit vector (bitmap)
The length of the bit vector is the number of records in the base table
The i-th bit is set if the i-th row of the base table has that value for the indexed column
Bitmap Index
age index: 18, 19, 20, 21, 22, 23, 25, 26

bitmaps (10 bits each; only the first 8 records are shown below):
  age 20: 1101100000
  age 21: 0010001011

data records:
id  name   age
1   joe    20
2   fred   20
3   sally  21
4   nancy  20
5   tom    20
6   pat    25
7   dave   21
8   jeff   26
Using Bitmap indexes
Query: get people with age = 20 and name = "fred"
List for age = 20: 1101100000
List for name = "fred": 0100000001
Answer is the intersection (bitwise AND): 0100000000
Good if domain cardinality is small
Bit vectors can be compressed
Using Bitmap indexes
They allow the use of efficient bit operations to answer some queries
"How many customers from Detroit have a Ford car?"
  perform a bit-wise AND of the two bitmaps: answer is 1 (customer c1)
"How many customers have a Honda car?"
  count the 1s in the bitmap: answer is 2
Compression: bit vectors are usually sparse for large databases, but compressed vectors need decompression
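Both queries can be sketched with Python integers used as bit vectors over the six-row customer table. Storing bit i for row i in a single integer is a convenience of this sketch; production systems use compressed bit-vector formats.

```python
rows = [("c1", "Detroit", "Ford"),  ("c2", "Chicago", "Honda"),
        ("c3", "Detroit", "Honda"), ("c4", "Poznan", "Ford"),
        ("c5", "Paris", "BMW"),     ("c6", "Paris", "Nissan")]

def build_bitmaps(rows, column):
    """One integer per distinct value; bit i is set when row i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

city = build_bitmaps(rows, 1)
car = build_bitmaps(rows, 2)

# Customers from Detroit who have a Ford: bitwise AND of two bitmaps.
detroit_ford = city["Detroit"] & car["Ford"]
matches = [rows[i][0] for i in range(len(rows)) if (detroit_ford >> i) & 1]

# How many customers have a Honda: count the set bits.
honda_count = bin(car["Honda"]).count("1")
```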
Bitmap Index – Summary
With efficient hardware support for bitmap operations (AND, OR, XOR, NOT), a bitmap index offers better access methods for certain queries
  e.g., selection on two attributes
Some commercial products have implemented bitmap indexes
Works poorly for high-cardinality domains, since the number of bitmaps increases
Difficult to maintain: needs reorganization when relation sizes change (new bitmaps)
Join
"Combine" SALE, PRODUCT relations
In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb
prodId  name  price  storeId  date  amt
p1      bolt  10     c1       1     12
p2      nut   5      c1       1     11
p1      bolt  10     c3       1     50
p2      nut   5      c2       1     8
p1      bolt  10     c1       2     44
p1      bolt  10     c2       2     4
Join Indexes
join index

product
id  name  price  jIndex
p1  bolt  10     r1, r3, r5, r6
p2  nut   5      r2, r4

sale
rId  prodId  storeId  date  amt
r1   p1      c1       1     12
r2   p2      c1       1     11
r3   p1      c3       1     50
r4   p2      c2       1     8
r5   p1      c1       2     44
r6   p1      c2       2     4
Join Indexes
Traditional indexes map a value to a list of record ids. Join indexes map the tuples in the join result of two relations to the source tables.
In data warehouses, join indexes relate the values of the dimensions of a star schema to rows in the fact table.
For a warehouse with a Sales fact table and a city dimension, a join index on city maintains, for each distinct city, a list of RIDs of the tuples recording the sales in that city
Join indexes can span multiple dimensions
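A minimal sketch of a join index between the product dimension and the sale fact table from the earlier example, assuming both fit in Python dictionaries keyed by record id.

```python
from collections import defaultdict

# Fact rows keyed by record id: (prodId, storeId, date, amt).
sale = {
    "r1": ("p1", "c1", 1, 12), "r2": ("p2", "c1", 1, 11),
    "r3": ("p1", "c3", 1, 50), "r4": ("p2", "c2", 1, 8),
    "r5": ("p1", "c1", 2, 44), "r6": ("p1", "c2", 2, 4),
}
product = {"p1": ("bolt", 10), "p2": ("nut", 5)}

# Join index: for each product id, the record ids of the matching fact rows.
join_index = defaultdict(list)
for rid, (prod_id, *_rest) in sale.items():
    join_index[prod_id].append(rid)

# Total sales of bolts, found through the index without scanning every sale row.
bolt_total = sum(sale[rid][3] for rid in join_index["p1"])
```

The index precomputes the join once, so a query on a dimension value goes straight to the matching fact rows.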