ATHABASCA UNIVERSITY
BUSINESS INTELLIGENCE (BI) APPLICATION
DEVELOPMENT FROM THE OPERATIONAL DATA
BY
AMIT JAIN
A project submitted in partial fulfillment
of the requirements for the degree of
MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta
December, 2006
© Amit Jain, 2006
DEDICATION
This work is dedicated to the memory of my loving mother Trishala Jain.
ABSTRACT
Businesses are striving for new ways to measure their performance in areas
such as customer satisfaction, customer service, company’s reputation and
corporate goal achievements. Data collected in the company’s operational systems
hold the key to this information. Business intelligence (BI) solutions provide tools and
technologies to explore trends and patterns and to extract the desired performance
measures from the organization’s operational data.
This essay includes an overview of the benefits of BI to an organization with a
perspective on the historical, ongoing and future developments in this field.
The technological concepts of data warehousing, the extract-transform-load (ETL)
process and online analytical processing (OLAP), along with the analytical tools
applied in a typical BI application, are discussed.
A sample BI project has been documented in the essay to demonstrate the
step by step approach of a BI application development process using different tools
and technologies.
ACKNOWLEDGMENTS
I would like to thank the faculty and staff of the School of Computing and
Information Systems at Athabasca University for their expert guidance throughout
this program.
Special thanks to Dr. Kinshuk for supervising this essay and Dr. Kewal
Dhariwal for providing some valuable insight into the subject matter.
Last but not least, without the encouragement and support of my family,
especially my wife Taruna, this feat would not have been possible.
TABLE OF CONTENTS
CHAPTER I
INTRODUCTION
STATEMENT OF PURPOSE
RESEARCH PROBLEM
SIGNIFICANCE
ASSUMPTIONS
LIMITATIONS
GLOSSARY OF TERMS
ORGANIZATION OF THE ESSAY
CHAPTER II
REVIEW OF LITERATURE
Business Intelligence – amazement factor
Business Intelligence - Opportunities, Limitations and Risks
Enterprise level BI
Business Intelligence Application Development Project Lifecycle
Data Warehouse
CHAPTER III
OVERVIEW OF THE BI SOLUTIONS
DEFINITION OF BUSINESS INTELLIGENCE
HISTORY OF BUSINESS INTELLIGENCE
BENEFITS OF BI TO THE ORGANIZATION
COMPONENTS OF A BUSINESS INTELLIGENCE APPLICATION
Data Warehouse
ETL (extract-transform-load) Process
OLAP Database
User Interface
APPLICATIONS OF BUSINESS INTELLIGENCE SOLUTIONS
STEPS FOR BUILDING A BUSINESS INTELLIGENCE APPLICATION
1. Business case assessment
2. Building the data warehouse
3. Building the ETL process
4. Building an OLAP database
5. Implementing user interface
FUTURE TRENDS IN BUSINESS INTELLIGENCE
CHAPTER IV
EXAMPLE OF BI APPLICATION DEVELOPMENT
INTRODUCTION
PROJECT SCENARIO
CURRENT ENVIRONMENT
THE SOLUTION
BUILDING THE BI APPLICATION FOR CAR-RENTAL INC.
1. Data warehouse
2. ETL Process
3. OLAP Database
4. User Interface
CHAPTER V
CONCLUSIONS AND RECOMMENDATIONS
Conclusion
Recommendations
Suggestions for further research
REFERENCES
APPENDIX A
OPERATIONAL DATABASE SCRIPT FOR THE SAMPLE APPLICATION
Script to create operational database objects:
APPENDIX B
DATA WAREHOUSE CREATION SCRIPT FOR THE SAMPLE APPLICATION
Script to create Data warehouse objects:
LIST OF TABLES
Table 1: Comparison of Operational and Informational Systems
Table 2: Data Warehouse versus Data Mart
LIST OF FIGURES
Figure 1: Business Intelligence Solution
Figure 2: Data flow from Source to End-Users in BI Platform
Figure 3: Generic two-level data warehousing architecture
Figure 4: Independent data mart data warehousing architecture
Figure 5: Dependent data mart and operational data store architecture
Figure 6: ETL Process
Figure 7: OLAP Architecture
Figure 8: Slicing a data cube
Figure 9: Example of a drill-down
Figure 10: Sample of a Fact Table
Figure 11: Example of Dimension tables
Figure 12: Sample of a Star Schema
Figure 13: Operational Database Diagram for Car-Rental Inc. Website
Figure 14: Data Warehouse ER diagram for Car-Rental Inc. Website Reservations
Figure 15: ETL Tool (Integration Services) from SQL Server BI development studio
Figure 16: Data mapping between source and destination inside a SSIS package
Figure 17: Data Source View used in the OLAP cube
Figure 18: OLAP Cube Browser in SQL Server BI Development Studio
Figure 19: End User OLAP Tool - ProClarity Desktop Professional
CHAPTER I
INTRODUCTION
Business Intelligence (BI) applications play an important role in the strategic
planning and decision support system of a business organization. BI is essentially an
architecture that includes integration of operational systems, decision-support
applications and databases to provide easy access to the knowledge contained in
the business data for the business community (Moss & Atre, 2003).
“Business Intelligence (BI) provides an executive with timely and accurate
information to better understand his or her business and to make more informed,
real-time business decisions. Full utilization of BI solutions can optimize business
processes and resources, improve proactive decision making, and maximize profits/
minimize costs.” (Raisinghani, 2004).
BI is also defined as a process that enhances data into information
and then into knowledge. BI applications typically support the following activities (Moss
& Atre, 2003):
• Multidimensional analysis, e.g., Online analytical processing (OLAP)
• Click-stream analysis
• Data mining
• Forecasting
• Business Analysis
• Balanced scorecard preparation
• Visualization
• Querying, Reporting and charting
• Geospatial analysis
• Knowledge management
• Enterprise portal implementation
• Mining for text, content and voice
• Digital dashboard access
• Other cross-functional activities
A BI application development project consists of several steps involving various
technologies, such as:
• Data warehouse development
• Extract/Transform/Load (ETL)
• Meta Data Repository
• OLAP Cubes
• Data Mining
• End-User Presentation (self-serve approach)
STATEMENT OF PURPOSE
The purpose of this essay on Business Intelligence is to describe what BI is
and how it is applied to the operational data of business organizations through
the integration of various concepts, technologies and tools. A sample BI
application is developed from operational data to explore the stages,
technologies, activities and strategies of typical BI application development and
to understand their opportunities, limitations and risks.
RESEARCH PROBLEM
In most organizations, top management has access to the latest business
trends and understands the importance of Business Intelligence. However, the
information lies embedded in the operational data within the organization, and it is up to
the existing information technology (IT) staff and analysts to apply BI concepts
and extend their benefits to the organization. Because of the many tools,
technologies and technical jargon surrounding BI, it is often
overwhelming for a business analyst to understand the overall BI architecture and
for an IT solution developer to create an end-to-end BI solution for a business
scenario. A great deal of information is available on BI, but much of it covers only specific
parts of a BI solution, e.g., data warehouse development. It is difficult to find literature
on BI that ties together all of its tools and technologies from an overall application
development perspective, including analytical processing and user presentation
tools. Most of the available information is either business oriented or takes a purely
engineering approach with an emphasis on statistical analysis.
SIGNIFICANCE
Due to the widespread use of information technology, organizations are
experiencing data overload. Only the IT department has unrestricted access to the
operational data, and it keeps creating reports in response to end-user requests.
However, enterprise data is of utmost importance to decision makers
throughout the organization, and a report for each business scenario is extremely
difficult to create and manage.
Hence, the key to taking advantage of the wealth of information hidden in the
enterprise databases is to create and implement a business intelligence strategy after
a thorough assessment of the effort, costs and ROI of the BI initiative. Once
this is done, business users can be equipped to serve themselves and pinpoint the required
information for strategic decision-making and to gain competitive advantage.
In today’s highly competitive and increasingly uncertain world, the quality and
timeliness of an organization’s BI capabilities can prove to be a huge factor in not
only profit making but also survival. Liautaud and Hammond (2001) have listed
several advantages of BI for an enterprise:
• Making better and faster decisions - Separation of information gathering from
the decision making process, proactive intelligence
• Balancing the Corporate scorecard
• Lowering costs, increasing revenue, leveraging the investment from the ERP
systems.
• Improved internal communication
• Liberation of operational data from the load of doing day to day reporting and
analysis. This is achieved by loading of data required for reporting and
analytical purposes into the separate data collection areas called data
warehouses.
ASSUMPTIONS
The reader of this essay is assumed to be familiar with enterprise
database components, objects and RDBMS fundamentals such as normalization and ER
diagrams. It is also assumed that the reader is conversant with the basic SQL
statements used for Data definition, Data manipulation and Data control. An
understanding of the internal structure, systems and business functions of an
organization is also expected.
LIMITATIONS
The essay identifies some open problems with the approaches used in BI
application development, but does not contribute significantly to their solution. Most
of the BI concepts and technologies discussed in this essay are vendor and platform
independent; however, the sample BI application is limited to the Microsoft SQL
Server 2005 suite of Business Intelligence tools and the Windows platform.
GLOSSARY OF TERMS
BPM - Business Process Management is a term that describes activities
and/or events that are performed to optimize a business process.
Software tools called BPM tools aid these activities.
DSS – Decision Support System
Data mining – Data mining consists of applications that enable organizations
to make better use of the knowledge contained in data by identifying
trends and patterns in the data held in data warehouses and data marts.
Data warehouse – an integrated decision support database whose content is
derived from the various operational databases (Hoffer et al., 2002).
Dimensional modeling - dimensional modeling is a technique that is widely
accepted for the data warehouse design process.
EDW – Enterprise Data Warehouse
EIS – Executive Information System
ETL – Extract/Transform/Load
ERD – Entity-Relationship Diagram; a graphical illustration of an
entity-relationship model for business data.
ERP – Enterprise Resource Planning
Informational systems – Systems designed to support decision making based
on historical point-in-time and prediction data for complex queries or
data-mining applications (Hoffer, Prescott & McFadden, 2002).
KPIs – Key performance indicators are used to assess the present state of a
business and to prescribe a course of action.
Market basket analysis – Market basket analysis is a data mining application
that identifies cross-selling opportunities by analyzing
the products that customers tend to buy together.
Normalization – A process of dividing a big complicated database table into
several small tables to reduce data redundancy and improve data
integrity.
ODS – Operational Data Store
OLAP – Online Analytical Processing
OLTP – Online Transaction Processing
Operational data – Volatile current data in the databases used across the
organization by the Operational Systems.
Operational system – A system used to run a business in real time, based on
the current data (Hoffer, 2002).
RDBMS – Relational Database Management System
SQL – Structured Query Language
Star schema – A star schema is used in creating a dimensional model that
consists of fact tables and dimension tables that are joined to form a
star-like structure.
Structured data - data stored in a format that can be used efficiently by a
computer. There is usually a conceptual definition and data type
definition for this kind of data.
Unstructured data – Information stored in a form that is inefficient
for a machine to read. Textual data in the form of email, word-processing
documents and spreadsheets, as well as image, audio and video files, are
examples of unstructured data.
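The star schema entry above can be made concrete with a small sketch. The example below is illustrative only: it uses Python's built-in sqlite3 module rather than the SQL Server 2005 tools used later in the essay, and the table names (dim_date, dim_vehicle, fact_reservation), columns and sample rows are hypothetical and are not taken from the sample application's schema. It shows one fact table joined to two dimension tables and a typical aggregate query over them.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript(
        """
        -- Dimension tables describe the "who/what/when" of each fact.
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER,
            month    INTEGER
        );
        CREATE TABLE dim_vehicle (
            vehicle_key   INTEGER PRIMARY KEY,
            vehicle_class TEXT
        );
        -- The fact table holds numeric measures plus one key per dimension.
        CREATE TABLE fact_reservation (
            date_key    INTEGER REFERENCES dim_date(date_key),
            vehicle_key INTEGER REFERENCES dim_vehicle(vehicle_key),
            rental_days INTEGER,
            revenue     REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2006, 1), (2, 2006, 2)],
    )
    conn.executemany(
        "INSERT INTO dim_vehicle VALUES (?, ?)",
        [(1, "Compact"), (2, "SUV")],
    )
    conn.executemany(
        "INSERT INTO fact_reservation VALUES (?, ?, ?, ?)",
        [(1, 1, 3, 120.0), (1, 2, 2, 180.0), (2, 1, 5, 200.0), (2, 2, 1, 90.0)],
    )


def revenue_by_class(conn):
    """Aggregate the revenue measure along the vehicle dimension."""
    rows = conn.execute(
        """
        SELECT v.vehicle_class, SUM(f.revenue)
        FROM fact_reservation AS f
        JOIN dim_vehicle AS v ON v.vehicle_key = f.vehicle_key
        GROUP BY v.vehicle_class
        ORDER BY v.vehicle_class
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(revenue_by_class(connection))
```

With the sample rows shown, the query returns total revenue of 320.0 for the Compact class and 270.0 for the SUV class. The same join-and-aggregate pattern applies to any other dimension, for example grouping by dim_date.month instead.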
ORGANIZATION OF THE ESSAY
This chapter (Chapter I) provides a brief introduction to the subject of the essay and
defines the scope of this project.
Chapter II reviews the literature on the various BI methodologies
and approaches available for creating a BI solution.
Chapter III provides an explanation on how different tools and technologies
are integrated to produce a BI solution. Several core BI topics are covered in this
chapter such as Data warehouse development, Extract/Transform/Load (ETL), Meta
Data Repository, OLAP Cubes, Data Mining and End-User Presentation.
Chapter IV includes the documentation and step-by-step explanation of a
sample BI solution for a common business scenario involving a vehicle rental
company. The Microsoft SQL Server 2005 database engine, Analysis Services, Integration
Services and an evaluation version of ProClarity™ Desktop Professional software have
been used to create the sample BI solution.
The essay finishes with a conclusion, recommendations and suggestions for
further research in Chapter V.
CHAPTER II
REVIEW OF LITERATURE
Business Intelligence – amazement factor
H. Dresner coined the term "Business Intelligence" (BI) in 1989 while he was
an analyst at the research company Gartner Inc. Although acronyms such as DSS (decision
support systems) and EIS (executive information systems) were widely used
at that time, Dresner wanted a term that would better define the access to and
analysis of quantitative information by a wide variety of users (Martens, 2006).
In an interview with Martens in 2006, Dresner mentions that the original meaning
of BI had been refined over the years. Initially, some companies tried to extend BI even to
unstructured information. However, it became clear that BI could provide more
value by delivering structured information to users, who would not be required
to be experts in operational research. General improvements in computer
technology have made a big difference in making BI easier to adopt; however, it is up
to the end user to have the business insight to make the most of the BI tools
and applications made available to her. BI adoption was held up for years
by business culture and internal constraints. Implementing BI meant that the
right information would be directly available to all business executives across the
organization. For this reason, middle management was concerned about losing
its ability to hide negative business trends and highlight only the positive
developments to top management.
Dresner (2006) also mentions that almost one third of BI users are in
finance, followed by consumer packaged goods, retail, manufacturing and government.
Everyone, including health care and education, understands the importance of BI, but
these sectors have limited IT budgets. Geographically, BI has been widely implemented
in North America, Western Europe and Australia. The emerging markets are in
Asia-Pacific, Japan and South America. BI will continue to flourish, with more
integration with service-oriented architecture (SOA), web services and Business
Process Management (BPM).
Jedras (2006) has highlighted the results of an executive survey by Teradata,
an enterprise data warehousing vendor, which shows that BI is becoming
indispensable to decision makers across the enterprise. According to the results,
more than forty-one percent of companies use BI for making more than half of their
decisions. Teradata’s CEO, Mr. Fair, affirms that BI is becoming
an indispensable tool for decision makers across organizations. “Executives
are realizing the better job they do to analyze data to serve their customers, the
better differentiated they’ll be,” says Fair (Jedras, 2006). An interesting change from
previous years is that companies are focusing more on customer loyalty,
the company’s reputation among customers and customer service. The challenge facing
companies is to convert unstructured data into structured data and
integrate it with the decision-making process so that harmful patterns or trends
are detected earlier, which could offer tremendous monetary gains for the
organization.
Business Intelligence - Opportunities, Limitations and Risks
Raisinghani (2004) has provided a comprehensive overview of the BI field in
an executive’s guide. In addition to a description of BI, the topics cover
BI execution and management, major opportunities, limitations, issues and
associated risks. There is a significant amount of uncertainty and peril associated with
executive decision-making, which could prove disastrous for a company. BI
helps organizations reduce these risks and make intelligent decisions. An
intelligent agent model is presented that can be employed for the effective
presentation of key measurements to improve decision-making cycle time and gain
competitive advantage. The author discusses knowledge discovery through data mining
and introduces the overall data mining process, including
domain analysis, data selection, data preprocessing, transformation and, finally,
evaluation of the results. The author also covers data mining tools,
technologies and applications. Recommendations are provided on
system architecture, logical application structure and implementation project integration
with respect to BI, and on how to configure, improve and maintain reporting, OLAP
and HOLAP environments. BI techniques such as text mining and transforming
textual patterns into knowledge are also discussed.
Enterprise level BI
Biere (2003) has provided a detailed overview of BI justification, planning
and implementation initiatives from management’s perspective. The guide takes
an enterprise-wide view of BI and covers areas such as:
• Setting appropriate expectations and goals for a BI project
• Understanding how the key components of a complete BI solution fit
together.
• Designing effective BI solutions, including content management, handling
unstructured data and end-user segments.
• Justifying BI solutions by analyzing their ROI based on the true cost of a BI
development project, which could include hiring BI experts and purchasing
proprietary software packages. This would also include meetings between
end users and IT to establish the goals of a BI project.
• Product selection, solution design, deployment and providing effective
support for BI end users while maximizing the ROI throughout the project
lifecycle.
• Corporate performance management (CPM).
• Preview of the future BI technologies.
• BI project planning checklist.
Business Intelligence Application Development Project Lifecycle
Moss and Atre (2003) developed a step-by-step guide for the complete BI
project lifecycle, with details on the complexity of such applications. This book
provides an excellent overview of all the engineering stages, development steps,
human resources allocation, activity dependency matrix, task/subtask matrix and
guidelines for a BI application development project. While this literature is
oriented more towards BI project management, it also provides useful technical
insights into database design, ETL design, meta data repository design, data
mining, OLAP tools and the application development process.
Data Warehouse
Data warehousing is one of the most important components of a BI
application; hence, a deep understanding of this
topic is essential for this essay. Several useful resources are available for studying the
concept of data warehousing.
Hoffer (2002) defined a data warehouse as “a subject-oriented, integrated,
time-variant, non-updateable collection of data used in support of management
decision-making processes and business intelligence”. Hoffer (2002) is a
useful starting point for basic data warehousing concepts such
as architecture, ETL, star schema, market basket analysis and user interface, with
appropriate examples.
Kimball and Ross (2002) provided a toolkit book on data warehousing using
dimensional modeling techniques. The authors argue that dimensional modeling is
the only coherent modeling architecture for building distributed data warehouse
systems. Dimensional modeling also simplifies the overall design, which helps
users understand the database design easily while allowing efficient BI
application software to be built. The book includes classic case studies involving
data warehousing with dimensional modeling in retail sales, inventory, procurement,
order management, customer relationship management (CRM), accounting, human
resource management, financial services, telecommunication and utilities functions.
As a complement to dimensional modeling techniques, Adamson and
Venerable (1998) provided further detailed insight into building a data warehouse.
To explain the data warehouse design method, they provide specific
information on fundamental business description, requirements,
understanding and reporting in the areas of sales, marketing, production, budgets,
financial reporting, profitability and intellectual capital. The authors also cover
key measures and ratios, presenting information to the end user and
building an enterprise data warehouse. The authors agree that the key to business
intelligence lies in effective data warehouse design and in presenting
information to the user in a form that reflects the way management wants to analyze
its business processes.
CHAPTER III
OVERVIEW OF THE BI SOLUTIONS
Over the last few decades, businesses around the world have grown beyond
national boundaries and become global. To survive and grow in
this competitive world, businesses have started looking for ways other than
profit to measure their success and long-term goals. Customer loyalty, customer
service and the company’s reputation are increasingly important on corporate
agendas (Jedras, 2006).
Businesses can identify key performance indicators (KPIs) in order to
assess their achievements against target performance in important areas.
For example, a business trying to improve customer satisfaction can
concentrate on KPIs such as order cancellations, late shipments, incomplete order
shipments and returns (Wu, 2002). BI solutions provide powerful tools to visualize
KPIs. Using visualization, business executives can quickly identify trends and keep
track of metrics (Wu, 2002).
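As a minimal sketch of how the customer satisfaction KPIs mentioned above might be derived from order records, the following Python example computes each KPI as the share of orders affected. The Order class, its field names and the sample data are hypothetical and serve only to illustrate the arithmetic; a real BI application would compute such measures inside the data warehouse or OLAP layer.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """A simplified, hypothetical order record with one flag per KPI."""
    order_id: int
    cancelled: bool = False
    shipped_late: bool = False
    incomplete: bool = False
    returned: bool = False


KPI_FIELDS = ("cancelled", "shipped_late", "incomplete", "returned")


def kpi_rates(orders):
    """Return each KPI as the fraction of orders for which the flag is set."""
    total = len(orders)
    if total == 0:
        return {name: 0.0 for name in KPI_FIELDS}
    return {
        name: sum(getattr(order, name) for order in orders) / total
        for name in KPI_FIELDS
    }


sample_orders = [
    Order(1),
    Order(2, shipped_late=True),
    Order(3, cancelled=True),
    Order(4, shipped_late=True, returned=True),
]

if __name__ == "__main__":
    for name, rate in kpi_rates(sample_orders).items():
        print(f"{name}: {rate:.0%}")
```

For the four sample orders, the late shipment rate is 50%, the cancellation and return rates are each 25%, and the incomplete shipment rate is 0%. Tracking these ratios per period is what allows a dashboard to show a trend rather than a single figure.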
Analyzing performance only at the end of the year is not acceptable in a
fast-paced economy, as organizations want to be proactive rather than reactive to stay
ahead of the competition. Businesses depend on timely and accurate analysis of
the information hidden in their day-to-day operational data to measure
performance and make better-informed decisions. Analysis of this sort
involves extensive mathematical calculation and the identification of patterns that an
unaided human mind simply cannot explore.
Corporate IT departments have spent countless hours developing a never-ending
list of reports for end users. But for the reasons listed below,
IT could never satisfy the reporting requirements of business users:
• Lack of business expertise in IT department to be able to fully
understand the requirement.
• Long development cycles, so that the business scenario and
requirements have changed by the time the report is delivered.
• Conflict of interest between IT and other business functions. It is in the
best interest of IT departments to safeguard the data not only from
external intruders but also from within the organization to ensure
the successful operation of the company.
Reports running directly against the operational data could also slow down the
core business applications, delaying business functions and eventually causing more
headaches for IT. However, restricting access in this way was detrimental to organizations,
because executives need flexible and unrestricted access to data in order to measure
business performance from several organizational perspectives (Biere, 2003).
Business intelligence solutions provide ways to explore trends and patterns
and to present information from the organization’s existing data while keeping the
data safe with proper security measures. BI is all about providing easy,
timely, flexible and good-quality information directly to business executives to help
them make informed business decisions with a reduced amount of uncertainty and
risk.
Biere (2003) introduced BI as being mostly about querying, reporting, math and
difficult calculations, carried out in order to improve decision makers’ awareness of
critical business activity and to support effective intercommunication within the
enterprise.
Figure 1: Business Intelligence Solution (all-bi.com, 2006)
A BI solution is not an off-the-shelf product; it has to be designed for the
business and management needs of an organization using the IT infrastructure, as
illustrated in Figure 1.
Each business is different and BI initiatives can be very expensive to undertake, so
it takes a serious and conscious effort from the corporation to develop a BI strategy
based on its own unique business requirements along with the supporting
technology (Wu, 2001).
Although BI solutions help in presenting information in the most intuitive and
flexible way to support decision-making, they do not take away the reliance
on the human mind to comprehend the results.
DEFINITION OF BUSINESS INTELLIGENCE
There are several definitions of BI. Biere (2003) defined it as the
conscious, methodical transformation of data from any and all data sources into new
forms to provide information that is business driven and results oriented.
Moss and Atre (2003) pointed out that BI is neither a product nor a system. BI
is essentially an architecture that includes integration of operational systems,
decision-support applications and databases to provide easy access to the
knowledge contained in the business data for business community.
The BI concept is simply to make use of the data already available to your
company from internal and externally published sources to help decision makers
make better and faster decisions.
Following are some common characteristics of BI solutions based on the
various definitions and discussions on the topic:
• BI solutions consist of several elements such as databases, end-user
tools, and security.
• BI solutions should not negatively impact the day-to-day operational
systems.
• BI applications involve setting up and running the processes for data
transformation into knowledge.
• BI applications have to be designed as per the requirements of the
business users.
• BI applications include some serious mathematical calculations using
aggregates and functions taking full advantage of the modern
computing systems to do query and reporting, statistical analysis and
forecasting.
• BI applications include security measures for information control.
• BI solutions provide an easy-to-use interface directly to the end user to
get business information in a timely and efficient manner.
HISTORY OF BUSINESS INTELLIGENCE
Before the introduction of computing in businesses, business owners used to
rely on their gut-feeling and past experience to make business decisions. With the
introduction of the automation of business processes with computer systems and
databases, businesses started having access to massive amounts of data. Due to
lack of infrastructure for data exchange and incompatibilities between systems,
report generation and analysis sometimes used to take months (Wikipedia.org,
2006). Reports were created with significant involvement of IT programmers. There
was an increasing demand for query and reporting tools on this operational data by
the end users or non-IT persons.
During the 1970s several vendors started offering tools for end users to directly
access the data and do the analysis. But there were some problems with these early
solutions, as listed below, mainly because of the lack of a strong technological
concept of data storage such as RDBMS (Relational Database Management
System), which was yet to be established (Biere, 2003).
• Vendors had no option but to offer their own proprietary data storage
as a middle layer between original data and end user for optimized
reporting.
• IT had still to be involved to move the data from the original sources.
With the advent of personal computers, client/server systems and the widespread
use of open standards in data storage systems such as RDBMS in the 1980s,
there were several changes on the BI front. Along with client/server solutions,
vendors started offering data analysis tools based on common language SQL
(Structured Query Language) and a common RDBMS platform. These open
standards were the result of cooperation among vendors that offered numerous BI-
related benefits (Biere, 2003). The analysis tools started supporting various types of
RDBMS databases irrespective of the vendors. Skills in relational database
technology could be reused among different systems. The data started being stored
in a more BI-adaptive relational format of forms and reports.
In 1989, Howard Dresner coined the term "Business Intelligence" (BI) while he was
an analyst at the research company Gartner Inc. Acronyms like DSS (decision support
systems) and EIS (executive information systems) were being widely used at that
time to define these kinds of systems, but Dresner wanted a term that could better
define the access to and analysis of quantitative information by a wide variety of
users (Martens, 2006).
However, with the introduction of these new tools for extracting information
directly from the original source or operational databases, businesses started to
experience other limitations (Biere, 2003):
• Operational data could consist of anomalies, which were brought back
to the end users running their analysis tools.
• There was no solution for the complexities and data volume involved in
integration of the disparate data sources required for the BI solutions.
• Data validation and accuracy were getting ignored.
• Running the analytical tools on the operational data was causing
performance issues for core business operating systems.
Due to these problems, there was a pressing need to make the operational
data independent from the data used for analytical processing. These limitations
were overcome by the introduction of the data warehousing concept during the 1990s,
which eventually became an integral part of the BI solutions. Data warehousing
combined all the steps required for collection of data from various sources, data
transformation, and data validation, and finally storing it separately into a format
more conducive to analysis. Implementation of data warehouses brought along
some issues such as system performance and bandwidth issues during population
of large amounts of data into the warehouse. This gave impetus to the advancement
in computer hardware and networking techniques to overcome these issues.
BENEFITS OF BI TO THE ORGANIZATION
In today’s highly competitive and increasingly uncertain world, the quality and
timeliness of an organization’s BI capabilities can prove to be a huge factor in not
only profit making but also survival (Liautaud & Hammond, 2001). Following are
some of the advantages of BI for an enterprise:
• BI involves integration of data from various internal and external sources. This
results in improved communication and knowledge exchange between
departments while coordinating business activities. Due to this improved
coordination, organizations are able to adjust quickly to changes in financial
conditions, customer preferences and supply chain operations (Wikipedia.org,
2006).
• BI makes the key business information available to the decision makers in the
most efficient and timely manner. Hence business users are able to make
better and faster decisions with reduced amount of guesswork and risk
(Raisinghani, 2004).
• BI allows companies to be proactive rather than reactive by using forecasting
and trend analysis techniques (Liautaud & Hammond, 2001).
• BI provides competitive advantage to the businesses by identifying trends and
problems ahead of the competition (Wikipedia.org, 2006).
• BI helps in improving customer experience, by identifying market trends and
responding quickly to changing customer requirements (Olszak & Ziemba,
2006).
• Balancing of the corporate scorecard is made possible by realization of an
enterprise’s strategy, mission, goals and tasks through BI applications
(Olszak & Ziemba, 2006).
• ERP systems allowed the organizations to centralize data and eliminate the
inconsistencies and inefficiencies of working with standalone departmental
systems (Cognos, 2004). Using BI applications organizations can look for
leveraging their existing investments in the ERP systems to full potential in
lowering costs and increasing revenue (Liautaud & Hammond, 2001). In a
study on “Operational Performance Management” in 2003, Ventana Research
found that deploying BI over ERP and Application Servers for measuring and
monitoring business activities and processes was rated at 93 percent
importance (Cognos, 2004).
• BI includes separation of information gathering from the decision making
process, which streamlines the processes and improves the operational
systems performance by liberating it from the load of doing analysis (Liautaud
& Hammond, 2001).
COMPONENTS OF A BUSINESS INTELLIGENCE APPLICATION
As discussed earlier, a BI application contains several tools and technologies
under its umbrella. A typical BI application broadly consists of the following four
components, illustrated in figure 2 (Biere, 2003; Hancock & Toren, 2006):
• Data Warehouse
• ETL (extract-transform-load) Process
• OLAP Database
• User Interface
Figure 2: Data flow from Source to End-Users in BI Platform (Hancock & Toren, 2006)
Data Warehouse
Modern organizations are experiencing data overload due to the widespread
computerization of operations and are looking for ways to explore this wealth of
information lying in the operational systems. Most of the systems are designed to
run the day-to-day business operations doing transactional processing, capturing
events, storing and manipulating the data. Sets of these operational systems that
capture the detailed information of the individual business events are called
transactional systems or online transaction processing (OLTP) systems. While these
OLTP systems are well designed and optimized to handle business events, they are
ill-equipped for analytical processing requirements, i.e., they are not able to answer
management’s questions on overall business trends, volumes, best-selling
products, regional performance and so on. This creates an informational gap
between operational processing and informational processing. Data warehouses are
designed to bridge this gap by merging information from various sources and storing
it in the most optimized manner for helping decision support systems for the overall
business process (Adamson & Venerable, 1998).
Table 1 provides a comparison of operational and informational systems from
many different points of view.
Table 1: Comparison of Operational and Informational Systems (Hoffer et al., 2002)
Characteristic | Operational Systems | Informational Systems
Primary Purpose | Run the business on current basis | Support managerial decision making
Type of data | Representing current state of business | Historical point-in-time snapshots and predictions
Primary Users | Operational staff such as clerks, salespersons etc. | Managers, business analysts, customers
Scope of usage | Well defined, planned and simple updates | Broad, ad hoc, complex queries, aggregation and analysis
Design goal | Performance throughput, availability | Ease of access, flexibility and use
Volume | High, constant updates and queries on few table rows | Periodic batch updates and queries on many or all rows
Data warehousing concepts have evolved to address the following business issues
(Humphries, Hawkins & Dy, 1998):
1. Operational Systems Fail to Provide Decisional Information: Data
required for analysis are often scattered across the operational
systems and mostly in a volatile state supporting the ongoing business
transactions. It takes a significant amount of resources to produce ad
hoc reports that are eventually found to be inconsistent, inaccurate, or
obsolete.
2. Decisional Requirements Cannot Be Fully Anticipated: Business
situations keep on changing, therefore it’s impossible for IT to generate
reporting for every scenario. Decision makers should be able to review
enterprise data from different angles and at different levels of detail to
find and address business problems as the problems arise.
Studies indicate that almost all large organizations either already have a data
warehouse or are in the process of developing one. Information Technology Toolbox,
Inc. conducted a Data Warehouse Survey sponsored by Oracle in 2005. Survey
results indicated that 91% of the respondent organizations were utilizing a data
warehouse. Over 43% expected at least a 10% increase in spending on data
warehouses in the next 12 months. A large majority of respondents, 64%, indicated
that they currently had a data warehouse management system in place (ITtoolbox,
2006).
Before getting into more details of data warehouse it is important to list the
goals of a data warehouse development (Humphries et al., 1998):
• To make an organization’s information easily accessible to the
business users with proper security measures applied to the data
warehouse and/or front-end application.
• To present one common source of information in consistent manner
with quality assurance of the data collected from various operational
sources.
• Accurate recording of historical data to support quick analysis of
company’s performance.
• To be able to slice and dice through the data dynamically to present
information from different angles and depth of details.
• Separate transactional processing from decision support systems to
improve overall systems performance. This also allows the system to
be more adaptive and resilient to change.
Ralph Kimball and Bill Inmon are known as the fathers of data warehousing.
Both have different design approaches to building a data warehouse. The Inmon
data warehouse design model is based on the data stored in most atomic and
normalized format in a main data warehouse. This data is aggregated and made
available across the enterprise through exploration warehouses, data mining
warehouses and OLAP databases (Drewek, 2005). Kimball popularized the concept
of dimensional modeling as a key technique in data warehouse building process
(Humphries et al., 1998). The Kimball approach states, “the data structures required prior
to dimensional presentation depend on the source data realities, target data model
and anticipated transformation” (Drewek, 2005). Hence the Kimball approach does
not require a normalized data structure prior to loading the dimensional tables
(Drewek, 2005). Different kinds of data warehousing architectures have been
discussed later in this topic.
Several elements combine together to form data warehouse architecture.
Following is a brief introduction to some of the important elements of a data
warehouse (Kimball & Ross, 2002):
• Operational Source Systems
These are all transaction (OLTP) based internal and external business
systems that are used to run the organization. They are also referred
to as Source Data Systems. As explained earlier, they contain current
state of business with minimal amount of historical information.
• Data Staging Area
It is both a storage area and a set of processes for extract-transform-
load (ETL) of the data, as explained later in this topic. Data staging
area is used to collect data from various operational sources and then
transform to make it suitable for loading into the data warehouse. Data
staging area is never used for end user interaction for querying or
reporting purposes.
• Data Mart
A data mart is described as a mini-data warehouse, i.e., a data
warehouse with limited scope. A data mart follows a “bottom-up”
approach, where specific requirements of a particular business
function or problem, e.g., sales, returns etc. are established first to
create data marts and then marts are rolled up later into data
warehouse (Biere, 2003). Contents of a data mart are obtained either
directly from operational sources or from the data warehouse. Table 2
shows a comparison between Data warehouse and Data mart:
Table 2: Data Warehouse versus Data Mart (Hoffer et al., 2002; Biere, 2003)
Characteristic | Data Warehouse | Data Mart
Approach | Top-down | Bottom-up
Cost | Significant amount of time & effort | Low in comparison
Development | Slow | Faster
Scope | Application independent and Enterprise-wide | Specific DSS application and functional area specific
Data | Lightly denormalized | Highly denormalized
Subjects | Multiple subjects | One central subject of user concern
Sources | Operational data sources | Operational data sources or Data Warehouse
Others | Flexible; Data-oriented; Long life; Large; Single complex structure | Restrictive; Project-oriented; Short life; Starts small; Multi, semi-complex structure
• Meta data
Meta data is the data that describes the properties and characteristics of
the data warehouse. Meta data is similar to an encyclopedia of the
information contained in the data warehouse (Kimball & Ross, 2002).
Humphries et al. (1998) described three types of Meta data for a data
warehouse:
1. Administrative Meta data includes information about the data
sources, source data contents, data warehouse objects and
business rules used for data transformation from the sources
into the data warehouse. This type of meta data contains
description of source databases, source-to-target field mapping,
warehouse schema design, warehouse back-end data structure,
warehouse back-end tools or programs, warehouse
architecture, business rules and policies, security authentication
rules and units of measure.
2. End-user Meta data describes the contents of the
warehouse and any hierarchies that may exist
within the various dimensions. End users can use this
information to generate their queries and understand the results.
Examples of end-user meta data include information about
warehouse contents, predefined queries and reports, business
rules and policies, hierarchy definitions, status information, data
quality, warehouse load history and warehouse data purging
rules.
3. Optimization Meta data are used to help with the optimization of
the data warehouse design and performance. Examples of such
meta data are aggregate definitions and query statistics
collections:
- Aggregate definitions include the documentation on the
warehouse aggregates in the Meta data repository.
Front-end tools with aggregate navigation capabilities
use this type of Meta data to work properly.
- Collection of query statistics is helpful to track the types
of queries that are made against the warehouse. Data
warehouse administrators can use this information for
database optimization and tuning. It also helps to identify
any warehouse data that is not being used.
Data warehouse elements described above combine to form the data
warehouse architectures. There are three main types of data warehouse
architectures starting with a basic two-level architecture, a three-level architecture
used in complex environments, and the three-level data architecture that is
associated with a three-level physical architecture (Hoffer et al., 2002).
• Generic Two-Level Architecture – there are four basic steps involved in
this generic architecture, as shown in figure 3. During this process,
data from various internal and external sources is extracted to the data
staging area, where it is processed and exported to the data warehouse.
Extraction and data loading happen on a periodic basis. Users can
access the data warehouse through various means, such as query tools,
report writers and analytical applications.
Figure 3: Generic two-level data warehousing architecture (Hoffer et al., 2002)
• Independent Data Mart Data Warehousing Environment - In this type of
architecture, several independent data marts are created directly from the
operational data as illustrated in figure 4. This type of architecture serves
a specific user or functional group within the organization, e.g., a sales
data mart, a supply chain data mart, etc. This kind of architecture, with
small subsets of data, is easier, faster and cheaper to develop and provides
quicker results than a single large data warehouse serving the
whole organization.
However there have been several limitations to this type of design (Hoffer,
et al., 2002):
- Separate ETL process for each data mart results in duplication of data
and efforts.
- Independent data marts are unable to provide an enterprise wide clear
view of the analysis issues, which could be cross-functional in nature,
such as customers and products.
- One may argue that it is possible to join tables across data marts, but
that would slow down the analysis process, and the data marts may also be
out of sync with each other.
Figure 4: Independent data mart data warehousing architecture (Hoffer et al., 2002)
The concept of the independent data mart has been a topic of debate among
researchers. Its supporters advocate an
incremental approach to developing decision support systems rather
than investing massive amounts of time, effort and money in developing an
enterprise-wide data warehouse. Its opponents argue for having a
more suitable architecture for in-depth enterprise-wide business analysis
right from the beginning, which is the whole point of the exercise.
There are some risks and limitations of implementing an independent data
mart (Hoffer et al., 2002):
- Data redundancy and duplication of effort due to the requirement of
separate ETL process for each data mart.
- Inconsistencies among data marts make it difficult to have a clear
enterprise wide view of the common data subjects such as customers
and products.
- Data analysis at a detailed level requiring drill down capabilities would
be difficult, since the required data may be distributed among different
data marts.
• Dependent Data Mart and Operational Data Store (ODS) Architecture –
The limitations of the independent data mart architecture are addressed by using
the dependent data mart and operational data store architecture shown in
figure 5. There is only one ETL process, which loads one central data
warehouse; this also solves the problem of data getting out of sync
across data marts. To provide an in-depth view with drill-down capabilities
of the related information across the enterprise, an operational data store
(ODS) component is added to this architecture.
Figure 5: Dependent data mart and operational data store architecture (Hoffer et al.,
2002)
ODS is specified as an integrated, subject oriented, updateable, current-
valued, detailed copy of the operational database designed to support
operational users for their reporting and decision-making applications
(Hoffer et al., 2002; Kimball & Ross, 2003). ODS helps in providing a
detailed view of the information at the enterprise level including
normalized current data from various sources, supporting the majority of user
requirements. Kimball and Ross (2003), however, cautioned against taking on
the additional burden of a third physical system in the form of an ODS
unless business needs make it necessary.
ETL (extract-transform-load) Process
ETL processes deal with the collection of data from various operational sources
on a variety of platforms and the merging of this data into a format suitable for the BI target
databases in the BI decision support environment, as shown in figure 6 (Moss and Atre,
2003).
Figure 6: ETL Process (data-warehouses.net, 2006)
The goal of the ETL process is to produce data that is detailed, historical,
normalized, comprehensive, timely, and quality controlled, to be able to support
decision making (Hoffer et al., 2002).
Moss and Atre (2003) have specified three possible stages of the ETL
programs:
1. Initial Load – for the population of the BI target databases for the first time
from operational sources.
2. Historical Load – an extension of the initial population of BI target
databases with archived historical data from offline storage devices.
3. Incremental Load – Ongoing population of BI target databases with
current operational data.
The extract process collects the required data from source files and databases to
populate the enterprise data warehouse (EDW). Initial and historical extracts are one-time functions.
The incremental load is an ongoing process, which can be accomplished in two ways:
extracting all records at a point in time, called a static extract, or capturing only the
changes in the source data, called an incremental extract (Hoffer et al., 2002). Due to
the volume of data involved, incremental extract is more suitable for data extraction.
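The difference between the two extract styles can be illustrated with a minimal Python sketch. The source rows and the `updated_at` change-tracking column are assumptions made for illustration; real source systems may expose changes through logs, triggers or other mechanisms instead.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE_ROWS = [
    {"order_id": 1, "amount": 120.0, "updated_at": datetime(2006, 1, 5)},
    {"order_id": 2, "amount": 75.5, "updated_at": datetime(2006, 2, 10)},
    {"order_id": 3, "amount": 42.0, "updated_at": datetime(2006, 3, 1)},
]


def static_extract(rows):
    """Return a snapshot of every source row at this point in time."""
    return list(rows)


def incremental_extract(rows, last_extract_time):
    """Return only the rows changed after the previous extract."""
    return [row for row in rows if row["updated_at"] > last_extract_time]


snapshot = static_extract(SOURCE_ROWS)
changes = incremental_extract(SOURCE_ROWS, datetime(2006, 2, 1))
print(len(snapshot), [row["order_id"] for row in changes])
```

In this sketch the static extract copies all three rows, while the incremental extract picks up only the two rows modified after the assumed previous extract date.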
The transformation process deals extensively with data integration and data
cleansing, which account for almost 80 percent of the overall ETL work. Source data
comes with several problems such as (Moss and Atre, 2003):
• inconsistencies in primary keys;
• inconsistent data values causing duplicate data;
• different data formats used for date and currency fields across different
sources;
• inaccurate data values e.g., invalid dates, SIN, mismatching between
address and area codes; and
• synonyms and homonyms causing data redundancy and confusion in
naming the fields.
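A few of the problems listed above can be sketched in Python. This is only an illustrative cleansing step under stated assumptions: the three date layouts, the field names `name` and `order_date`, and the rule that a name and date pair identifies a duplicate are all hypothetical.

```python
from datetime import datetime

# Assumed date layouts used by the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def normalize_date(text):
    """Return the date in ISO format, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def cleanse(records):
    """Standardize names and dates, drop duplicates, and collect rejects."""
    clean, rejects, seen = [], [], set()
    for rec in records:
        name = " ".join(rec["name"].split()).title()
        date = normalize_date(rec["order_date"])
        if date is None:  # invalid date: route to a reject list for review
            rejects.append(rec)
            continue
        key = (name, date)
        if key in seen:  # same value written differently in two sources
            continue
        seen.add(key)
        clean.append({"name": name, "order_date": date})
    return clean, rejects


records = [
    {"name": "jane  doe", "order_date": "09/05/2006"},
    {"name": "Jane Doe", "order_date": "2006-05-09"},
    {"name": "John Roe", "order_date": "31/02/2006"},
]
clean, rejects = cleanse(records)
print(clean, len(rejects))
```

The first two records collapse into one after standardization, and the record with the impossible date is set aside rather than loaded.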
The practice of data conversion is still weak in the industry, leading to data and
information quality issues. Data transformation is the most difficult and challenging
process; it deals with conversion of the data from the operational sources to the
required data format of the data warehouse. To improve data quality, the data
scrubbing (or data cleansing) technique may also be employed during this process.
It uses pattern recognition and other artificial intelligence techniques to improve the
quality of data before transformation (Hoffer et al., 2002).
Loading process uses refresh or update mode for loading the data into target
data warehouses. Refresh mode utilizes bulk rewriting of the data at periodic
intervals; therefore it is more resource intensive and used only during initial load.
Update mode is the preferred approach for ongoing maintenance, where only
the changes in source data are written to the data warehouse. This process also
includes creating the necessary indexes to organize the data warehouse data for
speedy access (Hoffer et al., 2002).
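The two loading modes can be contrasted with a small sketch using Python's built-in `sqlite3` module. The table `dw_customer`, its columns and the index name are hypothetical, and SQLite's `INSERT OR REPLACE` is used here only as a simple stand-in for writing changed rows; a real warehouse would typically preserve history rather than overwrite it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_customer (customer_id INTEGER PRIMARY KEY, city TEXT)")


def refresh_load(conn, rows):
    """Refresh mode: discard the target contents and rewrite them in bulk."""
    with conn:
        conn.execute("DELETE FROM dw_customer")
        conn.executemany("INSERT INTO dw_customer VALUES (?, ?)", rows)


def update_load(conn, changed_rows):
    """Update mode: write only new or changed source rows to the target."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO dw_customer VALUES (?, ?)", changed_rows
        )


refresh_load(conn, [(1, "Calgary"), (2, "Edmonton")])
update_load(conn, [(2, "Athabasca"), (3, "Red Deer")])
# Index created after loading to speed up later lookups by city.
conn.execute("CREATE INDEX idx_customer_city ON dw_customer (city)")
rows = conn.execute("SELECT * FROM dw_customer ORDER BY customer_id").fetchall()
print(rows)
```

After the initial refresh, the update call touches only customer 2 (changed) and customer 3 (new), leaving customer 1 untouched.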
There are many software tools available on the market to support ETL
activities, e.g., ActaWorks from Acta Technologies, AutoImport from White Crane
Systems, Data Migration Tools from Friedman & Associates, ETL Manager from
iWay Software and SpeedLoader from Benchmark Consulting. Scalzo (2003),
however, recommended developing a custom ETL because ETL tools generally do
not produce optimally efficient code, cost more money than the time they save is worth, and
there are just too many of them to choose from.
OLAP Database
A typical OLAP architecture shown in figure 7 consists of an OLAP database
server that lies between the data warehouse and the user.
Figure 7: OLAP Architecture (Todman, 2000)
An OLAP database consists of one or more cubes. A cube consists of data
from data warehouse tables and presents information to the users in the form of
measures and dimensions. A real-life cube has only three dimensions, but the data
structure of an OLAP cube allows numeric measures to be analyzed across many
different dimensions. An OLAP cube is loaded and processed periodically with the
data from the data warehouse. OLAP cubes are queried directly by the user
interface tools to do the multidimensional analysis. Queries against an OLAP cube
return results in a matter of seconds, including the ones that summarize years of
history and huge amounts of transactions. OLAP cubes are compatible with the
interactive user interface tools that provide drill-down and slicing of the information in
split seconds. OLAP cubes achieve such great performance by calculating and
storing the data aggregates in advance while being processed with the data from
the data warehouse. An example of an aggregate is a set of totals by product group and
month. When a query is executed, the OLAP database engine uses the appropriate
available aggregate or it sums up the detailed records on the fly (Hancock & Toren,
2006).
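The idea of precomputed aggregates can be sketched in a few lines of Python. This is a simplified illustration, not how any particular OLAP engine works internally; the detail rows and the function names `process_cube` and `sales` are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows: (product_group, month, product, sales_amount)
DETAIL = [
    ("Bikes", "2006-01", "Road bike", 900.0),
    ("Bikes", "2006-01", "Mountain bike", 600.0),
    ("Bikes", "2006-02", "Road bike", 450.0),
    ("Helmets", "2006-01", "Sport helmet", 80.0),
]


def process_cube(detail):
    """Precompute totals by (product_group, month) at processing time."""
    totals = defaultdict(float)
    for group, month, _product, amount in detail:
        totals[(group, month)] += amount
    return dict(totals)


AGGREGATES = process_cube(DETAIL)


def sales(group, month, product=None):
    """Answer from the stored aggregate when possible, else sum detail rows."""
    if product is None:
        return AGGREGATES.get((group, month), 0.0)
    return sum(
        amount
        for g, m, p, amount in DETAIL
        if (g, m, p) == (group, month, product)
    )


group_total = sales("Bikes", "2006-01")                # read from the aggregate
product_total = sales("Bikes", "2006-01", "Road bike")  # summed from detail rows
print(group_total, product_total)
```

A query at the product group and month level is a single lookup, whereas a query below the stored aggregate level falls back to scanning the detail rows.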
User Interface
User interface consists of analytical tools for accessing and analyzing data
from data warehouses and data marts. There are three main categories of such BI
analytics tools:
1. Traditional query and reporting
2. OLAP Tools
3. Data mining
Traditional query and reporting tools consist of spreadsheets such as
MS-Excel, personal computer databases like MS-Access and report writers like Crystal
Reports. There are plenty of tools available in this category from several vendors.
Some of the common features of these tools include support for all standard
databases, easy to use, usage of standard SQL, less resource intensive with small
client, offering output to common file formats such as PDF, Excel, HTML, and so on
(Biere, 2003). These tools provide predefined informative reports for regular usage.
Hence these tools are not suitable for decision influencing and business analysis
needs.
On-Line Analytical Processing (OLAP) Tools provide users with a capability to
interact with multi-dimensional data cubes. These tools provide a graphical view of
the multidimensional data as per the user’s analytical requirements.
There are several variants of OLAP products. Relational OLAP
(ROLAP) views the database as a normalized schema, while Multidimensional OLAP
(MOLAP) loads data into an intermediate multidimensional structure called a
cube. Database OLAP (DOLAP) provides OLAP functionality using the DBMS query
language, and Hybrid OLAP (HOLAP) allows access to the data using either
multidimensional cubes or a relational query language (Hoffer et al., 2002). However,
the most favored model is MOLAP due to its performance and multidimensional
capabilities (Biere, 2003).
Figures 8 and 9 show some common applications of OLAP tools in slicing and
dicing the data, and drill-down for detailed overview.
Figure 8: Slicing a data cube (Hoffer et al., 2002)
Figure 9: Example of a drill-down (Hoffer et al., 2002)
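Slicing and drill-down can also be illustrated with a minimal Python sketch. The cube contents, the dimension names and the helper functions `slice_cube` and `roll_up` are assumptions made for illustration and are not taken from the figures.

```python
# Hypothetical cube cells keyed by (product, region, month) -> units sold
CUBE = {
    ("Shoes", "East", "2006-01"): 10,
    ("Shoes", "West", "2006-01"): 7,
    ("Shoes", "East", "2006-02"): 12,
    ("Hats", "East", "2006-01"): 4,
}
DIMENSIONS = ("product", "region", "month")


def slice_cube(cube, dimension, value):
    """Fix one dimension at a single value and keep the matching cells."""
    axis = DIMENSIONS.index(dimension)
    return {key: units for key, units in cube.items() if key[axis] == value}


def roll_up(cube, dimension):
    """Total units per member of one dimension (a summary level)."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for key, units in cube.items():
        totals[key[axis]] = totals.get(key[axis], 0) + units
    return totals


# Summary view: total units per product.
summary = roll_up(CUBE, "product")
# Drill-down: take the "Shoes" slice and break its total down by region.
shoes_by_region = roll_up(slice_cube(CUBE, "product", "Shoes"), "region")
print(summary, shoes_by_region)
```

The summary total for one product is first shown at the top level and then broken down by region, which is the essence of a drill-down on a slice.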
Data-Mining Tools enable organizations to make better use of knowledge
contained in data by identifying trends and patterns from the data in data warehouse
and data marts. Data mining uses a mix of techniques from conventional statistics,
artificial intelligence and computer graphics (Hoffer et al., 2002). Data mining is not a
generic tool and it does not possess any hidden intelligence of its own. In order to be
effective, a data mining tool must have access to the entire range of organizational
data required for the intended analysis (Biere, 2003).
Following are some of the techniques used in the data-mining solutions
(Hoffer et al., 2002; Biere, 2003):
• Case-based reasoning – based on rules from real-world case
examples.
• Rule discovery – Searching for patterns and correlations in large data
sets.
• Signal Processing (Clustering) – Identifying clusters of information with
similar characteristics.
• Neural nets – develops predictive models based on principles modeled
after the human brain.
• Fractals – compressing large databases without losing information.
• Market basket analysis – single trip and over time.
• Time series analysis – trends over time.
Data-mining is used in several types of applications such as fraud analysis,
profiling populations, business trend analysis, target marketing, usage analysis,
customer value analysis, customer retention, up-selling and many others (Hoffer et
al., 2002; Biere, 2003).
APPLICATIONS OF BUSINESS INTELLIGENCE SOLUTIONS
Following are some of the application areas of the BI tools; some of them
have already been discussed earlier in this chapter:
• CPM (Corporate Performance Management)
• Multidimensional analysis, e.g., OLAP
• Click-stream analysis – used to analyze website traffic
• Data mining
• Forecasting
• Business Analysis
• Balanced scorecard preparation
• Visualization tools – presenting the information in graphical manner so
that users could self discover the trends and patterns
• Querying, Reporting and charting
• Geospatial analysis - analysis of features or phenomena that occur on
the earth
• Knowledge management
• Enterprise portal implementation
• Mining for text, content and voice
• Digital dashboard access - business management tool used to get a
quick overview of the business health
STEPS FOR BUILDING A BUSINESS INTELLIGENCE APPLICATION
Hancock and Toren (2006) recommend an iterative approach of short
duration BI projects that should focus on one specific business case at a time,
instead of going for a large enterprise-wide BI project. This approach is based on the
data warehousing methodologies suggested by Kimball and Ross (2003) that
support the building of a dimensional data warehouse using a series of interconnected
projects within the organization.
The main steps involved in building a BI application, which include building a data
warehouse, ETL processes and an OLAP database and implementing a user interface, are
described in the following section (Hancock & Toren, 2006):
1. Business case assessment
Due to the high cost of creating a BI environment, an organization considering
such an initiative must develop a BI strategy based on their unique requirements and
a business justification to balance the expenditure involved and the benefits gained
(Moss & Atre, 2003; Wu, 2001). A BI solution should be justified by analyzing its ROI
based on the true cost of a BI development project, which could include hiring BI
experts and purchasing proprietary software packages. This should also include
meetings between end-users and IT to establish the goals of the BI project (Biere,
2003). Most IT departments don’t understand the development methodology of a BI
solution. It requires business managers to take the lead on the BI projects and own
the application. IT may become the bottleneck, due to dated practices. All of this
points to carrying out a readiness assessment based on the current enterprise
infrastructure and improving the change management practice.
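As a rough illustration of the ROI analysis mentioned above, the following Python sketch computes a simple ROI ratio (net benefit divided by total cost). The cost categories follow the text, but every figure and the name simple_roi are hypothetical; a real business case would use the organization's own estimates and possibly a more elaborate financial model.

```python
def simple_roi(total_benefit, total_cost):
    """Return ROI as a fraction: net benefit divided by total cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures, for illustration only.
costs = {"bi_experts": 150_000, "software_packages": 60_000, "infrastructure": 40_000}
expected_benefit = 300_000

total_cost = sum(costs.values())
roi = simple_roi(expected_benefit, total_cost)
print(f"Total cost: {total_cost}, ROI: {roi:.0%}")
```

A ratio like this is only a starting point; the readiness assessment described above covers factors that such a number does not capture.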
2. Building the data warehouse
Dimensional modeling is a data modeling technique that has gained wide
acceptance for data warehouse design and implementation. Kimball, an
acknowledged authority in this field, popularized dimensional modeling as a key
technique in the data warehouse building process (Humphries et al., 1998). Unlike
operational database modeling, where normalization is the top priority, dimensional
modeling involves denormalizing the database structure to create schemas that
support fast data retrieval for analytical requirements and make decision support
applications easier to use (Humphries et al., 1998; Hoffer et al., 2002). Kimball and
Ross (2003) introduce the elements of a dimensional model: fact tables and
dimension tables, which are joined to form a star schema. These are briefly
described below.
Fact Table:
• Fact table is used to store the numerical performance measures of the
business. Figure 10 shows a sample fact table, “Fact_Sales” that
stores the quantifiable information on company sales, i.e., units sold,
unit price and sales amount.
Figure 10: Sample of a Fact Table
• Several dimensions are used to define the scope of the measurement
in the fact table. There are three dimensions, Time, Product and
Customer used in the sample fact table shown in figure 10.
• The level of detail in a fact table is called the grain of the fact table. For
example, in our sample fact table “Fact_Sales”, the grain is a line item in the
sales transaction, identified by the field SalesOrderLineNumber.
• A row in the fact table corresponds to a measurement. All
measurements in the fact table must be at the same grain level.
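• Each fact table row has a many-to-one relationship with each dimension
table, so the fact table expresses the many-to-many relationships
between dimensions. Fact tables typically contain a massive number of rows.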
Dimension tables:
Figure 11: Example of Dimension tables
• Dimension tables serve as an entry point into the fact table. Robust
dimension attributes help in providing analytical slicing and dicing
capabilities.
• Fields in dimension table serve as the source for query constraints,
grouping and report tables. Hence the field names in dimension tables
are named descriptively, e.g., DateOfBirth instead of DOB, to make it
easier for end-user to create queries and analyze results. Also, data
contained in the table should be more descriptive, e.g., gender field is
populated with “Male” or “Female” rather than something like “M” or “F”
that may have been used in operational table.
• In analytical reports or queries, dimensional fields are used for
retrieving data by specific categories, e.g., sales by week, by brand, by
yearly income level of customer etc.
• Surrogate keys – every join between a dimension table and a fact table is
based on a meaningless numeric key, assigned in sequential order, called
a surrogate key or non-natural key. For example, in Figure 11 the Product
dimension table uses the ProductDimensionKey column as its primary key
instead of the natural primary key column, ProductCode. This approach
offers several advantages:
- It helps isolate the data warehouse from operational data
changes in dimensional keys and maintain better control away
from the operational codes.
- Key overlap problems are avoided, since operational codes
may get purged and reused after a period of time.
- There are performance advantages since numeric keys provide
better data indexing unlike some of the alphanumeric production
keys.
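The key overlap point can be sketched in a few lines. The example below uses Python with the built-in sqlite3 module purely so that it runs self-contained; it is not the SQL Server tooling used later in this essay. ProductDimensionKey and ProductCode come from Figure 11, while the table name Dim_Product, the ProductName column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Dim_Product (
        ProductDimensionKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        ProductCode TEXT NOT NULL,                               -- natural (operational) key
        ProductName TEXT NOT NULL                                -- hypothetical attribute
    )
    """
)

# Hypothetical scenario: the operational system purges code 'P100' and
# later reuses it for a different product.
conn.execute(
    "INSERT INTO Dim_Product (ProductCode, ProductName) VALUES ('P100', 'Discontinued model')"
)
conn.execute(
    "INSERT INTO Dim_Product (ProductCode, ProductName) VALUES ('P100', 'Replacement model')"
)

rows = conn.execute(
    "SELECT ProductDimensionKey, ProductCode, ProductName "
    "FROM Dim_Product ORDER BY ProductDimensionKey"
).fetchall()
surrogate_keys = [row[0] for row in rows]
print(rows)
```

Both rows carry the same operational code, yet each receives its own surrogate key, so fact rows recorded against the older product remain distinguishable from those recorded against the newer one.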
Star Schema:
Figure 12: Sample of a Star Schema
• Joining of the fact and dimension tables forms a star like structure in
ER diagram shown in figure 12, which is called a Star schema.
• Fact table consisting of numeric measurements is joined with a set of
dimension tables consisting of descriptive field attributes to create a
high performance analytical data structure.
• Star schema is simple, symmetric and easier for business analysts to
understand than a normalized ER model.
• Star schema is also quite flexible, making it easy to add or remove facts
and dimension attributes.
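To make the star schema concrete, here is a minimal sketch in Python using the built-in sqlite3 module. Fact_Sales, SalesOrderLineNumber, ProductDimensionKey and the three dimensions (Time, Product, Customer) follow the figures described above; all other table names, column names and sample rows are assumptions introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes (names are hypothetical).
    CREATE TABLE Dim_Time     (TimeKey INTEGER PRIMARY KEY, CalendarYear INTEGER, WeekOfYear INTEGER);
    CREATE TABLE Dim_Product  (ProductDimensionKey INTEGER PRIMARY KEY, ProductCode TEXT, Brand TEXT);
    CREATE TABLE Dim_Customer (CustomerKey INTEGER PRIMARY KEY, Gender TEXT, YearlyIncomeLevel TEXT);

    -- Fact table: one row per sales order line (the grain), numeric measures only.
    CREATE TABLE Fact_Sales (
        SalesOrderNumber     TEXT,
        SalesOrderLineNumber INTEGER,
        TimeKey              INTEGER REFERENCES Dim_Time(TimeKey),
        ProductDimensionKey  INTEGER REFERENCES Dim_Product(ProductDimensionKey),
        CustomerKey          INTEGER REFERENCES Dim_Customer(CustomerKey),
        UnitsSold            INTEGER,
        UnitPrice            REAL,
        SalesAmount          REAL
    );

    INSERT INTO Dim_Time VALUES (1, 2005, 50), (2, 2006, 2);
    INSERT INTO Dim_Product VALUES (1, 'P100', 'Alpha'), (2, 'P200', 'Beta');
    INSERT INTO Dim_Customer VALUES (1, 'Female', 'High'), (2, 'Male', 'Medium');
    INSERT INTO Fact_Sales VALUES
        ('SO1', 1, 1, 1, 1, 2, 10.0, 20.0),
        ('SO1', 2, 1, 2, 1, 1, 30.0, 30.0),
        ('SO2', 1, 2, 1, 2, 3, 10.0, 30.0);
    """
)

# "Sales by brand, by year": dimension attributes constrain and group,
# the fact table supplies the measure.
sales_by_brand_year = conn.execute(
    """
    SELECT p.Brand, t.CalendarYear, SUM(f.SalesAmount)
    FROM Fact_Sales f
    JOIN Dim_Product p ON p.ProductDimensionKey = f.ProductDimensionKey
    JOIN Dim_Time t    ON t.TimeKey = f.TimeKey
    GROUP BY p.Brand, t.CalendarYear
    ORDER BY p.Brand, t.CalendarYear
    """
).fetchall()
print(sales_by_brand_year)
```

Adding another dimension attribute, for example a product category, would only require a new column in Dim_Product; the fact table and the existing queries would stay unchanged.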
3. Building the ETL process
In this step of BI application development the main objective is to populate the
dimension and fact tables in the data warehouse. The goal is to bring together data
from various sources, transform the data to make it compliant with dimensional
model and finally load the resulting data into the data warehouse.
ETL process includes following sequence of actions (Hancock & Toren,
2006):
- Data is extracted from various operational sources and stored in
a staging database, if necessary due to the nature of required
data transformations.
- Extracted data is transformed into the required form and loaded
into the dimensional tables first and then into the fact tables due
to the foreign key constraints.
- A one-time load of the historical and current data is done into
the data warehouse.
- After the one-time load, ETL processes are scheduled at regular
intervals to move current data into the data warehouse.
Dimensional tables can be updated or simply reloaded with
current data, if they are not too big. Fact tables have to be
regularly appended with recent data.
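The sequence above can be sketched as follows. This is a simplified illustration in Python with the built-in sqlite3 module, not the ETL tooling discussed by Hancock and Toren; the source rows, the table and column names, and the helper functions load_dimensions and load_facts are all hypothetical.

```python
import sqlite3

# Hypothetical rows extracted from an operational source.
source_orders = [
    {"customer_code": "C1", "gender": "F", "amount": 120.0},
    {"customer_code": "C2", "gender": "M", "amount": 80.0},
    {"customer_code": "C1", "gender": "F", "amount": 40.0},
]

dw = sqlite3.connect(":memory:")
dw.execute("PRAGMA foreign_keys = ON")  # enforce fact -> dimension references
dw.executescript(
    """
    CREATE TABLE Dim_Customer (
        CustomerKey  INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerCode TEXT UNIQUE,
        Gender       TEXT
    );
    CREATE TABLE Fact_Orders (
        CustomerKey INTEGER NOT NULL REFERENCES Dim_Customer(CustomerKey),
        OrderAmount REAL
    );
    """
)

GENDER_LABELS = {"F": "Female", "M": "Male"}


def load_dimensions(rows):
    # Transform terse operational codes into descriptive values, then load.
    for row in rows:
        dw.execute(
            "INSERT OR IGNORE INTO Dim_Customer (CustomerCode, Gender) VALUES (?, ?)",
            (row["customer_code"], GENDER_LABELS[row["gender"]]),
        )


def load_facts(rows):
    # Facts are loaded second so that every foreign key already exists.
    for row in rows:
        (key,) = dw.execute(
            "SELECT CustomerKey FROM Dim_Customer WHERE CustomerCode = ?",
            (row["customer_code"],),
        ).fetchone()
        dw.execute("INSERT INTO Fact_Orders VALUES (?, ?)", (key, row["amount"]))


load_dimensions(source_orders)
load_facts(source_orders)
dw.commit()

dimension_rows = dw.execute("SELECT COUNT(*) FROM Dim_Customer").fetchone()[0]
fact_rows = dw.execute("SELECT COUNT(*) FROM Fact_Orders").fetchone()[0]
print(dimension_rows, fact_rows)
```

With foreign keys enforced, calling load_facts before load_dimensions would fail on the key lookup, which is the reason for the ordering described in the list above.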
4. Building an OLAP database
An OLAP database provides a flexible way for users to query data in
support of their business initiatives, taking advantage of the data that the ETL
processes have collected into the data warehouse.
OLAP database is built using a set of tools other than the RDBMS engine
used for creating relational databases such as the ones used for the data
warehouse. Microsoft SQL Server 2005 Analysis Services (SSAS) is an example of
a tool that provides the capability to create OLAP databases and manage the cubes.
Creating an OLAP database solution with SSAS involves following steps
(Hancock & Toren, 2006):
- Specify data source to connect to the data warehouse.
- Create the data source view that contains the logical view of the
parts of the source data warehouse used for analysis, i.e., fact
and dimension tables.
- Identify the dimension and fact tables along with their
relationships in the data source view.
- Build the cube using the data source view. Each fact table that
is included in the cube becomes a measure group, with a
corresponding set of measures based on the numeric fact
columns. Some calculated fact fields could be added to the
cube at this stage. Related attributes are organized into a
hierarchy providing a way for the users to navigate the
information by drilling down through one or more levels.
- Next step is to load the data into the cube and use the cube
browser built into the analysis services tool to view the data for
testing purposes.
- Security schemes can also be added to the OLAP database by
adding different roles to restrict access to any sensitive
information to a selected group of people in the organization.
- Final step is to schedule the cube processing with the current
data from the data warehouse at regular intervals. This step is
usually synchronized so that it happens just after the ETL
process has finished.
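Conceptually, processing a cube means pre-aggregating the measures at each level of the dimension hierarchies so that summary and drill-down queries become simple lookups. The Python sketch below illustrates that idea only; it does not use or imitate the SSAS API, and the fact rows and the function names build_cube and drill_down are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (calendar_year, quarter, month, sales_amount)
fact_rows = [
    (2005, "Q1", "Jan", 100.0),
    (2005, "Q1", "Feb", 50.0),
    (2005, "Q2", "Apr", 70.0),
    (2006, "Q1", "Jan", 30.0),
]

HIERARCHY = ("year", "quarter", "month")


def build_cube(rows):
    """Pre-aggregate the measure at every level of the date hierarchy."""
    cube = defaultdict(float)
    for year, quarter, month, amount in rows:
        path = (year, quarter, month)
        for depth in range(len(HIERARCHY) + 1):
            cube[path[:depth]] += amount  # the empty tuple holds the grand total
    return dict(cube)


def drill_down(cube, parent):
    """Return the members one level below `parent` with their totals."""
    depth = len(parent) + 1
    return {
        key: total
        for key, total in cube.items()
        if len(key) == depth and key[: len(parent)] == parent
    }


cube = build_cube(fact_rows)
year_totals = drill_down(cube, ())
quarters_2005 = drill_down(cube, (2005,))
print(year_totals)
print(quarters_2005)
```

A real OLAP engine stores and indexes such aggregates far more efficiently, but the navigation pattern (view a total, then drill into the level below) is the same.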
5. Implementing user interface
A complete BI solution must supply information in whatever form the
business analysts require. Various types of user interface components have
been described earlier in the topic on components of a BI solution.
Essentially a user interface should support information retrieval from the
OLAP cube to display summary reports and allow drill-down at desired levels of
information. Depending on the user requirements an interface could be as simple as
an Excel spreadsheet or as complicated as a data mining solution discussed earlier.
FUTURE TRENDS IN BUSINESS INTELLIGENCE
BI technologies have become an integral part of organizations and are
advancing at a rapid pace. Following are some of the future trends in the field of BI:
• Advanced analytical and data visualization techniques such as
predictive modeling, guided decision-making capabilities and
geographic visualization will continue to improve (Imhoff, 2006;
Knightsbridge, 2007).
• Mergers and consolidation among BI vendors will continue resulting in
more options for packaged BI solution offerings and full-service BI
vendors (Biere, 2003; Imhoff, 2006). Small innovative players in BI
business are being taken over by the giants like IBM and Oracle
(Imhoff, 2006).
• Real-time analytics is under development to shorten the data latency
between the business event happening and reporting. Virtual BI
components such as virtual operational data stores (ODSs) and data
marts using EII (Enterprise Information Integration)
technologies are being developed to reduce data latency (Imhoff,
2006).
• CPM (Corporate Performance Management) is an emerging area of
BI, which enables a company to align its execution with business
objectives. This involves data integration and extending the usage of
KPIs throughout the organization, so that executives can easily
understand the impact of operational activities on financial results and
organizational goals (Schauer, 2004; Knightsbridge, 2007).
• BI Networks concept is under discussion by various vendors. BI
Networks will have the ability to provide a common BI platform to the
organization’s customers and partners for group-based decision
making. BI information could be published and controlled by the
enterprise under such network (Biere, 2003).
• SOA (Service Oriented Architecture) presents some great opportunities
for delivering enhanced BI capabilities to users. “An SOA-enabled BI
infrastructure could provide seamless access to both batch and real-
time data integrated across operational and analytical sources. SOA
also presents opportunities for innovation in areas such as real-time
data collection and real-time analytic services.” (Knightsbridge, 2007).
CHAPTER IV
EXAMPLE OF BI APPLICATION DEVELOPMENT
INTRODUCTION
This chapter includes the documentation and step-by-step explanation of a
sample BI solution development for a fictitious car rental company called Car-Rental
Inc. The Microsoft SQL Server 2005 database engine, SQL Server Business Intelligence
Development Studio, SQL Server Analysis Services, SQL Server Integration Services
and an evaluation version of ProClarity™ Desktop Professional version 6.0 have
been used to create the sample BI application.
This example is a typical representation of a real life BI application that
consists of various tools, technologies and processes.
PROJECT SCENARIO
Car-Rental Inc. is a medium size company dealing in car rental business
across Canada. Car-Rental Inc. started as a small company with very few rental
offices (called locations) in Ontario; however it has grown significantly over the last
decade with 250 locations across Canada. Head office of Car-Rental Inc. is located
in Mississauga, Ontario. Car-Rental has around 1000 employees working at various
locations and regional offices. There are several different types of cars available for
rental, including luxury brands and hybrid cars. Car rental rates include some free
kilometers of driving. After that an extra per kilometer charge is applicable.
In 1998, Car-Rental launched a website to allow visitors to make rental
reservations on the web. This move was highly successful as it allowed visitors from
other countries and different cities to easily make reservations on the web for vehicle
pickup at the company’s airport locations in Canada. Car-Rental also found it more
effective to run various promotions offering special rates on their website. Figure 13
shows an entity-relationship (ER) diagram for the Car-Rental website database.
Appendix A contains the SQL script for all the database objects (tables and
constraints) used in the Car-Rental website database.
Figure 13: Operational Database Diagram for Car-Rental Inc. Website
Marketing department works in close coordination with Rates administration
group to run various promotions on the web. Marketing keeps track of holiday
seasons, any major events, past experience with promotions and business
competitors to regulate these promotional activities.
CURRENT ENVIRONMENT
Marketing and rates administration departments keep making IT requests for
new reports for different business analysis scenarios as and when they come up
during their interdepartmental meetings and changing business conditions. As a
result, over the years there have been hundreds of reports accumulated on the
company intranet. There are several issues with the existing environment:
• Many of the reports are used only once to answer a specific business
question.
• Due to the high number of reports, IT and marketing department have
lost track of the objective and logic behind various reports.
• Most of the reports are static and do not allow users to drill-down into
details.
• Due to the fast-paced business environment in car rental business,
marketing department is losing business opportunities while
waiting for the IT department to create reports that answer their
queries.
• Reports often provide inconsistent results.
• Some reports contain data aggregates, compare year-by-year results
and involve complex calculations that require several minutes to return
results.
• Various users within the company run these reports several times a
day. Since reports are running directly from the operational database,
this is negatively affecting the website performance during peak hours.
• Users in the rates administration department are interested in different
views of the reports originally created for Marketing, resulting in
duplication of work for IT for creating more than one report based on
the same core logic and calculations.
• Car-Rental website database was designed and normalized for the
online rental reservation event. IT developers find it difficult to create
reports for doing business analysis such as analyzing the effectiveness
of one promotion over the other or finding out the effect of holiday
seasons on the number of reservations and so on.
THE SOLUTION
To counter the problems with the existing environment, managers at Car-
Rental Inc. decided to implement data warehousing techniques and develop a BI
solution for the Internet reservation information. Management at Car-Rental Inc.
wanted to accomplish the following goals with this new initiative:
• Provide a common and consistent source of data for analysis
and reporting.
• Analytical information should be easily accessible by the business
users in the most efficient manner.
• Operational systems should be separated from the load of doing data
analysis and reporting queries.
• Users should be able to quickly access the historical information and
data aggregates, such as total number of reservations and
reservations revenue for a specific period.
• Users should also be able to slice and dice through the information to
view data from different angles and levels of details.
BUILDING THE BI APPLICATION FOR CAR-RENTAL INC.
Building a BI application for the Car-Rental Inc. requirements discussed
above required design, development and integration of the following components:
1. Data warehouse
Four-step dimensional design process suggested by Kimball and Ross (2003)
was applied as follows to design the data warehouse:
1. Select the business process to model – Business process to model in
this case is the car rental reservation received from the website.
Focusing on the business process rather than the departments
themselves helps in designing the dimensional schema more efficiently
with consistent information.
2. Declare the grain of the business process – Grain of business process
is selected based on the answer to the question: how would you describe
a single row in the fact table? We selected the most atomic information
for this process, i.e., rental reservation, to be the grain. Business users
were consulted for their agreement on the selected fact table
granularity.
3. Choose the dimensions that apply to each fact table row – This step is
based on the attributes that business people use to describe the data
that results from the business process, i.e., reservations in this case.
Based on the information gathered from business users, we chose
Date, Location, Car Class and Promotion as the dimension tables. All
possible descriptions that take on single values in context of each
measurement are added as fields to these dimension tables. Business
analysts were consulted while selecting robust attributes for the
dimension tables, e.g., fiscal year, calendar year, long weekend
indicator, holiday indicator, hybrid car class indicator and airport
location. Dimensional fields have meaningful names, and their data types
are selected to hold descriptive information. For example, the
field HybridIndicator will hold information like “Hybrid” or “Non-Hybrid”,
rather than “Y” or “N”, to help the end users easily understand the
information.
4. Identify numeric facts that will populate each fact table row – This is
based on the information that business users are trying to measure
and analyze. For our car rental reservation scenario the measurements
of importance are rental days, rental rate and number of reservations.
Hence measurement fields such as RentalDays and RentalRate are added to
the fact table, “Internet_Rez_Fact”. We don’t need to create a
reservation count measure, since it can be extracted from the OLAP
cube as an aggregate measure that is equal to the number of rows in
fact table.
Figure 14 shows the star schema for the data warehouse ER diagram for car
rental reservations based on the dimensional design approach discussed above.
Appendix B contains the SQL scripts for creation of the data warehouse illustrated in
figure 14.
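The actual warehouse definition is the one in Appendix B. Purely as an illustration of the four design steps, the Python sketch below (built-in sqlite3 module) builds a reduced version of such a schema and derives the reservation count from the fact rows. Internet_Rez_Fact, RentalDays, RentalRate, Location_Key and HybridIndicator are taken from the text; the dimension table names, the remaining columns and the sample rows are assumptions and may differ from Appendix B.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript(
    """
    -- Dimension table and column names below are hypothetical.
    CREATE TABLE Date_Dim      (Date_Key INTEGER PRIMARY KEY, CalendarYear INTEGER, HolidayIndicator TEXT);
    CREATE TABLE Location_Dim  (Location_Key INTEGER PRIMARY KEY, Province TEXT, AirportIndicator TEXT);
    CREATE TABLE CarClass_Dim  (CarClass_Key INTEGER PRIMARY KEY, CarClassName TEXT, HybridIndicator TEXT);
    CREATE TABLE Promotion_Dim (Promotion_Key INTEGER PRIMARY KEY, PromotionName TEXT);

    -- Grain: one row per rental reservation received from the website.
    CREATE TABLE Internet_Rez_Fact (
        Date_Key      INTEGER REFERENCES Date_Dim(Date_Key),
        Location_Key  INTEGER REFERENCES Location_Dim(Location_Key),
        CarClass_Key  INTEGER REFERENCES CarClass_Dim(CarClass_Key),
        Promotion_Key INTEGER REFERENCES Promotion_Dim(Promotion_Key),
        RentalDays    INTEGER,
        RentalRate    REAL
    );

    INSERT INTO Date_Dim VALUES (1, 2006, 'Holiday');
    INSERT INTO Location_Dim VALUES (1, 'Ontario', 'Airport');
    INSERT INTO CarClass_Dim VALUES (1, 'Compact', 'Non-Hybrid'), (2, 'Hybrid Sedan', 'Hybrid');
    INSERT INTO Promotion_Dim VALUES (1, 'Long Weekend Special');
    INSERT INTO Internet_Rez_Fact VALUES
        (1, 1, 1, 1, 3, 40.0),
        (1, 1, 2, 1, 2, 60.0),
        (1, 1, 2, 1, 5, 55.0);
    """
)

# No stored "reservation count" column: COUNT(*) over the fact rows provides it.
by_hybrid = dw.execute(
    """
    SELECT c.HybridIndicator, COUNT(*) AS Reservations, SUM(f.RentalDays) AS RentalDays
    FROM Internet_Rez_Fact f
    JOIN CarClass_Dim c ON c.CarClass_Key = f.CarClass_Key
    GROUP BY c.HybridIndicator
    ORDER BY c.HybridIndicator
    """
).fetchall()
print(by_hybrid)
```

Because the grain is a single reservation, counting fact rows within any slice gives the number of reservations for that slice, which is why no separate count column is needed.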
Figure 14: Data Warehouse ER diagram for Car-Rental Inc. Website Reservations
2. ETL Process
ETL process mainly includes two stages of data loading, one for the historical
data and the other being the incremental load for ongoing data updates. Since the
reservation data on the website database for the Car-Rental Inc. is not massive at
the moment, IT department recommended reloading all the data in dimension and
fact tables on daily basis for the sake of simplicity, rather than trying to do the
incremental updates. A SQL Server Integration Services (SSIS) project was created
in SQL Server BI development studio to accomplish the ETL task.
Figure 15: ETL Tool (Integration Services) from SQL Server BI development studio
The tasks and transformations necessary for each of these ETL processes
are stored in Integration Services packages. Figure 15 shows a package created in
SSIS project to transform and load data into dimension tables. As shown in the
figure, a Connection manager feature is used to specify the source and destination
connection used by the Data Flow task. Since the data is being reloaded into the
dimension tables each time, there is a Preparation SQL task in place to empty out
the dimension tables first before loading the fresh set of data.
SQL queries were developed to extract and transform the source data to
make it available for loading into the data warehouse. OLE DB Destination Editor
window, as shown in figure 16, is used to do mapping between source query and
destination table in data warehouse. Location_Key being a surrogate key has been
ignored in the mapping grid, so that the Identity feature of T-SQL applied on this
column takes care of auto-incrementing the value in Location_Key field with insertion
of each row.
Loading of fact table with reservation data requires not only the collection of
data from the operational database but also transformation of the business keys in
the source records to the corresponding surrogate keys used by the dimension
tables. SSIS makes this task simple by providing a Lookup transform task to
translate business keys into surrogate keys.
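The reload-and-lookup pattern described above can be sketched outside SSIS as follows. This Python example (built-in sqlite3 module) is only an analogy: DELETE statements stand in for the Preparation SQL task, SQLite's AUTOINCREMENT stands in for the T-SQL Identity feature, and a dictionary stands in for the Lookup transform. Locations, LocationID, Reservations, Location_Key and Internet_Rez_Fact are named in the essay; Location_Dim, the other columns, the sample rows and the function run_etl are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Operational side (heavily simplified, hypothetical columns).
    CREATE TABLE Locations    (LocationID INTEGER PRIMARY KEY, LocationName TEXT);
    CREATE TABLE Reservations (ReservationID INTEGER PRIMARY KEY, LocationID INTEGER,
                               RentalDays INTEGER, RentalRate REAL);
    INSERT INTO Locations VALUES (10, 'Toronto Airport'), (20, 'Calgary Downtown');
    INSERT INTO Reservations VALUES (1, 20, 3, 45.0), (2, 10, 1, 60.0), (3, 20, 7, 40.0);

    -- Warehouse side.
    CREATE TABLE Location_Dim (
        Location_Key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, never mapped from the source
        LocationID   INTEGER,                            -- business key kept for lookups
        LocationName TEXT
    );
    CREATE TABLE Internet_Rez_Fact (Location_Key INTEGER, RentalDays INTEGER, RentalRate REAL);
    """
)


def run_etl(conn):
    # Preparation step: empty the warehouse tables before the full reload.
    conn.execute("DELETE FROM Internet_Rez_Fact")
    conn.execute("DELETE FROM Location_Dim")

    # Load the dimension; Location_Key is left out so the engine assigns it.
    conn.execute(
        "INSERT INTO Location_Dim (LocationID, LocationName) "
        "SELECT LocationID, LocationName FROM Locations ORDER BY LocationID"
    )

    # Lookup: business key -> surrogate key.
    lookup = dict(conn.execute("SELECT LocationID, Location_Key FROM Location_Dim"))

    # Load the fact table, swapping each business key for its surrogate key.
    source_rows = conn.execute(
        "SELECT LocationID, RentalDays, RentalRate FROM Reservations"
    ).fetchall()
    for location_id, days, rate in source_rows:
        conn.execute(
            "INSERT INTO Internet_Rez_Fact VALUES (?, ?, ?)",
            (lookup[location_id], days, rate),
        )
    conn.commit()
    return lookup


lookup = run_etl(db)
fact_count = db.execute("SELECT COUNT(*) FROM Internet_Rez_Fact").fetchone()[0]
print(lookup, fact_count)
```

Running run_etl again leaves the same number of fact rows, since both tables are emptied first; this is the simplicity that a full daily reload trades against the efficiency of incremental updates.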
After thorough testing, ETL packages were deployed on the production server
and scheduled to run on daily basis using SQL Server Agent service.
Figure 16: Data mapping between source and destination inside a SSIS package
3. OLAP Database
An OLAP database consisting of the Reservations Cube was created in
Microsoft SQL Server Analysis services (SSAS) using SQL Server BI Development
Studio.
As shown in figure 17, the OLAP cube is based on a data source view (DSV)
created from the data source connection to the data warehouse.
Figure 17: Data Source View used in the OLAP cube
Internet reservations fact table that is included in the cube has Rental Days
and Rental Rate measures. As can be seen in figure 17, a new measure Internet
Rez Fact Count is added to the cube, which is based on the number of rows in the
fact table. A calculated measure for Rental Amount, which is equal to the
multiplication result of the existing measures, i.e., Rental Days and Rental Rate, is
also added to the cube. Related attributes in Date dimension and Location
dimension were organized into a hierarchy. This enables the users to drill down
into the summary details; for example, after viewing the data by calendar year, users
may want to drill down to view data by quarter, by month or by week.
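The measures described above can be illustrated with a small Python sketch. It is not SSAS code: the rows and the function name summarize are hypothetical, and the sketch assumes that Rental Amount is evaluated per reservation (Rental Days times Rental Rate) and then summed, since the text does not state at which level the multiplication is applied.

```python
from collections import defaultdict

# Hypothetical fact rows: (calendar_year, quarter, rental_days, rental_rate)
reservations = [
    (2005, "Q1", 3, 40.0),
    (2005, "Q1", 2, 60.0),
    (2005, "Q3", 5, 50.0),
    (2006, "Q1", 1, 80.0),
]


def summarize(rows, level):
    """Aggregate measures by year (level=1) or by year and quarter (level=2)."""
    totals = defaultdict(lambda: {"count": 0, "rental_days": 0, "rental_amount": 0.0})
    for row in rows:
        key, days, rate = row[:level], row[2], row[3]
        cell = totals[key]
        cell["count"] += 1                    # row-count measure
        cell["rental_days"] += days
        cell["rental_amount"] += days * rate  # evaluated per reservation, then summed
    return dict(totals)


by_year = summarize(reservations, 1)
by_quarter = summarize(reservations, 2)
print(by_year[(2005,)])
print(by_quarter[(2005, "Q1")])
```

Multiplying the already aggregated totals instead (total days times total rate) would give a different and misleading figure, so the level at which a calculated measure is evaluated deserves attention when the cube is designed.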
After processing the cube, cube browser built into the analysis services tool
was used to view the data for testing purposes. Figure 18 shows a test scenario
involving data analysis for a calculated measure, Rental Amount generated from
Luxury car classes at the Airport locations over the years. Ability to drill down to the
Phone Area Code level from the Province level of data summary is also
demonstrated in figure 18.
Figure 18: OLAP Cube Browser in SQL Server BI Development Studio
Finally, a cube-processing job was scheduled in synchronization with the ETL
process, so that the cube is processed immediately after the ETL job has finished.
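The dependency between the two jobs can be expressed as a simple chain. The Python sketch below is a generic illustration and does not use SQL Server Agent; the function names run_pipeline and failing_etl, and the placeholder jobs, are hypothetical.

```python
def run_pipeline(etl_job, cube_job):
    """Run cube processing only after the ETL job has finished successfully."""
    log = []
    try:
        etl_job()
        log.append("etl finished")
    except Exception as exc:
        log.append(f"etl failed: {exc}")
        return log  # skip cube processing so the cube never reads a partial load
    cube_job()
    log.append("cube processed")
    return log


def failing_etl():
    raise RuntimeError("source database unavailable")


ok_log = run_pipeline(lambda: None, lambda: None)
failed_log = run_pipeline(failing_etl, lambda: None)
print(ok_log)
print(failed_log)
```

Chaining the jobs in this way, rather than scheduling them at fixed clock times, avoids processing the cube while the warehouse is still being reloaded.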
4. User Interface
There were several options considered for the user interface tool for the
OLAP database, such as MS Excel, SQL Server Reporting services, Crystal reports
and ProClarity desktop professional. However, the end-users preferred ProClarity
desktop professional to the others for this application because of the ease of use,
flexibility, ability to drill down, graphical analysis capabilities and available extensions
for any future data mining requirements.
Figure 19 shows a graphical analysis in ProClarity tool with bar charts
illustrating the number of reservations over the years for different car classes.
Figure 19: End User OLAP Tool - ProClarity Desktop Professional
CHAPTER V
CONCLUSIONS AND RECOMMENDATIONS
Conclusion
Business users across organizations are getting frustrated by their
dependence on IT for analysis information and are demanding that the
information be available to them in a timely and flexible manner suitable for analysis
from different points of view. Due to continuously changing business conditions,
business executives want to interrogate the data in ever expanding ways, without
having to go through IT and the associated ownership, usage, change management,
maintenance and data access issues. Business intelligence (BI) applications provide
a solution to these problems by arming the end-users with tools to improve their
awareness of the key events and trends that are important to the decision makers in
the organization.
An organization can benefit from the BI applications in several ways including
improved coordination of business activities, ability to adjust quickly to changes,
ability to make better and faster decisions reducing the guesswork, being proactive
rather than reactive, being ahead of the competition, providing better customer
experience, realization of an enterprise’s strategy, mission, goals and tasks,
leveraging the existing investments in the systems to full potential and thus
improving the overall business performance.
Due to the high cost of creating a BI environment, an organization considering
such a project must develop a BI strategy, readiness assessment and business
justification to balance costs involved and the benefits gained. Most IT departments
don’t understand the development methodology of a BI solution. It requires business
managers to take the lead on the BI projects and own the application.
Potential opportunities presented by the BI solutions are just too many to
ignore for any professional business in these times of low margins, cut throat
competition and increasing globalization.
BI solutions have to be developed for each specific business scenario; hence
no single packaged BI solution could suit every organization.
There are several tools and technologies involved in development of a BI
application, such as data warehousing, data integration or ETL processes, OLAP,
user interfaces and data mining techniques. Due to the complex nature of these
components and the high cost of development, businesses have to make serious efforts
to implement BI capabilities in their organization.
BI field is advancing at a rapid pace and several trends are emerging in its
development. BI users are now demanding real-time business intelligence. Advanced
data mining techniques involving conventional statistics, artificial intelligence and
computer graphics are being used for predictive modeling, analytics “suites”,
dashboards, and supply-chain management. New concepts in BI like CPM, BI Networks
and SOA are going to be under continuous development in future.
Recommendations
Operational systems based on OLTP applications are neither designed nor
appropriate for handling business analysis and reporting requirements. Hence
organizations must create separate OLAP based systems to support decision-making,
based on the information collected from various internal and external
sources (Humphries et al., 1998).
BI projects must be based on a strategy to meet the needs of the entire
organization to get the most benefit out of this technology. This strategy should
include the means of data transfer from the organization's information systems, as
well as a course of action to allow the organization to realize its desired BI goal. The
BI strategy must contain the functional requirements along with the supporting
technology (Wu, 2001).
Instead of going for a large-scale enterprise-wide BI project, an iterative
approach of developing small value-based BI projects is more beneficial for the
organization. This approach also provides opportunities for improvement and learning
through the different phases, and it can easily adapt to changing business conditions
(Hancock & Toren, 2006; Kimball & Ross, 2003).
Dimensional modeling technique, which is widely accepted for the data
warehouse design process, should be applied as a key technique during the data
warehouse building phase.
OLAP database provides flexible and efficient ways for the users to query
data to support their business initiatives. End-user tools must be able to read directly
from the OLAP cubes to take advantage of their capabilities to quickly retrieve data
aggregates and easily allow user interface tools to drill down or slice and dice the
information as per the requirements.
ETL and OLAP cube processing jobs must be synchronized so that the cube
processing is done after the ETL process has finished data loading into the data
warehouse.
The non-IT business users handle the user interface tools; hence these tools
must be intuitive, allow users to easily identify available fact and dimension
information, and support drill down of the information at required levels.
Suggestions for further research
As discussed earlier, BI field consists of multiple tools and technologies that
are continuously undergoing improvements. This provides many opportunities for
further research on the subject of BI.
An interesting area of research could be based on Data mining tools and
techniques involving AI for Case-based reasoning, Rule discovery, Signal
Processing, Neural nets, Fractals, Market basket analysis and Time series analysis.
Real time business intelligence is another relatively new area of research that
includes proactive caching and "push" notifications of the new data from source
database to OLAP database (Hancock & Toren, 2006).
Implementation of RSS feeds, XML web services, Interlinks between global
companies (supply chains) and inter-company Data Warehouses are some of the
other areas of research.
REFERENCES
Adamson C. and Venerable M. (1998). Data Warehouse Design Solutions. Wiley
Computer Publishing
All-bi.com (2006). all-BI Business Intelligence Solutions. Retrieved on Oct 20, 2006
from http://www.all-bi.com/
Biere, M. (2003). Business Intelligence for the Enterprise. IBM Press. Prentice Hall
Professional Technical Reference
Cognos (2004). Using BI to Leverage ERP Data. A Cognos white paper. Retrieved
on Nov 20, 2006 from
http://www.cognos.com/pdfs/whitepapers/wp_using_bi_to_leverage_erp_data
Data-Warehouses.net (2006). ETL Process - Guide to Data Warehousing and
Business Intelligence. Retrieved on Oct 23, 2006 from
http://data-warehouses.net/architecture/etlprocess.html
Drewek K. (2005). Data Warehousing: Our Great Debate Wraps Up. Business
Intelligence Network. Retrieved on Feb 03, 2007 from
http://www.b-eye-network.com/view/766
Hancock J. C. & Toren, R. (2006). Practical Business Intelligence with SQL Server
2005. Addison Wesley Professional
Hoffer, J. A., Prescott, M. B. & McFadden, F. R. (2002). Modern Database
Management. Sixth Edition. Prentice Hall.
Humphries, M., Hawkins, M. W. & Dy, M. C. (1998). Data Warehousing: Architecture
and Implementation. Prentice Hall.
Imhoff C. (2006). Three Trends in Business Intelligence Technology. Business
Intelligence Network. Retrieved on Feb 03, 2007 from
http://www.b-eye-network.com/view/2608.
ITtoolbox (2006). 2005 ITtoolbox Data Warehouse Survey. Retrieved on Nov 21,
2006 from
http://oracle.ittoolbox.com/documents/research/survey.asp?survey=oracledw_Survey&p=1.
Jedras, J. (2006). BI helping companies look past profitability. Computer World.
Vol.22, #21.
Kimball R. & Ross M. (2002). The Data Warehouse Toolkit, Second Edition. Wiley
Computer Publishing
Knightsbridge (2007). The Top Ten Trends in Business Intelligence for 2007.
Knightsbridge Solutions LLC. White Paper.
Liautaud B. & Hammond M (2001). e-Business Intelligence. McGraw-Hill.
Martens, C. (2006). Business intelligence and the 'wow' factor. IT World Canada.
Retrieved on Oct 15, 2006 from
http://www.itworldcanada.com/Pages/Docbase/ViewArticle.aspx?ID=idgml-9930e82d-bdfc-42a0-9ac1-be64d6fbe0bc&ql=062676
Moss L. & Atre S. (2003). Business Intelligence Roadmap: the complete project
lifecycle for decision-support applications. Addison-Wesley Information
Technology Series.
Olszak, C. M. & Ziemba, E. (2006). Business Intelligence Systems in the Holistic
Infrastructure Development Supporting Decision-Making in Organizations.
Interdisciplinary Journal of Information, Knowledge, and Management,
Volume 1, 2006.
Raisinghani, M. (2004). Business Intelligence in the Digital Economy: Opportunities,
Limitations and Risks. Idea Group Inc.
Scalzo, B. (2003). Oracle® DBA Guide to Data Warehousing and Star Schemas.
Prentice Hall
Schauer J (2004). The Next Evolution in Business Intelligence. Executive Interview
published in DM Review Magazine. Retrieved on Feb 03, 2007 from
http://www.dmreview.com/article_sub.cfm?articleId=1011022
Todman, C. (2000). Designing a Data Warehouse: Supporting Customer
Relationship Management. Prentice Hall.
Wikipedia.org (2006). Define: Business Intelligence. Retrieved on Aug 10, 2006 from
http://en.wikipedia.org/wiki/Business_intelligence_tools
Wu, J. (2001). Business Intelligence: The Value of Business Intelligence
Applications. Retrieved on Sep 24, 2006 from
http://www.dmreview.com/article_sub.cfm?articleId=3887
Wu, J. (2002). Business Intelligence: Visualization of Key Performance Indicators.
Retrieved on Feb 02, 2007 from
http://www.dmreview.com/article_sub.cfm?articleId=5229
APPENDIX A
OPERATIONAL DATABASE SCRIPT FOR THE SAMPLE APPLICATION
Database Engine: SQL Server 2005
Database Query Language: Transact SQL (T-SQL)
Database Name: CarRentalWeb
List of Tables:
- Reservations
- Promotions
- Rates
- CarClass
- Locations
- Province
- MajorEvents
- Holidays
Script to create operational database objects:
USE [CarRentalWeb]
GO
/****** Object: Table [dbo].[CarClass] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[CarClass](
[CarClassID] [int] IDENTITY(1,1) NOT NULL,
[CarClassName] [varchar](50) NULL,
[Hybrid] [bit] NULL,
[Luxury] [bit] NULL,
CONSTRAINT [PK_CarClass] PRIMARY KEY CLUSTERED
(
[CarClassID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Holidays] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Holidays](
[HolidayID] [int] IDENTITY(1,1) NOT NULL,
[HolidayName] [varchar](50) NOT NULL,
[Date] [datetime] NOT NULL,
[LongWeekend] [bit] NOT NULL,
CONSTRAINT [PK_Holidays] PRIMARY KEY CLUSTERED
(
[HolidayID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Locations] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Locations](
[LocationID] [int] IDENTITY(1,1) NOT NULL,
[LocationName] [varchar](50) NOT NULL,
[Address] [varchar](50) NOT NULL,
[City] [varchar](50) NOT NULL,
[ProvinceID] [char](2) NOT NULL,
[PostalCode] [varchar](50) NOT NULL,
[PhoneNumber] [varchar](50) NOT NULL,
[AirportLocation] [bit] NOT NULL,
CONSTRAINT [PK_Locations] PRIMARY KEY CLUSTERED
(
[LocationID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[MajorEvents] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[MajorEvents](
[EventID] [int] NOT NULL,
[EventName] [varchar](50) NOT NULL,
[DateFrom] [datetime] NOT NULL,
[DateTo] [datetime] NOT NULL,
CONSTRAINT [PK_MajorEvents] PRIMARY KEY CLUSTERED
(
[EventID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Promotions] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Promotions](
[PromotionID] [int] IDENTITY(5,5) NOT NULL,
[PromotionName] [varchar](50) NOT NULL,
[ActiveFrom] [datetime] NOT NULL,
[ActiveTo] [datetime] NOT NULL,
CONSTRAINT [PK_Promotions] PRIMARY KEY CLUSTERED
(
[PromotionID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Province] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Province](
[ProvinceID] [char](2) NOT NULL,
[ProvinceName] [varchar](50) NOT NULL,
CONSTRAINT [PK_Province] PRIMARY KEY CLUSTERED
(
[ProvinceID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Rates] Script Date: 11/26/2006 22:16:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Rates](
[RateID] [int] IDENTITY(1,1) NOT NULL,
[CarClassID] [int] NOT NULL,
[PromotionID] [int] NOT NULL,
[RatePerDay] [money] NOT NULL,
[FreeKmsPerDay] [int] NOT NULL,
[ExtraPerKmChrg] [money] NOT NULL,
CONSTRAINT [PK_Rates] PRIMARY KEY CLUSTERED
(
[RateID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[Reservations] Script Date: 11/26/2006 22:16:02
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Reservations](
[ReservationNumber] [int] IDENTITY(10000,1) NOT NULL,
[FirstName] [nchar](10) NULL,
[LastName] [nchar](10) NULL,
[DOB] [nchar](10) NULL,
[DriverLicNumber] [nchar](10) NULL,
[LocationID] [int] NOT NULL,
[PickupDate] [datetime] NOT NULL,
[DropoffDate] [datetime] NOT NULL,
[RateID] [int] NOT NULL,
[LastUpdateDate] [datetime] NULL,
CONSTRAINT [PK_Reservations] PRIMARY KEY CLUSTERED
(
[ReservationNumber] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
USE [CarRentalWeb]
GO
ALTER TABLE [dbo].[Locations] WITH CHECK ADD CONSTRAINT
[FK_Locations_Province] FOREIGN KEY([ProvinceID])
REFERENCES [dbo].[Province] ([ProvinceID])
GO
ALTER TABLE [dbo].[Rates] WITH CHECK ADD CONSTRAINT
[FK_Rates_CarClass] FOREIGN KEY([CarClassID])
REFERENCES [dbo].[CarClass] ([CarClassID])
GO
ALTER TABLE [dbo].[Rates] WITH CHECK ADD CONSTRAINT
[FK_Rates_Promotions] FOREIGN KEY([PromotionID])
REFERENCES [dbo].[Promotions] ([PromotionID])
GO
ALTER TABLE [dbo].[Reservations] WITH CHECK ADD CONSTRAINT
[FK_Reservations_Locations] FOREIGN KEY([LocationID])
REFERENCES [dbo].[Locations] ([LocationID])
GO
ALTER TABLE [dbo].[Reservations] WITH CHECK ADD CONSTRAINT
[FK_Reservations_Rates] FOREIGN KEY([RateID])
REFERENCES [dbo].[Rates] ([RateID])
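The foreign keys above tie each reservation to a location (and its province) and to a rate, which in turn identifies a car class and a promotion. The following query is a minimal sketch of how those relationships can be followed to read one descriptive row per reservation. It is not part of the original script; the RentalDays alias and the use of DATEDIFF to derive it are assumptions made for illustration.

```sql
USE [CarRentalWeb]
GO
-- Illustrative query (not part of the original script): resolve each
-- reservation to its descriptive attributes by following the foreign keys.
SELECT r.[ReservationNumber],
       l.[LocationName],
       p.[ProvinceName],
       c.[CarClassName],
       pr.[PromotionName],
       rt.[RatePerDay],
       -- Assumption: rental length is the number of day boundaries crossed
       -- between pickup and drop-off.
       DATEDIFF(day, r.[PickupDate], r.[DropoffDate]) AS RentalDays
FROM [dbo].[Reservations] AS r
JOIN [dbo].[Locations]  AS l  ON l.[LocationID]   = r.[LocationID]
JOIN [dbo].[Province]   AS p  ON p.[ProvinceID]   = l.[ProvinceID]
JOIN [dbo].[Rates]      AS rt ON rt.[RateID]      = r.[RateID]
JOIN [dbo].[CarClass]   AS c  ON c.[CarClassID]   = rt.[CarClassID]
JOIN [dbo].[Promotions] AS pr ON pr.[PromotionID] = rt.[PromotionID]
ORDER BY r.[ReservationNumber]
GO
```

Because every joined column is declared NOT NULL and backed by a foreign key, inner joins should return every reservation exactly once.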
APPENDIX B
DATA WAREHOUSE CREATION SCRIPT FOR THE SAMPLE APPLICATION
Database Engine: SQL Server 2005
Database Query Language: Transact SQL (T-SQL)
Database Name: CarRentalWeb_DW
List of Tables:
- Internet_Rez_Fact
- Promotion_Dimension
- Rate_Dimension
- CarClass_Dimension
- Location_Dimension
- Date_Dimension
Script to create data warehouse objects:
USE [CarRentalWeb_DW]
GO
/****** Object: Table [dbo].[CarClass_Dimension] Script Date: 12/02/2006
00:47:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[CarClass_Dimension](
[Class_key] [int] IDENTITY(1,1) NOT NULL,
[CarClassID] [int] NULL,
[CarClassName] [varchar](50) NULL,
[Hybrid_Indicator] [varchar](50) NULL,
[Luxury_Indicator] [varchar](50) NULL,
CONSTRAINT [PK_Class_Dimension] PRIMARY KEY CLUSTERED
(
[Class_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Date_Dimension] Script Date: 12/02/2006
00:47:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Date_Dimension](
[Date_key] [int] NOT NULL,
[Date_Pickup] [datetime] NULL,
[Calender_Month_Name] [varchar](50) NULL,
[Calender_Month_Number] [varchar](50) NULL,
[Calender_Month_Name_Year] [varchar](50) NULL,
[Quarter] [varchar](50) NULL,
[Quarter_Calender_Year] [varchar](50) NULL,
[Calender_Year] [varchar](50) NULL,
[Fiscal_Month_Number] [varchar](50) NULL,
[Fiscal_Month_Number_Year] [varchar](50) NULL,
[Fiscal_Year] [varchar](50) NULL,
[WeekdayName] [varchar](50) NULL,
[Weekend_Indicator] [varchar](50) NULL,
[Holiday_Indicator] [varchar](50) NULL,
[LongWeekend_Indicator] [varchar](50) NULL,
[MajorEvent_Indicator] [varchar](50) NULL,
CONSTRAINT [PK_Date_Dimension] PRIMARY KEY CLUSTERED
(
[Date_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Internet_Rez_Fact] Script Date: 12/02/2006
00:47:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Internet_Rez_Fact](
[Date_key] [int] NOT NULL,
[Class_key] [int] NOT NULL,
[Location_key] [int] NOT NULL,
[Promotion_key] [int] NOT NULL,
[ReservationNumber] [varchar](50) NULL,
[RentalDays] [int] NULL,
[RentalRate] [money] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Location_Dimension] Script Date: 12/02/2006
00:47:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Location_Dimension](
[Location_key] [int] IDENTITY(1,1) NOT NULL,
[LocationID] [int] NULL,
[LocationName] [varchar](50) NULL,
[Address] [varchar](50) NULL,
[City] [varchar](50) NULL,
[Province] [varchar](50) NULL,
[PostalCodeF3] [varchar](50) NULL,
[PhoneAreaCode] [varchar](50) NULL,
[AirportLocation_Indicator] [varchar](50) NULL,
CONSTRAINT [PK_Location_Dimension] PRIMARY KEY CLUSTERED
(
[Location_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[Promotion_Dimension] Script Date: 12/02/2006
00:47:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Promotion_Dimension](
[Promotion_key] [int] IDENTITY(1,1) NOT NULL,
[PromotionID] [int] NULL,
[PromotionName] [varchar](50) NULL,
[ActiveFrom] [smalldatetime] NULL,
[ActiveTo] [smalldatetime] NULL,
[PromotionName_ActiveFromTo] [varchar](50) NULL,
CONSTRAINT [PK_Promotion_Dimension] PRIMARY KEY CLUSTERED
(
[Promotion_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
USE [CarRentalWeb_DW]
GO
ALTER TABLE [dbo].[Internet_Rez_Fact] WITH CHECK ADD
CONSTRAINT [FK_Internet_Rez_Fact_Class_Dimension] FOREIGN
KEY([Class_key])
REFERENCES [dbo].[CarClass_Dimension] ([Class_key])
GO
ALTER TABLE [dbo].[Internet_Rez_Fact] WITH CHECK ADD
CONSTRAINT [FK_Internet_Rez_Fact_Date_Dimension] FOREIGN
KEY([Date_key])
REFERENCES [dbo].[Date_Dimension] ([Date_key])
GO
ALTER TABLE [dbo].[Internet_Rez_Fact] WITH CHECK ADD
CONSTRAINT [FK_Internet_Rez_Fact_Location_Dimension] FOREIGN
KEY([Location_key])
REFERENCES [dbo].[Location_Dimension] ([Location_key])
GO
ALTER TABLE [dbo].[Internet_Rez_Fact] WITH CHECK ADD
CONSTRAINT [FK_Internet_Rez_Fact_Promotion_Dimension] FOREIGN
KEY([Promotion_key])
REFERENCES [dbo].[Promotion_Dimension] ([Promotion_key])
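With the fact table linked to its dimensions, a typical analytical query joins Internet_Rez_Fact to the dimensions of interest and aggregates the measures. The sketch below is not part of the original script. It assumes that RentalRate holds a per-day rate, so the EstimatedRevenue column is only an illustration; the aliases ReservationCount, TotalRentalDays and EstimatedRevenue are likewise hypothetical.

```sql
USE [CarRentalWeb_DW]
GO
-- Illustrative star-schema query (not part of the original script):
-- reservations and rental days by calendar month and car class.
SELECT d.[Calender_Year],
       d.[Calender_Month_Number],
       c.[CarClassName],
       COUNT(*)                             AS ReservationCount,
       SUM(f.[RentalDays])                  AS TotalRentalDays,
       -- Assumption: RentalRate is a per-day rate.
       SUM(f.[RentalDays] * f.[RentalRate]) AS EstimatedRevenue
FROM [dbo].[Internet_Rez_Fact] AS f
JOIN [dbo].[Date_Dimension]     AS d ON d.[Date_key]  = f.[Date_key]
JOIN [dbo].[CarClass_Dimension] AS c ON c.[Class_key] = f.[Class_key]
GROUP BY d.[Calender_Year], d.[Calender_Month_Number], c.[CarClassName]
-- The month number is stored as varchar, so this ordering is textual
-- unless the values are zero-padded.
ORDER BY d.[Calender_Year], d.[Calender_Month_Number], c.[CarClassName]
GO
```

The same pattern extends to Location_Dimension and Promotion_Dimension by adding the corresponding joins on Location_key and Promotion_key.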