Guide for Datawarehousing

24
Business Intelligence A practical guide to getting started with Data Warehousing

Transcript of Guide for Datawarehousing

Page 1: Guide for Datawarehousing

Business Intelligence

A practical guide to getting started withData Warehousing

Page 2: Guide for Datawarehousing

ii

This report was prepared for IBM by the Applied TechnologiesGroup.The text is the property of the Applied TechnologiesGroup. IBM has been granted unrestricted rights to producethe text in this format. For additional information, contact theApplied Technologies Group, One Apple Hill, Suite 216, Natick,MA 01760,Tel. (508) 651-1171.

The following terms are trademarks or registered trademarks of the IBM Corporation in the Untied States,other countries or both:

AIXAS/400Data GuideDB2 DataJoinerDataPropagatorDB2DB2 OLAP ServerDB2 Universal DatabaseIBMIMSInformation WarehouseIntelligent MinerNetFinityOS/2OS/400OS/390RISC System/6000SP2Visual Warehouse

UNIX is a registered trademark of The Open Group.

Windows NT is trademark of Microsoft Corporation in theUnited States, other countries or both.

Microsoft,Windows, and Windows NT are trademarks ofMicrosoft Corporation in the United States, other countriesor both.

Other company, product, and service names, which mightbe denoted by a double asterisk (**), might be trademarks or service marks of others.

© International Business Machines Corporation 199910-99All Rights Reserved

Page 3: Guide for Datawarehousing

iii

Business Intelligence

A practical guide to getting startedwith Data Warehousing

Page 4: Guide for Datawarehousing

Visual Warehouse:The IBM data warehouse & datamart solution

“We needed to build a very large data warehouse on a massively parallel processing platform. Very few companies play well in that world, and most are proprietary. The open architecture, massively parallel processing capability, and scalability of DB2 were the driving forces behind our decision.”—Jane Landon, Systems Executive, Prudential Life Insurance of America

“IBM has an extensive range of database products that arevery well complemented by a nice set of supporting businessintelligence products such as Visual Warehouse.”—Ray Ziegler, Assistant Vice President, Information Technology, CIGNA Reinsurance

“We need to be able to slice and dice our data by payer,physician, or geography, for strategic planning, budgeting,and negotiating contracts with the insurance community.Visual Warehouse equips us with the capabilities to do thisand provides timely answers to high-level questions…We believe this data warehousing solution is going to be a journey over many years and will open us to new opportunities—this is just the starting point.”—Ron Johnson, Chief Information Officer, Shore Memorial Hospital

• WWW Access• Query/Reporting• OLAP• Data Mining• Business Intelligence

Applications

InformationUsers & Analysts

• Data Extraction• Automated Scheduling• User Authorization• Warehouse Management

and Monitoring• Information Cataloging• Data Cleansing

Data WarehouseWarehouseGeneration

&Management

SybaseIMS

VSAMFiles

DB2Oracle

InformixSQL Server

iv

Visual Warehouse is an inte-grated package of businesssoftware that shortens the distance between mountains of complex data and insightfulbusiness decisions that delivera competitive edge. Based oninnovative, proven technology,Visual Warehouse can helpcompanies both large andsmall uncover entirely newinsights about their businesses,their markets, their competitors,and their customers. Theseinsights can improve the bottomline by providing a more comprehensive and focusedview of the market realitiesthat affect day-to-day business operations.

Page 5: Guide for Datawarehousing

Extensible

Quick start

v

R o b u s t

scaleableIntegrated

Table of contents

Considerations for a successful warehousing project . . . . . . . . . 1

Aligning technology with business objectives . . . . . . . . . . . . . . . 1

Perspectives on data marts and warehouses . . . . . . . . . . . . . . . 3

The state of standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Process criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Technology criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Tool integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Implementation tactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

APPENDIX A—Elements of a warehouse . . . . . . . . . . . . . . . . . . 12

APPENDIX B—Case studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Glossary of terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Note to the reader: This guide is oriented to both business managers exploring the potential of data warehousing for their organizations, and technical implementers who are seeking practicaladvice on implementing data warehouse systems.To those unfamiliarwith data warehousing terms we would suggest beginning withAppendix A,which reviews the major components and nomenclature.Business managers may wish to focus on the Criteria section (specifically the Business and Process Criteria) and the first few sections of Implementation Tactics. Readers looking for technicalinformation may wish to focus on Technology Criteria and theImplementation section.All readers may wish to look at the case studies that highlight IBM solutions for data warehousing.

Page 6: Guide for Datawarehousing

“The beauty of Visual Warehouse is that it automates data collection from all our operational systems, refines it, and feedsit to the data warehouse. There is no comparable product in themarket that combines all these functions in one box.”

–Serge Tremblay, Database Administrator, McDonald’s Canada

Mission, Vision,Goals

Alignment Technology

FFiigg.. 11

management of meta data (information aboutdata in the warehouse) stand out. Therefore,thought needs to be given to what the initialstrategy needs to be to ensure that an organi-zation truly benefits from the technology.Moreover, we all know we are living in a verydynamic and fast changing business environ-ment, where the only certainty is uncertaintyand change. Therefore, the strategy mustdeploy Warehouses such that they can growand adapt to the changes we know arecoming, even though we don’t today knowwhat those changes are. This calls for prudentplanning in developing the strategy to selectarchitectures, which can react flexibly tochanges in market dynamics, organizationalrestructuring, economic fluctuations, etc.

Aligning technology withbusiness objectives

Data Warehousing is a rapidly maturing tech-nology changing with every product release.There is no Rosetta Stone that will tell anyone organization what works and what doesn’t.Consequently, experience and evolution arethe best overall planning principles, whichcan be deployed by management today. Thereare a number of criteria, which should bekept in mind as strategies are developed. Thecriteria can be grouped into three categories:Business Criteria, Process Criteria, andTechnology Criteria.

Business criteriaThe first set of criteria have to do with the business problems at hand and what demandsare made on the project by the business dynamics encountered:

CCrriittiiccaall ssuucccceessss ffaaccttoorrss:: WWhhyy aarree yyoouu ddooiinngg tthhiiss??No technology project will ever succeed if it isnot properly aligned with the business Mission,Vision, and Goals. Therefore, it is vitallyimportant to understand such elements of thestrategy such as:

1

Considerations for a successful warehousing project

OverviewEnterprises today, both nationally and glob-ally, are perpetually seeking competitiveadvantages. It has become an incontrovertibleaxiom that information is the key to deter-mining how to gain such a competitiveadvantage. The problem today is how to dealwith the mountains of raw data which ourever more efficient information systems arecollecting, massaging, processing, deriving,and disseminating. We literally are in themidst of a volcanic eruption of data.Somewhere hidden in this explosion of data isthe clues management needs to define theirstrategic positioning in the market so as tomaximize their competitive stance.

Into this picture, technology has inserted theconcept of Data Warehousing as one alterna-tive to coping with the well-known informationoverload described above. DataWarehousesexist to help management and decisionmakers transform raw data into informationand to help management identify key trends.It helps the enterprise foresee predictableevents and act in anticipation of those eventsand helps management understand the entirepicture of what has already happened. It allowsthem to develop a good systemic understand-ing of events, thus allowing focused reactionsto those events, such as redefining andreengineering business processes to takeadvantage of that understanding.

A clear prerequisite to enabling managementto accomplish all of the above is the fact thatthe data to be analyzed has to be accessibleand flexible, and it has to be available in aformat that is usable by the requester. To date,too much emphasis has been placed on theraw technology which embodies the concept

of Data Warehousing, and not enough on theunderlying and concomitant strategy, plan-ning, business processes, and services whichdevelop, maintain, and use the DataWarehouse technology.

Experience has shown us that in projectswhere the technology has been perceived as afailure, the problem does not usually lie withthe technology itself, but rather with the wayin which the technology was applied. It isoften applied to the wrong business problem,at the wrong scale, with insufficient training,and planning, and with little or no thought tohow users need to access the data, etc. Someanalysts’ statistics show that over 50% ofwarehouse projects fail to meet their statedobjectives. In order to mitigate the risks asso-ciated in these projects, the project must workin close concert with the business communitythat will benefit from the warehouse and alsoneed to have a solid grounding from a finan-cial return perspective.

Data Warehousing technology can benefitenterprises at different levels and scales ofimplementation. This applies from depart-mental systems running on commodity plat-forms such as Microsoft NT, and databaseservers such as DB2® Universal Database™ forWindows NT, on up to the enterprise levelsystems running on enhanced parallel MPParchitectures such as IBM SP2 with DB2Universal Database. Database ManagementSystems play key roles in the long-term via-bility of Data Warehouses. Issues such as easyaccess to operational data, scalability, and

Page 7: Guide for Datawarehousing

• What is the problem at hand? Is it a problemrelated to cycle time, customer satisfaction,more cost-effective decision making, betterbusiness intelligence, or a general lack ofinformation with which to make decisions onany of the above?

• Which of the departmental or enterprise goalsand responsibilities are directly related to theproblem at hand?

• What are the Critical Success Factors—thosethings that must be done well to solve the problem?

• Which of the organizational components of the enterprise is best positioned to solve the problem?

• Which audience within this organization willbest use this technology and how: executives,financial analysts, scientists, engineers, clericaland administrative users, line managers,others and why do they need this (who will itbenefit)?

• How can this technology be used to solve theproblem? This is where the alignment of thetechnology to the problem will occur.

QQuuaannttiiffyyiinngg bbeenneeffiittssOnce the benefits of the technology have beenaligned to the business objectives, they shouldthen be quantified. The reason for this is thatmanagement must be able to answer the ques-tion: How will we know if this project is suc-cessful? The answer must fit the form: “Thisproject will be successful if it allows us toachieve the following goals...”

In many organizations, quantifying benefitstakes the form of a financial analysis, specifi-cally a Return on Investment (ROI) analysis.A study by International Data Corporation(IDC), co-sponsored by IBM, showed thatData Warehousing could provide significantand impressive ROI numbers. The study,which included 62 participants, demonstratedthat the overall ROI on warehouse projectswas 401% with payback periods of two tothree years.

What was interesting about the study, howeverwas that the smaller, departmental implemen-tations, sometimes known as Data Marts, hada 533% ROI, while the larger, enterpriseefforts showed an impressive 322% ROI.

The IDC study identified three kinds of bene-fits in the use of Data Warehouses.

• Cost avoidance benefits. These benefits werethe ability not to spend money that ispresently spent on generating endless reportsto end-users. This included the resourcesexpended by IT to generate answers on ad hocqueries. In many ways, Data Warehousing rep-resents a liberation for IT by providing userswith the tools they have needed over the yearsto generate their own reports. By allowing theusers to do so, one eliminates the often end-less loops of the user requesting a report fromIT, IT delivering the report, the user either notapproving the report due to some miscommu-nication or changing their minds after seeingwhat they asked for, etc.

• Efficiency gains from increased productivityamong end-user professionals who gather andanalyze data. The analyst who must stop ananalysis to get information and who has to asksomeone else to get the information loses effi-ciency in two ways. The first is in the loopdescribed above, where there might be sev-eral iterations of request and responsebetween the analyst and IT before the analystis satisfied with the results of the request. Thesecond is in the interruption of the analysis,and the inefficiencies associated with recover-ing the thought processes that were underway when the analysis was suspended.

• Warehouse dependent savings due to deci-sions based on analysis that could only comefrom data in a warehouse. This is a quality-of-decision issue that comes from the fact thatcertain data associations may not exist in any

one operational system and therefore are notavailable to the analyst. By building thoserequisite associations and relations in theWarehouse, a situation where the whole isgreater than the sum of its parts occurs andthose relationships allow the analyst to dotheir work better, and become more effective.This is not a case of people making bad deci-sions prior to the advent of the system; this isabout giving people better tools to empowerthem to do better work.

Not all benefit quantifiers are in terms ofROI. A recent article described how firmsoften eschew formal ROI analysis becausethey consider the data warehouse a strategicinvestment. In this case, these organizationswere convinced prima facia that the benefitswould be worth the costs. A cautionary word,however: make sure that there is an under-standing of the projected costs before startingthe project if there is no formal financialmeasurement technique such as ROI.

Regardless of the culture of the organization,whether it accepts soft benefits in its approvalprocess or not, be sure to quantify the antici-pated benefits in business terms. Without this,the organization will have no metrics to deter-mine whether or not the project is successful.

2

DataWarehouse

DataMart

600%

400%

200%

0%

Overall

ROI

FFiigg.. 22

Page 8: Guide for Datawarehousing

3

Perspectives on datamarts and warehouses

The IDC finding would lead one to believethat since the data marts delivered the high-est ROI, a viable strategy could be to alloweach department to implement their own datamarts independently. The problem with thisthinking is that it ignores the issue of cross-functional analysis. Let us examine whathappens when Finance and Marketing eachdevelop their own data marts independently ofeach other. Let us further assume that eachkeeps track of sales, but defines them differ-ently. For example, marketing defines a saleas a booking while finance defines it as a pay-ment. What are some of the potential ramifi-cations of this situation? A primaryconsequence is that analyses of the samebusiness dynamic—sales—may result in twodepartments having significantly differentviews and analyses. More importantly, sup-pose finance needs something from the mar-keting Data Mart, or vice versa. Without aunifying design or data standard, the analysisof data that spans two or more organizationalData Marts will not be possible using thisstrategy. It is therefore important in develop-ing a Warehousing strategy to understand theimplications of an independent Data Martstrategy and factor these risks into its formu-lation. A more detailed discussion of indepen-dent vs. dependent Data Marts is presented inthe Process Criteria.

“The heat is on…to get warehouses up and running fast” (but) “The biggest benefitcomes down the road, when you can support20 different decision support applicationswith the same architecture”—GartnerGroup.

This quote from the GartnerGroup highlightsthe need to develop an overall strategy andarchitecture for the organization prior to or atleast in parallel with the development of thefirst Warehouse project. At the beginning ofthese projects, everyone is in a hurry to reapthe immediate benefits of the systems andbecomes impatient with the planning process,which is sometimes seen to slow things down.

However, experience tells us the payoffs willbe that much greater, and subsequentimprovements will come that much sooner ifa sufficient amount of time is invested at thefront end of the project in defining an overall

enterprise architecture into which the DataMarts can connect.

Understanding productintegration issuesThe current maturity level of the DataWarehousing industry leads to a proliferationof many disconnected offerings in the mar-ketplace. Many small vendors have joined thefray with products targeted at one or two ele-ments of the Data Warehousing architecture.As a result, there are very few offerings in themarket, which answer all of the needs of apotential end user. This leads to a risk regard-ing how well different packages will integrate,which common platforms are supported, andso on. A trend is emerging where a number oflarge vendors form integrated product teamswith smaller vendors that hold a solid solutionin an important warehousing niche. Forexample if a large vendor had a great DBMSsolution, they might team with product ven-dors with extract technology, data cleansingtechnology or specialized OLAP tools. Onebenefit of these vendor consortiums is toagree on common approaches to sharing metadata between tools and databases from differ-ent vendors. Meta data, with its global signifi-cance, is a key to product integration incontemporary data warehousing solutions.The “investment” of these vendors integrat-ing products from different vendors benefitsend users by resolving the tricky and riskyissues associated with integrating productsfrom different vendors.

Analytical applicationsOne of the fastest growing trends in the DataWarehousing market is the emergence ofpackaged analytical applications. The abilityto script data analysis applications to deriveparticular end results is paramount to theevolution of effective data warehousing solu-tions. In this regard, early data warehousingsolutions were all one-off customized applica-tions built to deal with particular businesschallenges. This continues to be an importantpart of data warehousing activities. But inaddition, as applications and business needsbecome more generalized and as customapplications begin to be widely deployed,there is a viable place for sets of off-the-shelfanalysis applications that can be deployedquickly and effectively to meet many commonbusiness problems. These off-the-shelf

applications also include the capacity forsemi-customization and are the trend of thefuture. The costs of these applications shouldcontinue to drop, while the real value will continue to grow.

Dealing with cultural issuesData Warehousing is about pooling resources(data), which implies sharing, which in turncan imply a loss of control, a concept some-times inimical to many data owners. Thiskind of organizational provincialism cansometimes throw up impediments to aWarehousing project and must be dealt within the early phases of the project.

Business unit and process considerationsTechnology is only useful to the extent that itsupports our ability to carry out our assign-ments and achieve our corporate goals.Therefore the introduction of any new technology must be aligned with the businessunits and processes which it is intended tosupport. The IT organization may or may nothave all the requisite technical skills, but ITwill not successfully implement a warehouseproject without the involvement and commit-ment of the business unit. Too often, technology is developed independently of anybusiness process considerations, many timeswith catastrophic results.

Choosing technical servicesVery few, if any, IT organizations have therequisite combination of skills and resourcesrequired to perform all of the technology,planning and implementation tasks requiredfor successful Data Warehouse projects. Thisis not intended as an affront to IT organiza-tions, but rather a simple observation that itwill take a broad spectrum of talent whichcrosses many disciplines to make this work.The chances that a single IT organization willhave all of these talents available for this pro-ject at the same time are small. Therefore, atsome point in time, many organizations willneed to locate a partner to consult in thetechnical planning for the Warehouse andthen eventually assist in the implementationof the system. The partner selected must beable to operate within the constraints enu-merated in the organization’s Warehousingstrategy, including the methodology chosen,

Page 9: Guide for Datawarehousing

4

implementation style chosen (see ProcessCriteria - Methods), and so on. Choosing the wrong partner, for example one who hasData Warehousing experience but not DataMart experience, or one who does not haveexperience across the entire spectrum ofproducts and services, can increase the risksassociated with these systems. For example,IBM offers specific services in conjunctionwith its Visual Warehouse solution as well as custom services for any scope of warehouse implementation.

The state of standards

OMG committee for commonwarehouse meta dataIBM, in conjunction with Oracle and Unisys,is sponsoring an OMG (Open ManagementGroup) subcommittee for the standardizationof Common Warehouse Meta Data. The objec-tives of this committee are to establish anindustry standard for common warehousemeta data interchange and to provide ageneric mechanism that can be used to trans-fer a wide variety of warehouse meta data.The intent is to define a rich set of warehousemodels to facilitate the sharing of meta data,to adopt open API’s (Java and Corba) fordirect tool access to meta data repositories,and to adopt XML as the standard mecha-nism for exchanging meta data between tools.The subcommittee, chaired by IBM, is in theprocess of accepting vendor proposals for theabove objectives.

OMG committee for XML/XMIA related OMG subcommittee has beenformed to standardize XML Meta DataInterchange (XMI). IBM, Unisys and otherindustry leaders are also involved in thiswork. IBM and Unisys have submitted a pro-posal co-submitted by Oracle, DSTC, andPlatinum Technology and supported bynumerous other vendors. The proposal for anXML Meta Data Interchange Format specifiesan open information interchange model thatis intended to give developers working withobject technology the ability to easily inter-change meta data between modeling toolsand between tools and meta data repositories.In a data-warehousing context, the proposaldefines a stream-based interchange formatfor exchanging instances of UML models.

One of the inherent risks of a new technologyis the lack of standards, and Data Warehousingis no exception. There are countless examplesof competing technologies that resolved them-selves into one standard, and most of the timethe resolution creates winners and losers.Eight-track tape owners were losers in thetechnology battle with cassettes, Betamaxowners lost against VHS in video recording,CPM lost to MS/DOS, and the list goes on.Obviously, one risk mitigation strategy in thisarena is to align the project with a big playerin the industry—one that will have an influ-ence in determining the winning standards.Another innovative mechanism which wasused in a major bank IT shop was to includein the cost/benefit analysis an estimated costto bury the existing system and replace it witha new one in case the wrong decision wasmade. If the project still made sense afterincluding this cost, then the bank would goahead with it in spite of the standards risk.

Lack of attention to trainingFor many years, exposure to the world of thedatabase was limited to the inhabitants of theglass house—the IT department.Consequently, many end users are not familiarwith the concepts behind navigating a dataschema or unraveling the mysteries of joiningtables via keys to get queries answered. A lib-eration of sorts will ensue from allowing usersaccess to their own data. However, the usersmust be prepared for life in this liberation andmust be ready to accept the responsibilities ofrunaway queries, etc. Therefore, it is incum-bent on management to make sure that ade-quate training is provided to users to allowthem to use the system effectively. They haveto use the system without getting so frustratedthat they give up, or worse, poison the projectby maligning it to others. They cannot bringthe system to its knees by constructingqueries, which run forever, or base an analysison faulty data because the user was not famil-iar enough with the system to understand theinformation for which he or she was asking.

Process criteria

A number of elements of the Warehousingstrategy have to do with processes: processesby which the strategy is implemented, andprocesses which are supported by the overall strategy.

Scope of effort: How bigshould it be?Many large warehouse projects have failedbecause of an inability of the organization tohandle the size and scope of the project. It istempting to think of a single repository whereall of the enterprise’s data problems can besolved in one fell swoop. And, if the organiza-tion can indeed come up with an integrateddata model and solve all of the issues associ-ated with such architecture, the benefits areindeed significant. However, this is some-times not realistic and not necessary for theproblem at hand. Industry studies have esti-mated the average size of a large warehouseproject to be nearly two million dollars1 witha time to completion measured in years. Nodoubt, some business problems require theintegration of data from many systems andwill require a global strategy. Many otherswill not, and a simpler strategy, one of DataMarts, will likely be less risky.

This question of project scope should bereadily answered from the exercise describedearlier on aligning the technology to the cor-porate Vision and Mission. Once the questionsrelating to who needs the technology andwhich problems are being addressed areanswered, the scope should be relativelystraightforward to determine, which shouldallow management to allocate appropriateresources to manage it.

A strategy, which entails an enterprise widescope, has certain implications associatedwith it, which should be understood. First andforemost it requires integration and coopera-tion among multiple organizational elements.Many issues will arise regarding different def-initions of similar or identical terms, competingobjectives and agendas, data parochialism andan unwillingness to give up control. This is asituation, which can be difficult to manageand successfully navigate. Oftentimes, changemanagement is necessarily intertwined withthis exercise, since the organization will haveto wrestle with inter-departmental issues as

Page 10: Guide for Datawarehousing

5

described above. Management must assesswhether or not the organization is ready todeal with these kinds of issues, or whetherthey might not better wait until a more appropriate time.

A Departmental, or Data Mart approach is bydefinition smaller in scope, more focused inits outcome, quicker to achieve, less costly.However, there is a risk to developing DataMarts in a vacuum, as described in moredetail in the method section below. Ideallythere is a need to think globally about futureintegration with other departmental applica-tions and Data Marts to avoid developing Data Islands.

Data Marts have their place and present astrong business case for starting with such animplementation, but if an organization deter-mines that an enterprise warehouse is theappropriate strategy then there are many suc-cessful models to emulate.

Implementation approach optionsDeciding whether a Data Warehouse or DataMart is right for the enterprise is an appropri-ate beginning. It must, however, be followedby a decision on whether to buy an integratedpackage from a single vendor, engage a systemsintegrator to bring together a collection of bestof breed products, or have the enterprise’s ITdepartment create a best of breed solution.

The obvious advantage to dealing with anorganization which can offer a complete solu-tion is, of course, faster realization of businessgoals, usually at a reduced cost, and with agood degree of certainty that the ultimatesolution will work (lower technological risk).These benefits do come at a price, though,and that price is the potential compromisesthat have to be made in functionality and performance by accepting the full suite ofproducts from a single vendor. Further, theenterprise may have to adapt businessprocesses to fit the specific characteristics of technology sourced from a single vendor.

The best of breed concept is certainly not new.It has as its foundation the tenet that theenterprise will be better off if it can somehowbring together the best extraction tools, data-bases, SMP/MPP hardware, disk drives,analysis tools, network, etc. and get them tofunction as a unified system. Aside from thedifficulty and religious wars that accompanythe attempts to define best, the price paid forthe anticipated exceptional performance isprimarily the pain associated with integratingthe disparate components. Different vendorsvalue different architectures and functionalitycharacteristics, and the best extraction tools,for example, may not integrate well with thebest meta data repository, and so on. Manyimplementations of components which “should”integrate, are advertised as “compatible”, butrequire extensive work to get all details towork for an organization’s application. This

kind of scenario is especially significant intechnologies which are young, and in whichstandards are not yet entrenched. DataWarehousing meta data standards are stillevolving, although IBM, in conjunction withOracle and Unisys, is sponsoring an OMG(Open Management Group) subcommittee forthe standardization of Common WarehouseMeta Data. This will result in a rich set ofwarehouse models to facilitate the sharing ofmeta data. Another OMG subcommittee hasalso been formed to standardize XML MetaData Interchange (XMI). These standards ini-tiatives will go a long way to resolving theseimportant issues. Most integrators find thatall projects require compromises to achieveintegration, although some level of custom fitwith the enterprise is generally achieved.

There is a second, subtler price associatedwith this custom fit, and that is what to do inthe future about upgrading to later releases.Assume, for example, that an integrator com-bines extraction tool A with meta data reposi-tory B, database C and analysis tools D.Assume further that all of the products, (A, B,C, and D) are at release 1.0. The integratorfinishes the job, and the customer is satisfied,and some time later extraction tool A movesto release 2.0, three months from then theanalysis tool moves to release 2.0, and so on,each release bringing with it features andfunctions the organization would like toincorporate. The challenge is: how to do that?Are these independent release updates com-patible with each other? Will the resultantsystem be backward compatible with the orig-inal system? If the organization continues torely on the integrator to maintain a system atthe latest best-of-breed status, this is tanta-mount to a full employment act for the inte-grator, and certainly of dubious value to thecustomer. On the other hand, a customer whohas purchased an integrated package from asingle vendor can put the onus of compatibil-ity and upgrading on the shoulders of thevendor, and upgrade the warehouse to thenext release by buying a single upgrade.

The third option, that of using the enterprise’sown IT shop to integrate the package is prob-ably not a viable option except for very largeand sophisticated organizations. Few IT shopshave the expertise to integrate “best of breed”components, and fewer still have the availableresources. Remember that one of the reasonsto build a Warehouse is because the IT

DatabaseDatabase Database

DataStore

DataStore

DataStore

FFiigg.. 33

Page 11: Guide for Datawarehousing

Common Meta data

DatabaseDatabase Database

DataStore

DataStore

DataStore

Data Loading Architecture

Common Access Architecture

FFiigg.. 44

6

department is buried with requests from userswho are already frustrated with the inabilityto get answers to their mission-critical plan-ning questions. The idea of taking an entireteam of IT specialists out of the front lines formonths (or years for a global warehouse), andsimultaneously maintaining or reducing thebacklog could be a formula for disaster.

Top-down and bottom-upapproachesHow should an organization approach thedesign and construction of the Warehouse?Should it go for a top-down technique, settingup an enterprise-wide architecture and thenconstructing Warehouses or Data Marts thatconform to that architecture? Or perhapstake a bottom-up philosophy, starting right inwith highly focused and targeted Data Martprojects aimed at specific critical areas of thebusiness? The issues here have to do withdeciding between addressing short-term tac-tical requirements to help individual depart-ments and long term strategic planning issuesregarding data architectures that have to cutacross political and organizational boundaries.

The Top-Down approach can yield the bestlong-term results, but it also invokes the mostangst within an organization. It is very diffi-

cult, expensive, and time consuming toachieve consensus on a single, consistent,accepted and valid view of the business, thedata it needs, etc. In the prior example of theMarketing and Finance departments, two dif-ferent operational definitions for “sales” weredeveloped. Which of the definitions will beused in the Warehouse, or will there be twoterms that must now be defined? If so, theycannot both be called sales, and the metadata had better be clear as to what the finaldetermination became. It is clear that manymore change management issues will have tobe dealt with in this approach, as organiza-tions grapple with the sins of the past in nothaving promulgated an accepted data stan-dardization program, etc.

The Bottom-Up approach favors the use ofsmaller, more focused applications ofWarehouses that can avoid the pitfalls of theTop-Down approach by simply limiting theextent of the implementation. This approachalso exhibits simpler data archaeology prob-lems: there are usually limited data sets, lim-ited user views, a good understanding of thedata needs and how they relate to the businessproblem. In its purest form, this approachtrades the near term pain of dealing with datastandardization issues for the longer terminability to operate cross-organizationally.

In this approach, each department is respon-sible for extracting whatever data they need,defining their own meta data and using theirown private Warehouses for decision supportat the departmental level. Three obviousproblems arise: (1) this architecture is difficultto scale up to an enterprise view; and (2) thelack of standardization prevents analysts inone organization from accessing informationfrom another organization’s Warehouse whichmight be of use to them in their analysis. And(3) they may derive different answers to thesame question—e.g., what were sales lastmonth. This is demonstrated by the brickwalls inserted between departmental systemsin Fig. 3 preventing interactions. The primarycause of this is the Warehouse Tower of Babelsyndrome inherent in the underlying philoso-phy. Each mart can create confusing, overlap-ping and contradicting views of the business,like the proverbial six blind men trying todescribe an elephant, each only being able torelate to the elephant according to the portionof the animal they were feeling. What is acustomer? a product? a sale?

This approach works if the organization has abusiness problem with a single focus and thedata to solve that problem exists in only a fewplaces, with no political ownership issues.

Some organizations use a hybrid approach togain the speed and cost advantages of thehighly focused departmental approach, yet atthe same time making sure that the imple-mentation is consistent with the overall goalsof the organization.

This approach uses the principles developedin Rapid Application Development (RAD)methodologies and intentionally delivers iterations of the departmental Warehouse,attempting at each iteration to come closer to an overall enterprise data model and data architecture.

This allows the organization to take advantageof the speed and cost savings of the smallerapproach while at the same time mitigatingand eventually overcoming the lack of inte-gration and islands of automation problemsinherent in this approach. Other organiza-tions are using the smaller implementation asa proof of concept and prototype/pilot instal-lation. This assists in proving the benefits ofthe technology on a smaller scale, and smallerrisk, and then scaling the solution into a more

Page 12: Guide for Datawarehousing

7

global implementation, either by buildingmore integrated departmental Warehouses(see Fig. 4) or by moving to a full global enterprise Data Warehouse implementation.Any of these approaches can work for yourorganization. What is important is that you areaware of the issues associated of each andactively mitigate the cons.

Technology criteria

The technology dimension will of course playa major role in the enterprise strategy.Different strategies will require different tech-nological characteristics and features. Just asthe technology must align itself with the busi-ness mission, the strategy must also considerthe technology.

ScalabilityScalability refers to the ability of a system toincrease in capacity as users demand more, asdata stores grow, as more users are added tothe system and as more applications are devel-oped against the Warehouse.

One of the challenges associated with intro-ducing new technologies and new applicationsis that of determining the actual system loadafter implementation. JAD (Joint ApplicationDevelopment) sessions attempt to mitigate therisk by attempting to produce a picture of userrequirements, but the truth is that users whohave never had the opportunity to work with anew technology don’t really know what to askfor. Further, they don’t know ahead of timewhat kinds of demands they will make on thesystem until they actually sit down and startworking with it. Therefore, as the users becomefamiliar with the query capabilities and thenavigational issues, their own success willcause them to demand more from the systemas word spreads and more users exploit thefeatures of the system. In time, users willbecome more sophisticated and begin explor-ing with ideas of data mining and visualizationas their analyses become more complex. All ofthese are factors that demand scalability in asystem. Companies acquiring warehousingtechnology need to be assured that scalabilityis “built-in” to the architecture.

ManageabilityData Warehouses require the developmentand implementation of new processes, tools,and work systems to manage the extraction/transformation of operational data, theadministration of users (adding, deletingusers, changing access rights) and so forth. Inthe course of deploying a Warehouse, anumber of operational management decisionsmust be made and supported by the productsolution set:

• What is the relationship and meaning of thedata being loaded to the intended businessuse? How frequently should data loads andtransformations be made? Should they bedaily, weekly, monthly, quarterly?

• How will the Warehouse reporting adapt tothe business processes as they change overtime? How will the system model and trackthe business?

• How much data needs to come in on eachload, and how long will it take for the systemto recompute all the indexes, meta dataupdates, and other administrative details thatmust be undertaken? How will the systemmonitor database operations?

• How will the system deal with backups/restores and what should the process be toadminister a disaster recovery plan?

• If the system is to be paid for by usagechargebacks, is there a mechanism for thesystems administrator to keep track of usageby account number or password, and are therereports available to facilitate this feature?

PerformanceHow well the system performs will be theultimate arbiter in the success or failure ofthe project. The intent of the Warehouse is tohelp people do their jobs more effectively andefficiently. If the response time is not ade-quate, users will not use the system, theenterprise will not derive any benefit from theexpenditure, and the project may wind upwith a negative ROI. Therefore the technol-ogy dimensions of performance must be thor-oughly considered in developing a strategy.

Over the years, there have been a number ofstudies to determine the limits of humanpatience in dealing with computer responsetimes. In general, users want to see some-thing back in a timeframe that does not inter-rupt their thought processes. Some have saidthat two to five seconds is a good targetresponse time. But when we are dealing withsuch gargantuan database sizes, it is difficultto conceive of doing a table scan on a multi-million row table in that timeframe, whichleads us to the second human factors point.That is the fact that the amount of time a userwill wait is proportional to the perceived diffi-culty of the procedure requested. Therefore,if a user knows they have entered a particularlynasty query, they will be more tolerant ofdelay. The best advice on this subject is to workwith the user community in establishingmeaningful metrics and working cooperativelyto set and meet expectations on both sides.

There are several areas that directly impactthe performance of a system: the hardwarearchitecture and the database architecture.The hardware dimension is simply the use ofSymmetric Multi Processing (SMP) andMassively Parallel Processing (MPP) architec-tures. The top end of the MPP capacity doesoutstrip the top end capabilities of SMP.However, the applications where this kind ofperformance is necessary are few and farbetween. For this reason, many industry ana-lysts are predicting that SMP will be thehands down winner in the Warehousingmarket. This does not mean that every ware-house implementation should be on a paralleltechnology of one sort or another, many appli-cations can perform satisfactorily on non-par-allel systems.

Some database vendors have followed thehardware architecture by introducing paral-lelism into the database functions. For example,IBM has continually enhanced products suchas DB2 and it now provides capabilities suchas parallel query, loads, joins, scans, and utility processing providing tremendous value.

Page 13: Guide for Datawarehousing

Meta dataExtractionTool

Common

Format

FFiigg.. 77

Translator Meta dataExtractionTool

FormatA

FormatB

Meta dataExtractionTool

Common

Format

CommonInterface

CommonInterface

FFiigg.. 55

FFiigg.. 66

8

Flexibility If there is one thing that distinguishes marketconditions today it is the pace of change. Infact, as pointed out in the introduction, one ofthe drivers leading enterprises to considerWarehousing technology is the need to keepup with that change. Deployment of aWarehouse will not alter the pace of change orthe need to keep up with the change, whichmeans the Warehouse technology itself mustbe flexible, allowing for rapid responses tochanging conditions. The Warehouse must beadaptable to changes in the enterprise, suchas reorganizations, mergers, and acquisitions.It may be necessary to implement new queriesbased on responding to a competitors productintroduction—queries not envisioned in theoriginal design. If the database needs to beredesigned, reloaded or indexed, this couldmean a significant delay in responding to the competition.

Ease/speed of implementationThe development and maintenance toolsavailable with the Warehouse will be key tothe success of the project. Are the tool setstightly integrated, as is usually the case in aone-stop-shop solution, or will the develop-ment team have to wrestle with the tools aswell as with the disparate products to makethem work together? Are the user interfacesgraphical or the old command line metaphor?Do the development tools automatically gen-erate the meta data content, or will there be asecond step to build the meta data and then athird to reconcile the mistakes made in doingthis manually? These considerations apply notonly to the initial deployment of the Warehouse,but also speak to the flexibility, since changesto the Warehouse will likely at some pointinvolve the use of these tools again. A tradearticle in Datamation put it best: “Havingtools that are well and deeply integrated witheach other is a necessity, not a luxury.”

Tool integration

Integration deals with how well and howsmoothly the different architectural compo-nents interact. Obviously these are subjectiveterms, and therefore there are degrees ofintegration in many different areas such asadministrative interfaces, Warehouse management, problem determination, etc.

Tight meta data integration would yield bene-fits from needing only to input meta dataonce and, once operational, preventing metadata from becoming out of synchronization.For example, an extraction tool and a metadata repository could interact according toseveral different models as follows:

• In the first model, each component has its ownstructures and syntax for the meta data and athird component is interposed between themto effect information transfer between them.

In Fig. 5, a translation module changes theformats of the information from one componentto the other. Therefore, in this model there isno direct integration between components.

• A second option (see Fig. 6) is to have eachmodule maintain different internal structuresand syntax, but to have a common definedinterface, such as a meta data standard. Inthis case, the accepted transfer mechanismand translation is built into each module.Compromises must still be made because theinternal structural differences may yet inter-fere with some functions, but overall theinteraction between the components is facili-tated by the acceptance of a common mecha-nism for information interchange. Thissituation is considered to be moderately wellintegrated, and in fact do have an acceptedarchitecture that each component follows toeffect this integration.

• Finally, on the other end of the spectrum aretools that use the same structures and syntaxinternally as well as present a common inter-face to each other (see Fig. 7). These compo-nents are completely integrated. The morevendors involved in the solution, the morechallenges exist integrating them—somesingle vendor solutions make this option rela-tively achievable.

The degree of integration of the Warehouse isan important consideration in developing anarchitecture and an implementation strategy.First of all, the Warehouse must integrate withthe existing architecture and infrastructureof the organization. Reverting to the threemodels discussed earlier, if the Warehousecan only integrate with the existing architec-ture by means of extensive interposition ofcustom code, the project will be very expensive,lengthy and complex. Questions must beasked about the degree to which the proposedWarehouse must conform to existing supportedoperating systems, networks, data standards,existing databases, existing application development environments, and more.

If the Warehouse supports common, openarchitectures, such as in the second model, thelikelihood of being able to add analysis func-tionality later, such as data mining for exam-ple, will be higher. Open architectures alsomake customization of products easier due tothe more regular and predictable nature of theinteraction among architectural components.

Page 14: Guide for Datawarehousing

9

With regard to the existing operational appli-cations from which the Warehouse will extractits information, the tighter the integrationwith all data sources, the better. Rememberthat some of the systems in the enterprise willhave flat files or older hierarchical or networkdatabase architectures which can present achallenge. Which of the three models of integration are required for the enterprise?

Data can be extracted from the databases indiscrete batches or continuously, on a transac-tional basis. Depending on the Warehouseapplication, either of these may be preferableor both may be required. Do extraction toolsexist for both cases, or will there need to besome special code written? Finally, can theexisting Warehouse automatically extract por-tions of the meta data from the existing opera-tional systems, or will there need to be specialsoftware or processes employed to do so?

How easy will it be to integrate the Warehousewith existing operations, including adminis-trative activities? Data extracts, cleansing,transformation and loading should not bedone manually if it can be avoided. Not onlydoes this introduce more cost in labor, but itcan also interject too many vulnerabilities forerror. Understanding all of the dimensions ofintegration with your existing environment andwhat is possible to automate is a must.

The Warehouse should also integrate withenterprise standards, both de jure standards,those promulgated by officially sanctionedbodies such as ANSI, OMG, the Meta DataCouncil and de facto standards that are practices and products generally accepted inthe industry.

CompletenessThe completeness of a solution talks to theexistence of all the architectural components.This means that all of the parts an organiza-tion needs to work are in place and functional,from extraction and transformation to storageand meta data, to the analytical, management,and change processes required.

Many large vendors, particularly IBM, haveinvested significantly in their core technologyand have evaluated the issues surroundingWarehouse technology and implementation anddeveloped partnerships and processes that

address the issues discussed in this section. Thecase studies that follow are good examples thatillustrate many of the points just described.

Implementation tactics

Following is a high level guideline for imple-menting successful Data Warehousing projects.

Planning the projectMany organizations are in such a hurry toinstall a system that they tend to gloss overthis vital step. To paraphrase what LewisCarroll’s Cheshire Cat told Alice when she toldhim she didn’t know where she was going, ifyou don’t know where you’re going any road willget you there. The corollary, of course, is thatif one does have a destination certain, then thechoice of roads is important, and one had bestspend some time setting it up.

Gaining commitment at topSenior Management Sponsorship is crucial inthese projects, and the level of support is com-mensurate with the scope of the project. If theproject is Departmental in scope, then thesenior Departmental leadership must be onboard. Obviously support even higher will onlyhelp, but be careful of bypassing chain-of-command positions which could turn a poten-tial ally into a snubbed detractor.

Forming a user/IT partnershipIn today’s dynamic environment in which datawarehousing solutions are becoming keyingredients to business success, users arelearning nearly as much about their datarequirements as OLTP users. Therefore, it iscritical to form a team where both technicaland end-user personnel can work together todevelop a mutually acceptable and technologi-cally achievable set of specifications andrequirements. This should result in continuouscollaboration throughout the project as manywarehouse projects can be viewed as a discov-ery process where the business perspectivemust be weighed alongside the technical issues.

This is a good area to use RAD (RapidApplication Development) techniques to allowtechnicians to demonstrate to end-users thepotential capabilities of the system and to

allow users a tangible feedback mechanism.

One implementation of Data Warehousingtechnology advocated the use of businessmanagers as well as end users. The businessmanagers can not only talk about what’s donetoday in the existing processes, but are in abetter position to articulate the business visionto both the technicians and the end users.

DDeetteerrmmiinniinngg MMeettrriiccss ffoorr SSuucccceessss ooff PPrroojjeeccttHow will the organization know that the pro-ject was successful? If there is an intention ofusing the original project as a provinggrounds for an enterprise-wide rollout, howcan there be a rollout if there is no yardstickfor measuring the results of the pilot? As partof the planning process it is critical that met-rics be defined which clearly demonstrate thatthe results of the project meet and are alignedwith the original business goals which drovethe project to begin with. It is also imperativethat the metrics be objective rather than sub-jective.

The project plan This step is anathema to many technologists,but a project plan detailing the objectives,approach, strategy, ownership, timeframe, andresources’ responsibilities is a must if the pro-ject is to be managed with any degree of pro-fessionalism and efficiency.

Identifying the areas of expertise requiredOne of the outputs of a good plan is identifica-tion of the resources required and an analysisof whether they exist in-house and whether ornot they are available. Make sure they are rep-resented on the team. Having senior manage-ment support is a good prerequisite for beingable to get the people you need, and if they arenot available, for being able to get permissionahead of time to go outside (hire serviceproviders) if necessary.

Developing a communications planA Warehouse is only useful to the organizationif users exploit its abilities. It is foolhardy tospend effort on developing such a project onlyto spring a surprise onto a department orenterprise which is not ready to take advan-

Page 15: Guide for Datawarehousing

10

tage of it because they didn’t know it wascoming, or when it would be ready, etc. Acommunications plan is essential to dissemi-nate information about the project.

Using benchmarking techniquesIf possible, investigate other organizations thathave successfully completed projects similarto the one at hand. This can take the form ofliterary research or actual trips to view opera-tional systems, interview users, developersand management as to the best practices theyhave encountered in their project. Rememberthat not all practices that work in one envi-ronment are universally portable, so makesure that the context in which a particularpractice is said to work can be extended tothe target environment.

Selecting a methodologyA methodology that is known and accepted bythe organization will go a long way to smooth-ing out many of the rough spots which anyproject will hit. A methodology provides threecomponents to a project:

• A logical series of activities to achieve adesired end. This structure will tell the orga-nization what has to be done and when inorder to produce a quality product.

• A definition of deliverables associated witheach activity and progress reports on the busi-ness benefits from each activity. This tells theorganization what the output is of each step.

• Roles and responsibilities for actors in the activities.

The Methodology will also help define a pro-ject structure and what has to be done tomanage the project: when reviews are to beheld and what each review will cover. Onetrap to avoid here is the one of putting some-one in charge who is too narrow in his or herscope. For example, a technologist who has noappreciation for or understanding of the busi-ness problem, or a functional specialist whohas no patience for technological issues.There has to be a blending of business andtechnology in order for these projects to succeed, and therefore the lead people have

to be sensitive to the various cultural differ-ences of the team members. As the DataWarehousing Institute said in a recent publi-cation: “Data Warehousing is a service business, not a storage business.”

Using service providersAny organization will have areas where thereis either a lack of expertise or where there areinsufficient skilled resources available for theproject. The project manager should look forareas of expertise which are not representedon the team. In many IT shops, the most sig-nificant missing element where consultingservices might be required is the planningelements. On the business side, organizationswill need to adapt to the new opportunitiesthe technology provides. For example, report-ing and analysis, prior to Warehouses, wasvery “flat” file oriented—run a report and lookat it on paper—run another report and get iton paper six weeks later. With Warehousetools such as OLAP, reports become three-dimensional and you pivot and drill throughthem searching for information. Designingthese reports with little or no prior experienceis very challenging in that you must thinkdifferently. Users will need support in learn-ing, specifying and evolving the output ofdata warehouses.

Architecting the solutionArchitecture is defined as the definition of thecomponents of a solution and their interaction(see Fig. 8). This guide has delved into severalareas where the benefits of a well-defined andintegrated architecture have been demon-strated. Defining the components of the solu-tion and how they interact is a critical step inimplementing a successful project. The solutionshould be an end-to-end solution and allow forall the characteristics described earlier, includ-ing scalability, extensibility and manageability.

The choice available to IT range from tightlycoupled and integrated product “suites” provided by some vendors (e.g., IBM VisualWarehouse™) or individual/groups of buildingblocks of Warehouse products that IT wouldtake the primarily role in integrating to meettheir needs.

Defining a directory such as IBM VisualWarehouse Information Catalog into thearchitecture will allow the system to drawbusiness meta data from dozens of other prod-ucts, including DB2, Oracle, Sybase, HyperionEssbase, CASE tools, and more. Tools such asthe Information Catalog can help provideusers with the keystone component of consis-tent, synchronized meta data that helps developers, maintainers and users alike.

Design

Develop

Pilot

Deploy

Train

Operations

Architecture

FFiigg.. 88

Page 16: Guide for Datawarehousing

Designing the systemDesigns should be based on open, well-understood architectures with well-definedinterfaces between the components. Use ofstandards will help the scalability and flexi-bility of the system over time. Navigationaltools for users and user interfaces in generalshould be designed with the business prob-lem in mind in a cooperative process with endusers and managers.

Departmental systems should be designedwithin the context of an enterprise framework.That is, where possible and practical, identifydata elements and concepts that span multipledepartments and try to define them in thebroadest possible terms so as to be able toinclude other departments later on.

Developing the systemThe development environment should becapable of using RAD techniques to help endusers who might not be able to graspWarehousing and analysis concepts immedi-ately. Use iterative prototyping with timeboxing and multiple releases where necessaryto gain consensus from the user community.

One of the challenges associated with build-ing a system is the ongoing integration ofmeta data. The value of integrated meta datais that it reduces the time to implement aWarehouse, as well as providing greater main-tenance and management efficiencies. Whenmeta data is updated manually this processcan introduce increased error rates for endusers (e.g. defining tables incorrectly). Onesolution is put forward by the Meta DataCouncil; a corollary to the OMG (OpenManagement Group) subcommittee for thestandardization of Common Warehouse MetaData and the related OMG subcommittee tostandardize XMI. It is a federated approachwhereby all stores of meta data type informa-tion (data dictionaries, passive repositories,encyclopedias, data base catalogs, etc.) passthrough a “metahub” that changes the syntaxand other relevant information about thedata. This allows any tool that adopts thisapproach to interoperate and interchangemeta data within and around the Warehouseon an ongoing basis. This approach is not yetwidely available but holds great promise for resolving one of the major issues in meta management.

11

Piloting the implementationPilot systems, also known as prototype systemsor proof of concept systems are among themost misunderstood concepts in the industry.There are three possible reasons for an orga-nization to embark upon a pilot program:

• The technology is foreign to the organization,and there is a need to understand the benefitsof the technology to see how it might help inthe business problems at hand.

• The technology is understood, but finding anappropriate application is not. The organiza-tion wants to know if a particular applicationof this technology to a business problem isappropriate, which is to say, will this technol-ogy solve the problem?

• The technology is understood as well as theapplication, but the cost/benefit equation isnot understood. The organization wants toknow if the cost of the solution is worth it.

Each of these motivations will result in differ-ent pilots, in different places in the organiza-tion and with different associated metrics.

Regardless of the motivation, however, a pri-mary question that must be asked is: How willwe know if the pilot answered our questions?The only way is to develop quantitative as wellas qualitative metrics and in addition developtools and techniques for capturing and ana-lyzing the results at the end of the pilot program.

Deploying the systemIf the strategy calls for replicating successfulprojects in other departments, then a roll outplan must be developed to identify the orderof the rollout as well as any integration effortsthat must be dealt with. These might includeprocess reengineering, especially if multipledepartments are today involved in a processthat is not automated, and one of thesedepartments will receive the automation priorto the others.

Training usersOne sure way to snatch defeat from the jawsof victory is to develop a technically outstand-ing Warehouse and then let users loose on itwith no training. That which is intuitive to a

technologist steeped in Warehousing conceptsis gibberish to an end user whose focus is run-ning a business. Training programs must bedeveloped which target not the technologicalniceties behind the screens, but rather thataddress the how’s of using the system from abusiness viewpoint. Focus on training theuser on understanding the meta data and thenavigation capabilities as well as how to usethose in analyzing a business problem.

Managing warehouse operationsDeveloping a Warehouse is one thing, keep-ing it operational is another. It requires man-aging the timeliness of file transfer and loadprocesses as the system grows in size andcomplexity. Daily operations create meta datachanges, such as the addition of a user, orloading of a new star schema, which meansthe meta data repository must be managed. Inaddition, the organization will need feedbackreports on the Warehouse operations. Managerswill want to know that the quarterly dataextraction actually was started on scheduleand completed on time with all appropriateindexes regenerated.

Some users, in spite of all best training efforts will construct queries that are capable of bringing a system to its knees. In these situations, organizations will need databasemanagement tools that report on how muchsystem resources are being used by a queryand allow the interruption and subsequenttermination of the query.

Some reporting tools are also able to identifywhich areas of the database are being hithardest, or most frequently, or even identifywhich times of the month/quarter/year thoseareas are most likely to be accessed. This willallow managers to govern the extraction andindexing processes—change table structures,drop columns, define different aggregations—according to anticipated uses.

New demands by business units enabled bytechnology or new business theory will always tax a core business system such as aWarehouse, but good architecture will deliverflexibility downstream to deal with these“opportunities” such as today’s web enabledmarts or integrated business process marts.

Page 17: Guide for Datawarehousing

12

Summary

Data Warehousing can provide significantnew benefits to an organization. The technol-ogy is moving from an early adopter stage to amaturity phase. This has removed much ofthe technology risk associated with earlyadopters stage but risk associated with imple-mentation and strategy are still very muchfactors. By considering the issues described inthis guide you should be able to reduce riskassociated with implementing data warehousesolutions and deliver higher value solutions ina reduced timeframe.

Appendix A—Elements ofa warehouse

Fig. 9 represents the basic elements of a DataWarehouse architecture that responds to theneeds of enterprises today. This model isslightly different from most of the ones beingpromulgated throughout the industry in thatit combines the technological elements withthe process and planning elements mentionedearlier. The human figures in the diagramrepresent those elements that are primarily ofa service nature, such as planning, analysis,processes and management.

In this section we will briefly touch on theindividual elements and their contribution tothe overall architecture.

Planning/analysisArguably the most important element of theData Warehouse architecture, the planningand analysis constituent is the one in whichthe enterprise determines which problems itneeds to solve, what the characteristics of theproblem are and how best to solve them. Oncethe problem is clearly framed, then the team(a joint IT/Business Unit) can move on toidentifying and resolving other dependentquestions such as:

• What existing business processes are involvedin the defined problem?

• How large a warehouse do we need to satisfyour business problem?

• What operational data systems (existing oper-ational data sources) do we need to access toget the data we need?

• What transformations will need to be built toconvert the operational data into usable ware-housed data?

• What are the characteristics of the DataModel we need to build to answer the antici-pated queries?

Data acquisitionData acquisition includes extraction of datafrom operational systems, cleansing the data(restructuring records, removal of operationaldata, translating field values to a commondata dictionary) and checking data integrityand consistency. It involves transformation(adding time fields, summarization, andderived fields), loading of the clean data intothe Warehouse database, updating theWarehouse indexes, etc. It is the component ofthe architecture that provides the raw datainto the warehouse repository that will thenbe used for analysis. Clearly, it will be impor-tant to understand how the data will be usedso that transformations, cleanings, etc. rendera suitable format with which to accomplishour query goals.

Data storageThe Data Storage element represents thedatabase and accompanying structures thatare used in the analytical process. Relationaland multidimensional databases are primarilyused in architectures today, and while there isa fair amount of debate in the industry as towhich provides the best results, relationaldatabases offer more flexibility, smaller over-all size, and easier access to atomic data fordrill down operations. A trend emergingwithin the industry is for database vendors toincreasingly add multi- dimensional functionsinto relational products.

Meta dataJust as any other kind of warehouse needs tokeep an inventory of its holdings, a DataWarehouse needs to keep track of what data itis currently holding along with the pedigreeof that data. This is the role of the meta datarepository, to give users and technicians easy-to-understand information about the datasuch as where the data came from, whichrules were used in creating the data, what thedata elements mean, etc. Many systems dividethe Business or End-User Directory, so thattechnical information about the data is keptin a separate directory from that which isrequired by the user to understand the datafrom a business perspective. (For example

Database

Database

Database

LegacySystems

DataStore

WarehouseManagement and Control

Business Processes

Methodology Planning

CleanseTransform

DataAcquisition Data Access

Meta dataDataModel

FFiigg.. 99

Page 18: Guide for Datawarehousing

13

IBM Visual Warehouse Manager deals withtechnical meta data while their VisualWarehouse Information Catalog manages thebusiness meta data.)

Data access Here, the rubber meets the road in that thetools presented here to the end user are theones that will ultimately drive their perceptionof the utility of the system. Access can be cat-egorized using one of the following descriptors:Standard Query, Data Interpretation,Multidimensional Analysis, Data Mining, andEnterprise Reporting.

Standard query tools allow users to developan hypothesis and create questions (queries)to test its validity. This is sometimes called averification-driven approach. Examplesinclude business statistics and optimization(linear programming). Multi-dimensionalanalysis tools facilitate flexible investigation ofthe data along various dimensions, applyingoperations such as time series analysis, andenabling interactive drill-down capabilities.Data mining tools use a discovery-drivenapproach where sophisticated data miningalgorithms are automatically applied to detecttrends, patterns, and correlation hidden in thedata. Enterprise Reporting is information dis-tribution for the masses, usually via a Webbrowser that provides collection and distribu-tion of data to large numbers of peoplethroughout the enterprise, sometimes referredto as publish and subscribe.

Data deliveryThis element refers to how the resultant data ispresented to the end user, and has twodimensions, the physical and logical conduitby which it reaches the end user, and themechanisms by which the data may be visu-alized. Geographic dispersion, security anddata volumes will dictate the use of variousconduits such as Local Area Networks,Internet, Wide Area Networks (public andprivate), etc. Query complexity and outputvolume will dictate the rendering of the data.Options range from simple tables such asspreadsheets to simple two-dimensionalgraphics such as bar graphs and pie charts tovery sophisticated visualization technologiesthat utilize three-dimensional landscapes toportray the results.

ManagementManagement of the Warehouse represents anindispensable architectural element. Oncethe Warehouse is set up, refreshes of the datamust be scheduled and efficiently and correctlyexecuted. Backups of the Warehouse must bemade at reasonable and predetermined fre-quencies. Users must be added and deleted.Security must be administered. The list goeson. The larger the warehouse, the more diffi-cult the task becomes, and therefore the moreimportant that this be done well. A recentarticle in Computerworld stressed the factthat some Warehouses have reached a sizesuch that data refreshes cannot be done with-out taking the system down during opera-tional hours, because of 18 and 20 hourupload processes.

MethodologyMethodologies are key ingredients for anycomplex project. A good methodology will define:

• A logical stepwise approach to solving a prob-lem where each step builds on the previous one.

• A complete set of deliverables and measure-ments against goals so that nothing is forgotten in the project.

• A project structure including roles andresponsibilities of each participant.

Use of a good methodology will greatlyincrease the chances of success of any project,but in Data Warehousing it is essential due tothe integrative (and therefore complex)nature of the problem.

Project management andadministration Although this technically falls under thepurview of a good methodology, it is impor-tant to emphasize the need for solid projectmanagement with both IT and Business unitsinvolved in developing and setting up theWarehouse. Assembling the team, setting andmanaging expectations and working an effec-tive communications plan are all a part of agood project management team.

Appendix B—Case studies

IBM’s principal solution for generating andmanaging Data Warehouse and Data Martsystems is Visual Warehouse. Other IBMofferings such as the Data Replication familyand DB2 DataJoiner (for multi-vendor data-base access) can complement VisualWarehouse as data is moved from source totarget systems. IBM also partners with com-panies such as Evolutionary TechnologiesInternational for more complex extract capa-bilities and Vality Technology Inc. for datacleansing technology. In addition, IBM haskey partnering arrangements with BrioTechnology, Business Objects, and Cognos forquery and reporting as well as with Hyperionfor OLAP technology.

For the warehouse database, IBM offersindustry-leading DB2. The DB2 family spansNetfinity systems, AS/400 systems, RISCSystem/6000 hardware, IBM mainframes,non-IBM machines from Hewlett-Packardand Sun Microsystems, and operating systemssuch as OS/2, Windows (9x & NT), UNIX,OS/400, and OS/390. When DB2 DataJoineris used in conjunction with Visual Warehouse,non-IBM databases, such as those fromOracle, Sybase, and Informix serve as thewarehouse database.

More case studies and information is avail-able on IBM’s Web site at www.ibm.com/software/bi.

Visual Warehouse helps McDonald’s Canada get to themeat of its marketing dataBehind every mouthful of a beefy Big Macburger is a huge organization, working longhours to make sure customers get the qualityand service that has become McDonald’s hall-mark. In Canada, more than 1,000 McDonald’srestaurants do brisk business. Nevertheless,growing competitive pressures and customers’demand for new values have promptedMcDonald’s Canada to aggressively expand itsmarket presence with a larger number ofstrategically located restaurants. At the sametime, the company constantly strives to cur-tail its cost of operations. Oswald Edwards,manager for strategic & architectural plan-ning at McDonald’s Canada, explains,

Page 19: Guide for Datawarehousing

14

“Extensive discussions with our businessexecutives revealed that they needed detailed,accurate, and timely information for strategicplanning and decision making.”

Today, a new DB2-based data warehouse, cre-ated and run by IBM Visual Warehouse, isproviding key transaction information tomarket analysts at McDonald’s Canada. Itincludes information that is helping themanswer questions such as what combination ofproducts sells the most at a given time of day,which day of the week a new campaign shouldbe launched, the success of promotional campaigns linked with other leading brands,and much more. Says Edwards, “We were ableto convince our business users that IBM’sdata warehouse solution was the one for us,”adding, “We’ve achieved enormous returnsfrom our investment in Visual Warehouse. Itwill give us access to information that willhelp us to substantially increase restaurantssales and reduce operating expenses.”

SSiimmpplliiffyyiinngg ddeevveellooppmmeenntt,, ddrriivviinngg ddoowwnn mmaaiinntteennaannccee ccoossttssThe data warehouse, in DB2 for AIX, resideson a four-node RS/6000 SP server. Drivingthis is Visual Warehouse. Running on aWindows NT® server, Visual Warehouse cap-tures transactional data that is scattered on anumber of different enterprise business sys-tems, and loads the data warehouse. Says SergeTremblay, database administrator atMcDonald’s: “The beauty of Visual Warehouseis that it automates data collection from all ouroperational systems, refines it and feeds it tothe data warehouse. There is no comparableproduct in the market that combines all thesefunctions in one box.”

The company knew that maintenance—requiring extensive database administrationresources—is one of the most challenging andexpensive aspects of data warehousing. Andthat was another reason it chose VisualWarehouse. “Visual Warehouse takes over theheadache of administration and maintenance,”says Edwards, referring to the software’sautomation of monitoring, performance log-ging, and other administrative tasks. “It winsin terms of cost/performance, because youneed to dedicate fewer personnel resourcesfor warehouse management.”

Summary data from the warehouse isextracted onto four department-specific data

marts, which reside in DB2 for AIX on differ-ent nodes of the RS/6000 SP. Users accessthe data marts using Cognos PowerPlay andImpromptu clients. “Visual Warehouse sup-ports seamless integration with a wide rangeof sophisticated decision support tools. All thedata we have would be meaningless if not forVisual Warehouse’s openness in allowing us touse other tools,” says Tremblay.

MMiinniinngg ffoorr hhiiddddeenn vvaalluuee iinn nneeww ffoouunndd ddaattaaAs the data warehouse continues to evolve, by1998 it will cater to the purchase depart-ment’s needs, with the restaurant, and opera-tions and franchise departments joining inover the next two years. Information oninventory turnaround, geographical distribu-tion of sales, and restaurant profitability, thatwill be stored in the data warehouse, will helpusers from these departments in managingsupply chains and planning new locations.Seeing how existing business users are bene-fiting, Edwards adds, “We understand thetremendous advantage in mining the newlyfound data to discover hidden correlation andwill be reviewing IBM Intelligent Miner.”

TThhee ssccaallaabbllee ttrriiooDuring the initial planning and testing, datafrom a combination of 10 restaurants wasstored in the data warehouse. By year-end,data from 100 restaurants will be available,and eventually McDonald’s aims to get all1,000 restaurants in the loop. In the mean-time, McDonald’s will move its data ware-house to DB2 Universal Database ExtendedEnterprise Edition, as well as migrate toVersion 3.1 of Visual Warehouse. SaysEdwards, “We believe that the scalable, paral-lel processing capabilities of DB2 UniversalDatabase will help us make the best use ofour RS/6000. By bringing the two togetherwith Visual Warehouse, we’ll have the mostscalable and effective system.”

Testing Visual Warehouse Version 3.1 atMcDonald’s, Tremblay is excited about thenew functionality it provides. “VisualWarehouse Version 3.1 now includes new andimproved features that make it an even morepowerful data warehousing tool,” he says. “Forexample, in the new version it is easier tomake changes to source data and have thesechanges flow through the business views.”Tremblay explains that since data warehous-ing is by nature an iterative process, changesare always being made to database objects.

This version of Visual Warehouse makes itmuch easier to change objects. “Now, changescan be made on the fly. This means the data-base administrator doesn’t constantly have toworry about the status of database data,”Tremblay notes. Listing some of the other fea-tures, Tremblay adds, “Multiple operationscan be performed simultaneously, rather thansequentially. Indexes are generated automati-cally, and Visual Warehouse can perform starjoins and even display them graphically.”

IIBBMM IInnssppiirreess CCoonnffiiddeenncceeEven more important than the productsthemselves was the selection of the vendor.Tremblay recalls, “There was a slew of databasecompanies on the market. But we decided onIBM because their technical and support staffwere superior to anyone else out there.” AddsEdwards, “Without the commitment shownby McDonald’s senior management—peoplelike George Mencke, our chief financial offi-cer, and our IT staff—as well as IBM’s willing-ness to partner with us in building ourwarehouse, our project would not have beenas successful as it is.”

DAMAN converts informationinto knowledge with IBM’ssuite of business intelligenceproducts and ETI•EXTRACTAs business becomes increasingly informa-tion-driven, more and more corporations aremoving towards new information systemarchitectures such as data warehouses andenterprise-wide applications. One of thetoughest and most frequently under-esti-mated challenges in implementing these newapplications is the timely and accurate con-version and migration of data between legacysystems and the new ones that will replace oraugment them. DAMAN Consulting, an IBMBusiness Partner, is making that challengemuch less painful by integrating IBM’s busi-ness intelligence suite of products into itsdata migration and conversion solutions.

DAMAN Consulting, a systems integratorbased in Austin, Texas, specializes in provid-ing data migration, data warehousing, anddecision support solutions developed aroundIBM business intelligence products. “Ourapproach to data migration and warehousingcombines IBM business intelligence productswith our in-house methodologies,” explainsGita Lal, executive vice president for DAMAN

Page 20: Guide for Datawarehousing

15

Consulting. “In that context, we have identi-fied some leading edge tools to provide thesecomprehensive solutions, including IBMDataPropagator and ETI•EXTRACT for datamigration, IBM Visual Warehouse for build-ing data marts and warehouses, and VisualWarehouse’s Information Catalog (formerlycalled DataGuide) for meta data manage-ment.” ETI•EXTRACT, a suite of tools thatprovides a powerful infrastructure for gener-ating complete extract, transformation, andpopulate programs, has been developed byEvolutionary Technologies International(ETI), a leading provider of software productsfor data integration management.

IIBBMM aanndd EETTII aadddd vvaalluuee ttoo DDAAMMAANN’’ss ssoolluuttiioonnssIn the data management consulting industry,DAMAN Consulting has earned a reputationfor quick delivery of solutions based on IBM’s business intelligence products. Alfredo R.Ramirez, Jr., president of DAMAN Consultingattributes much of this, not only to DAMAN’sknowledge and ability to quickly and accu-rately perform data migration implementation,but also to IBM’s comprehensive package ofbusiness intelligence products and integrationwith complementary products such asETI•EXTRACT. “IBM’s support for productssuch as ETI•EXTRACT gives us a consider-able advantage, because we don’t have to goout and test the integration components ordevelop integration processes among the dif-ferent tools. They’re already integrated intothe solution,” explains Ramirez. “IBM’s pack-age includes reporting tools, meta data tools,tools for propagating and migrating the datafrom the source application to the warehouse,and also replication technologies.” He adds,“Having an integrated suite of tools that sup-ports all of our reporting needs definitelyincreases our ability to sell such solutions toour customers.”

TTiigghhttllyy iinntteeggrraatteedd pprroodduuccttss mmeeaann rraappiidd,,llooww--ccoosstt iimmpplleemmeennttaattiioonnFor DAMAN’s customers, quick implementa-tion and conversion from legacy systems totheir new data warehouses is a major concern.“Using IBM’s business intelligence productsin conjunction with propagation tools such asETI•EXTRACT enhances our ability todeliver interface and programmatic conver-sion solutions under tight deadlines,” saysRamirez. “ ETI•EXTRACT’s tight integrationwith DB2 and Visual Warehouse hastens the

migration effort and provides the customer anenvironment that can support their reportingneeds very quickly.”

One of DAMAN’s most recent implementa-tions was for a large property and casualtyinsurance provider, where the business objec-tives were to migrate a legacy policy manage-ment system to a newer system running on anIBM mainframe using DB2 for MVS. DAMANalso developed several programmatic datainterfaces from the new policy managementsystem on DB2 to hundreds of disparate sub-systems that support numerous decision sup-port functions. At another site, DAMANdesigned a datamart on DB2 for OS/390 tomaintain an insurance fraud decision supportsystem. ETI•EXTRACT was used for the dataextraction and load processes, and VisualWarehouse was used to design and build the datamart.

“The cost of implementation and deployment issignificantly reduced by using the VisualWarehouse prepackaged solution, largelybecause implementation time is reduced, as isthe level of skill required for implementation,”says Lal. “The combination of VisualWarehouse, ETI•EXTRACT, and DB2 deliv-ers a synergy that provides the basis for anadvanced decision support environment thatis easy to learn, use, and maintain.”

DDaattaa mmiiggrraattiioonn’’ss ttrriippllee tthhrreeaatt:: IIBBMM,, EETTII,,aanndd DDAAMMAANNIn addition to providing comprehensive IBMdata migration and warehousing solutions,DAMAN has developed tools that assist inmeta data management for these solutions,including DAMAN’s InfoManager. Meta datais information about the enterprise data andis a critical element in effective data manage-ment. DAMAN uses Visual Warehouse’sInformation Catalog to index the meta data inthe application. InfoManager then gathersinformation from the Information Catalog andother sources of meta data (including dis-parate applications and data sources externalto the organization), and creates an integratedmeta data model that can be used for per-forming impact analyses on the entire com-puting or decision support environment. Theaccuracy and currency of the meta data modelis further enhanced by InfoManager’sIntelligent Agent technology (supportingmanagement of operational activities andchange notification).

“A big advantage of using IBM’s businessintelligence solutions is the integration of themeta data arena. Most products are sold inde-pendent of each other and then it’s left to theclients to create their own integrated model.With IBM’s business intelligence suite ofproducts, there’s DataPropagator andETI•EXTRACT sharing meta data with theInformation Catalog, which allows users toautomate meta data integration. This provideshuge benefits in terms of the ongoing main-tenance of the data warehouse,” Ramirezexplains. “That’s a key component that makesIBM’s business intelligence suite of productsparticularly appealing.”

Lal concludes, “In addition to being an end-to-end provider of business intelligence solu-tions comprised of world class technology,IBM adds value to its partnerships, and that’ssomething that can’t be said about the com-petition. If you make partnering difficult andyou don’t add value to the partnering process,you’ll lose your partners very quickly. WithIBM, partnering is a desirable thing.”

Blue Cross & Blue Shield prescribe formulas for real-time cost analysis with DB2 OLAP Server and Visual WarehouseFounded in 1939, Blue Cross & Blue Shield ofRhode Island (BCBS) provides health cover-age for one out of every two Rhode Islanders.This translates into well over 450,000 insur-ance policies. Imagine having to track andanalyze every administrative cost associatedwith selling, implementing, and processingeach one of these claims. Quite a tedious task.Yet, the BCBS cost accounting departmentdoes it every day. Ensuring that all costs areproperly allocated across the organizationmeans monitoring business results and inter-acting with data on a day-to-day, hour-to-hour,and even minute-to-minute basis.

Keeping pace with these demands and thehundreds of thousands of policies was nearlyimpossible in the past, when analysts had tomanually sift through mountains of printedreports and key in the required data into spread-sheets. Today, BCBS has empowered its busi-ness analysts with online analytical processing(OLAP) tools to enable sophisticated, multi-dimensional analysis of large volumes of data.Having such capabilities is imperative to BCBSas it allows them to transform data into useful

Page 21: Guide for Datawarehousing

16

information for analysis and decision making,and to address the corporate need to under-stand what is driving business performance.

George Trudel, an internal business and tech-nology consultant at BCBS, agrees: “Whenyou take a look at the demands coming out oforganizations for real time information tomeet the challenges of competition in today’selectronic environment, decision supportwithout OLAP is a very unwieldy task.”

LLeettttiinngg aannaallyyssttss ggeett ddoowwnn ttoo bbuussiinneessssSince 1993, BCBS, Rhode Island’s largesthealth insurance provider, has been usingHyperion Software’s Essbase OLAP server onWindows NT in its cost accounting depart-ment. This department plays a critical role incoding, tracking, reviewing, and analyzing alladministrative expenses for its 75 diverselines of health insurance products. Prior toimplementing Hyperion Essbase, BCBS costaccountants had to thumb through hundredsof pages of reports to dig up the specific infor-mation needed to prepare an expense reportand re-enter the data into Excel spreadsheets.Once Essbase was deployed, cost accountantswere able to acquire the information theyneeded almost instantaneously, leaving themmore time for their main task—cost analysis.

Recently, BCBS took its OLAP capabilities toa new level by giving its OLAP users directaccess to the company’s relational data storeswith DB2 OLAP Server. DB2 OLAP Serverintegrates the Hyperion Essbase OLAPengine and application program interfaces(APIs) with DB2 Universal Database. “WithIBM DB2 OLAP Server leveraging theEssbase OLAP engine, we will not only take agreat burden off of our IT resources,” saysTrudel, “but also give the business user accessto vast amounts of summary information thatwould have normally taken a very long timeto produce.”

EEssssbbaassee aanndd DDBB22 wwoorrkkiinngg ssiiddee--bbyy--ssiiddeeBCBS uses IBM Visual Warehouse to buildand maintain portions of the DB2 data ware-house on its S/390 server. “We have a largevolume of data that needs to interact withother SQL-based decision support tools, sowe want to maintain those in DB2,” Trudelexplains. DB2 OLAP Server enables theEssbase OLAP engine to operate on top ofthis relational store.

Trudel emphasizes that accessing the rela-tional warehouse has not come at the expenseof its Essbase multidimensional stores. DB2OLAP Server and the native Essbase systemwork side-by-side at BCBS, fulfilling impor-tant and complementary roles. “With DB2OLAP Server, we’ll be able to do everythingthat we’ve done with Hyperion Essbase, andin addition, have the ability to easily accessdata from our DB2 data sources,” says Trudel.

MMaannaaggeedd OOLLAAPPBlue Cross & Blue Shield uses both DB2OLAP Server and Essbase as data marts.“Combining Visual Warehouse with DB2OLAP Server and Essbase provides us with amanaged OLAP environment, in which wecan automate the process of extracting datafrom different sources, loading it into the datawarehouse, and performing ongoing mainte-nance of the warehouse,” says Trudel. “Thisreduces the need for our IT staff to constantlymanage the whole system.”

TThhee bboottttoomm lliinneeMicrosoft Excel is the primary desktop toolfor BCBS’s business analysts. PredefinedExcel templates created internally are used toretrieve the data from Hyperion Essbase orDB2 OLAP Server and automatically load itinto the appropriate cost accounting reports.According to Trudel, what used to take sev-eral days to create now takes three and a halfminutes. “With Hyperion Essbase and DB2OLAP Server, we’re able to get a much betterfeel for how costs are being allocated acrossthe organization, the effect of the allocationson our final cost objectives, and the impact ofadministrative expenses on the business,”says Trudel. “Since our cost accountants nowhave more time to produce analytical reportsand build scenarios, they can give us a betterhandle on controlling costs and identifyingissues in that process.”

EEssssbbaassee eennggiinnee eennaabblleess eexxcceelllleenncceeIt’s no secret how Trudel feels about theHyperion Essbase engine. “I think it’s anexcellent tool. It gives business users a tech-nology that enables them to get to the information they need when they need it,” he explains. “And from the IBM side, we havemassive amounts of information in IMS andDB2 databases that we need to get at formulti-dimensional analysis. Being able to getat that information with the OLAP tools enablesus to hypothesize about other potential rev-enue streams that we couldn’t address before.”

Glossary of terms

AAdd hhoocc qquueerryy—Any query that cannot bedetermined prior to the moment the query is issued.

BBuussiinneessss ddiirreeccttoorryy—The portion of the Metadata Repository which deals with informationabout data pertinent to the business users. Forexample, meta data in the Business Directorycould tell users what factors are included inthe calculations of gross vs. net sales.

CCeennttrraalliizzeedd ddaattaa wwaarreehhoouussee—A DataWarehouse implementation in which a singlewarehouse serves the need of several businessunits simultaneously with a single data modelwhich spans the needs of the multiple busi-ness divisions.

CChhaannggee ddaattaa ccaappttuurree—The process of captur-ing changes made to a production datasource. Change data capture is typically per-formed by reading the source DBMS log. Itconsolidates units of work, ensures data issynchronized with the original source, andreduces data volume in a data warehousingenvironment.

CCooppyy mmaannaaggeemmeenntt—The process of taking asnapsot of data from a source and copying itto a target environment.

CCrroossssttaabb—A process or function that com-bines and/or summarizes data from one ormore sources into a two dimensional formatfor analysis or reporting.

DDaattaa aaddmmiinniissttrraattiioonn ((DDAA))—The processesand procedures by which the integrity andcurrency of the data in the warehouse are maintained.

DDaattaa aaggggrreeggaattiioonn*—A type of data derivationwhere a data value is derived from the aggre-gation of different data occurrences of thesame subject data. For example, yearly salesdata can be aggregated from monthly salesdata.

DDaattaa cclleeaannssiinngg—The process of removingerrors and resolving inconsistencies in sourcedata before loading the data into a target envi-ronment.

16

Page 22: Guide for Datawarehousing

17

DDaattaa ddiiccttiioonnaarryy—It is a collection of defini-tions and specifications for data categoriesand their relationships. It is a database of dataabout data (meta data).

DDaattaa eexxttrraacctt—The process of copying a subsetof data from a source to a target environment.

DDaattaa mmaarrtt—A type of data warehouse designedto meet the needs of a specific group of userssuch as a single department or part of anorganization. Typically a data mart focuses ona single subject area such as sales data. Datamarts may or may not be designed to fit into abroader enterprise data warehouse design.

DDaattaa mmiinniinngg—A process of analyzing largeamounts of data to identify hidden relation-ships, patterns, and relationships. This isoften called “discovery-driven” data analysis.

DDaattaa mmooddeell—A logical map that representsthe inherent properties of the data indepen-dent of software, hardware or machine perfor-mance considerations. The model shows dataelements grouped into records, as well as theassociations between those records.

DDaattaa pprrooppaaggaattiioonn//rreepplliiccaattiioonn—A process fordistributing data from a source database totarget databases while usually keeping the databases synchronized.

DDaattaa ssccrruubbbbiinngg//ttrraannssffoorrmmaattiioonn—Theprocess of filtering, merging, decoding, andtranslating source data to create validateddata for the data warehouse. For example, anumeric regional code might be replaced withthe name of the region.

DDaattaa wwaarreehhoouussee*—A subject oriented, inte-grated, time-variant, non-volatile collection of data in support of management’s decisionmaking process. A repository of consistenthistorical data that can be easily accessed andmanipulated for decision support.

DDaattaabbaassee—A collection of data which are logically related.

DDaattaabbaassee mmaannaaggeemmeenntt ssyysstteemmss ((DDBBMMSS))—Asoftware system for creating, maintaining andprotecting databases.

DDeecciissiioonn ssuuppppoorrtt ssyysstteemm ((DDSSSS))—Systemswhich allow decision makers in organizationsto access data relevant to the decisions theyare required to make.

DDeerriivveedd ddaattaa—Warehouse data that resultsfrom calculations or processing applied to thesource data before it is stored in theWarehouse environment. For example, sourcedata might be used to calculate ROI (ReturnOn Investment) which is stored as derived datain the Warehouse.

DDrriillll ddoowwnn—A method of exploring detaileddata that was used in creating a summarylevel of data. Drill down levels depend on thegranularity of the data in the data warehouse.

EEnnhhaanncceedd ddaattaa—Warehouse data that hasbeen cleansed, scrubbed, transformed,derived, summarized, or aggregated.

EEnntteerrpprriissee ddaattaa mmooddeell—A blueprint for all ofthe data used by all departments in the enter-prise. An Enterprise Data Model has resolvedall of the potential inconsistencies andparochial interpretations of the data used andpresents a consistent and commonly under-stood and accepted view and definition of theenterprise data.

EEnntteerrpprriissee ddaattaa wwaarreehhoouussee—An EnterpriseData Warehouse is a Centralized Warehousewhich services the entire enterprise.Enterprise Data Warehouse are sometimesused to populate data marts.

Page 23: Guide for Datawarehousing

EExxeeccuuttiivvee iinnffoorrmmaattiioonn ssyysstteemm ((EEIISS))—Toolsprogrammed to provide canned reports orbriefing books to top-level executives. Theyoffer strong reporting and drill-down capabil-ities. Today these tools allow ad-hoc queryingagainst a multi-dimensional view of data, andmost offer analytical applications along func-tional lines such as sales or financial analysis.

EExxttrraacctt—The process of copying a subset ofdata from a source to a target environment.

IInnffoorrmmaattiioonn ddiirreeccttoorryy—A system for brows-ing Data Warehouse meta data. Sub-compo-nents usually include a business directoryand a technical directory. See the entries forBusiness Directory and Technical Directory.

IInnffoorrmmaattiioonn mmiinniinngg—The process of extract-ing previously unknown, comprehensible,and actionable information from any source—including transactions, documents, e-mail,web pages, etc., and using the information tomake crucial business decisions.

IInnffoorrmmaattiioonn WWaarreehhoouussee™™—IBM’s approach todeveloping an architecture for a data ware-house that supports the implementation ofeither functional, centralized, or decentral-ized warehouses.

IInnffoorrmmaattiioonnaall aapppplliiccaattiioonnss—Applicationswhich are written to analyze data from a DataWarehouse for Decision Support purposes.

IInnffoorrmmaattiioonnaall ddaattaa—Data which has beenextracted, summarized, and stored in a DataWarehouse for purposes of supportingInformational Applications.

MMeettaa ddaattaa—Data about data. For example,information about where the data is stored,who is responsible for maintaining the data,how often the data is refreshed, etc.

MMiiddddlleewwaarree—A communications layer thatallows applications to interact across hard-ware and network environments.

MMuullttii--ddiimmeennssiioonnaall aannaallyyssiiss ((MMDDAA))—Informational Analysis on data which takesinto account many different relationships,each of which represents a dimension. Forexample, a person doing an analysis of retailmay want to understand the relationshipsamong sales by region, by quarter, by demo-graphic distribution (income, education level,gender, or by product). Multi-DimensionalAnalysis will yield results for these complexrelationships. Multi-Dimensional Analysis issometimes referred to as On-Line AnalyticalProcessing or OLAP.

OOnn--lliinnee aannaallyyttiiccaall pprroocceessssiinngg ((OOLLAAPP))—Processing that supports the analysis of busi-ness trends and projections. It is also knownas Multi-Dimensional Analysis.

OOppeerraattiioonnaall aapppplliiccaattiioonnss—Applicationswhich support the daily operations of theenterprise. Usually included in this class ofapplications are Order Entry, AccountsPayable, Accounts Receivable, etc.

QQuueerryy—A request for information from theData Warehouse posed by the user or tooloperated by the user.

RReellaattiioonnaall ddaattaabbaassee mmaannaaggeemmeenntt ssyysstteemm((RRDDBBMMSS)—A database system built aroundthe relational model based on tables, columnsand views.

RReepplliiccaattiioonn—The process of keeping a copyof data.

SSoouurrccee ddaattaabbaassee—The database from whichdata will be extracted or copied into the Data warehouse.

SSttaarr sscchheemmaa—A modeling scheme that has asingle object in the middle connected to anumber of objects around it radially - hencethe name star. A fact such as sales, compensa-tion, payment, or invoices is qualified by oneor more dimensions such as by month, byproduct, by geographical region. The fact isrepresented by a fact table and the dimen-sions are represented by dimension tables.

TTaarrggeett ddaattaabbaassee—The database in which datawill be loaded or inserted.

TTeecchhnniiccaall ddiirreeccttoorryy—The portion of the Metadata Repository which deals with the techni-cal information about the data. Such infor-mation may include the field designation(alphanumeric, etc.), the number of charac-ters, range checks, etc.

18

Page 24: Guide for Datawarehousing

GC-26-9313-01

For more information aboutIBM Visual Warehouseplease contact your IBM marketing representative or IBM authorized softwarereseller, or visit our Web site atwww.ibm.com/software/vw or www.ibm.com/software/data