OLAP Course

OLAP

The best starting point for approaching the multidimensional model is the kind of query for which

this model is best suited (Jarke et al., 2000):

"What is the total amount of receipts recorded last year per state and per product

category?"

"What is the relationship between the trend of PC manufacturers' shares and quarter

gains over the last five years?"

"Which orders maximize receipts?"

"Which one of two new treatments will result in a decrease in the average period of

admission?"

"What is the relationship between profit gained by the shipments consisting of less than

10 items and the profit gained by the shipments of more than 10 items?"

It is clear that using traditional languages, such as SQL, to express these types of queries can

be a very difficult task for inexperienced users.

It is also clear that running these types of queries against operational databases would result

in an unacceptably long response time.

The multidimensional model begins with the observation that the factors affecting decision-

making processes are enterprise-specific facts, such as sales, shipments, hospital admissions,

surgeries, and so on.

Instances of a fact correspond to events that occurred. For example, every single sale or

shipment carried out is an event.

Each fact is described by the values of a set of relevant measures that provide a quantitative

description of events. For example, sales receipts, amounts shipped, hospital admission costs,

and surgery time are measures.

Obviously, a huge number of events occur in typical enterprises, too many to analyze one

by one. Imagine placing them all into an n-dimensional space to help us quickly select and sort

them out.

The n-dimensional space axes are called analysis dimensions, and they define different

perspectives to single out events.

Examples:

- the sales in a store chain can be represented in a three-dimensional space whose dimensions

are products, stores, and dates.

- as far as shipments are concerned, products, shipment dates, orders, destinations, and terms

& conditions can be used as dimensions.

- hospital admissions can be defined by the department-date-patient combination, and you

would need to add the type of operation to classify surgery operations.

The concept of dimension gave life to the broadly used metaphor of cubes to represent

multidimensional data. According to this metaphor, events are associated with cube cells and

cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is

called a hypercube. Each cube cell is given a value for each measure.

The sales cube's analysis dimensions are store, product, and date. An event stands for a specific item

sold in a specific store on a specific date, and it is described by two measures: the quantity sold

and the receipts. This figure highlights that the cube is sparse; this means that many events did

not actually take place. Of course, you cannot sell every item every day in every store.
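The sparse sales cube can be sketched in a few lines of Python. This is only an illustration: the store names, products, dates, and figures are invented, and a dictionary keyed by (store, product, date) stands in for the cube. Only events that actually took place are stored, and each cell carries the two measures.

```python
from datetime import date

# Hypothetical events: (store, product, date) -> measures.
# Only cells where a sale actually happened are present (sparse cube).
sales_cube = {
    ("EverMore", "Milk", date(2008, 4, 5)): {"quantity": 12, "receipts": 18.0},
    ("EverMore", "Bread", date(2008, 4, 5)): {"quantity": 7, "receipts": 10.5},
    ("SmartMart", "Milk", date(2008, 4, 6)): {"quantity": 3, "receipts": 4.5},
}

# Distinct members found along each analysis dimension.
stores = {key[0] for key in sales_cube}
products = {key[1] for key in sales_cube}
dates = {key[2] for key in sales_cube}

# Cells a dense cube would need, versus the share that is actually filled.
potential_cells = len(stores) * len(products) * len(dates)
density = len(sales_cube) / potential_cells


def cell(store, product, day):
    """Return the measures of one event, or None if no sale took place."""
    return sales_cube.get((store, product, day))
```

Looking up a combination that never occurred, such as Bread at SmartMart, simply returns None instead of occupying storage.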

History

The first commercial multidimensional (OLAP) products appeared approximately 30

years ago (Express). When Edgar Codd introduced the OLAP definition in his 1993 white paper,

there were already dozens of OLAP products.

After Codd's research appeared, the software industry began appreciating OLAP

functionality and many companies have integrated OLAP features into their products (RDBMS,

integrated business intelligence suites, reporting tools, portals, etc.). In addition, for the last

decade, pure OLAP tools have considerably improved and become cheaper and more user-

friendly. These developments brought OLAP functionality to a much broader range of users and

organizations.

Now OLAP is used not only for strategic decision-making in large corporations, but also

to make daily tactical decisions about how to better streamline business operations in

organizations of all sizes and shapes.

However, the acceptance of OLAP is far from maximized. For example, one year ago,

The OLAP Survey 2 found that only thirty percent of its participants actually used OLAP.

Definitions

An OLAP cube is an array of data understood in terms of its 0 or more dimensions.

OLAP is an acronym for online analytical processing. OLAP is a computer-based technique for

analyzing business data in the search for business intelligence.

Source: Wikipedia.org

An online analytical processing cube (OLAP cube) is a multidimensional array of data

that serves as a database optimized for OLAP applications and data warehousing. It is a way of

storing relevant data in a multidimensional form to make it appear more logical when used to

generate reports and facilitate more efficient analytics.

Source: http://www.techopedia.com/definition/21142/online-analytical-processing-cube-olap-cube

An OLAP cube is a multidimensional database that is optimized for data warehouse

and online analytical processing (OLAP) applications.

An OLAP cube is a method of storing data in a multidimensional form, generally for

reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP

cubes are often pre-summarized across dimensions to drastically improve query time over

relational databases. The query language used to interact and perform tasks with OLAP cubes is

multidimensional expressions (MDX). The MDX language was originally developed by

Microsoft in the late 1990s, and has been adopted by many other vendors of multidimensional

databases.

Although it stores data like a traditional database does, an OLAP cube is structured very

differently. Databases, historically, are designed according to the requirements of the IT systems

that use them. OLAP cubes, however, are used by business users for advanced analytics. Thus,

OLAP cubes are designed using business logic and understanding. They are optimized for

analytical purposes, so that they can report on millions of records at a time.

Source: http://searchoracle.techtarget.com/tip/Why-OLAP-deserves-more-attention

Stands for "Online Analytical Processing." OLAP allows users to analyze database

information from multiple database systems at one time. While relational databases are

considered to be two-dimensional, OLAP data is multidimensional, meaning the information

can be compared in many different ways. For example, a company might compare their

computer sales in June with sales in July, then compare those results with the sales from another

location, which might be stored in a different database.

Source: http://www.techterms.com/definition/olap

Basically OLAP is an awful name. Nigel Pendse, author of The OLAP Report, calls the

same thing FASMI, which I think is a far better term:

Fast - 90% of queries back in under 10 secs and no query takes longer than 30 secs.

Analysis - Drill down, multiple aggregation techniques, sophisticated graphics, and trends all form part of this.

Shareable - Good security at the back end and available to a wide community of users; also multi-currency and multilingual to cope with the global economy.

Multi-Dimensional - Excel pivot tables but more so: the ability to have any multiple dimensions of information on each axis of a cross-tab, with other dimensions being used to further filter the results returned.

Information - Real-world KPIs rather than raw numbers.

Source: Andrew Fryer, OLAP, Cubes and Multidimensional Analysis, available at:

http://blogs.technet.com/b/andrew/archive/2007/08/22/olap-cubes-and-multidimensional-analysis.aspx

In OLAP, the cube is the database structure that is queried. To get a handle on how this

works, below is a simple three-dimensional cube.

Example 1.

The coordinate system in a cube not only has a reference to a point in multidimensional

space; it also has an understanding of hierarchies. So the cube 'knows' that January 2007 has a

parent called 2007 in the example above. This forms a key part of the OLAP concept - that the

results of calculations can be stored at the parent level rather than using on the fly aggregation of

all the children e.g. the sales total for 2007 is stored in the cube for bike, components etc. as is

the cost of sale.

The profit margin % has to be worked out on the fly for bikes for 2007 but this is quick as

the cost of sales and the sales that contribute to this calculation are pre-calculated. This gives

OLAP its speed while allowing for rich calculations to be stored. As always in IT there is a

catch, and in this case it is the complexity of the language used to query a cube: MDX,

or multidimensional expressions.
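The split between stored totals and on-the-fly ratios can be illustrated with a small Python sketch. The product names and figures are invented, and the dictionaries are only a stand-in for a real cube: additive measures (sales and cost of sales) are summed to the year level once, while the profit margin percentage is derived at query time from those two stored totals.

```python
# Hypothetical leaf-level cells: (product, month) -> (sales, cost_of_sales)
leaf_cells = {
    ("Bikes", "2007-01"): (1000.0, 600.0),
    ("Bikes", "2007-02"): (1500.0, 900.0),
    ("Components", "2007-01"): (400.0, 300.0),
}


def parent_of(month):
    """The hierarchy: a month such as '2007-01' has the parent year '2007'."""
    return month[:4]


# Build step: store the additive measures at the parent (year) level once.
year_cells = {}
for (product, month), (sales, cost) in leaf_cells.items():
    key = (product, parent_of(month))
    total_sales, total_cost = year_cells.get(key, (0.0, 0.0))
    year_cells[key] = (total_sales + sales, total_cost + cost)


def profit_margin_pct(product, year):
    """Computed at query time from the two pre-calculated totals."""
    sales, cost = year_cells[(product, year)]
    return 100.0 * (sales - cost) / sales
```

The margin is not stored because a ratio cannot be summed across children; dividing the two stored totals is cheap, which is the point the text makes.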

Example 2

Years later, the technology has been sufficiently perfected to make OLAP against large data

warehouses feasible, truly bringing the "intelligence" to business intelligence. A huge departure

from traditional relational design, OLAP allows the data to be stored and accessed in the most

efficient manner, allowing end-users to traverse the edges of a hypothetical "cube" of many

dimensions. (See below for an example of such a data cube).

The cube's dimensions are associated with facts (also called "measures"). In relational terms, the

facts have a many-to-one relationship with the dimensions. For example, Acme Computer

Supplies may have a database for sales. Dimensions are usually Customers, Products, and Time

Element (month, quarter, etc.). The sales figure for a specific product (Cat5e cables) to a specific

customer (Oracle Corp.) during a specific time period (Aug 2008) is one measure. The

dimensions are stored on individual tables and so are the facts, i.e. the sales figure. So the fact

table, in relational terminology, is a child table of the dimension tables.

But that's where the analogy ends. The access to the measures in relational design would have

been through indexes created on the customer, product, or time columns of the fact table. In the

OLAP approach, specific cells (the measures) are accessed by traversing the cube: in this

example, by going to the slice containing the time - Aug 08; then product - Cat5e; and finally the

customer - Oracle.

Oracle knows how to go to these slices by calculating the destination as in an array, not a table.

For instance, suppose the dimensions are organized as shown below:

Dimension Time := {'May','Jun','Jul','Aug'}

Dimension Customer := {'Microsoft','IBM','Oracle','HP'}

Dimension Product := {'Fiber','Cat6e','Cat5e','Serial'}

To find the measure for Oracle + Aug + Cat5e, the OLAP engine performs the navigation like

this:

1. Aug 08 is the fourth element of the array called Time, so travel to the fourth cell along the time dimension of the cube.

2. Cat5e is the third element of the Product array, so travel to the third element.

3. Oracle is the third element of the Customer array, so travel to the third element.

That's it; now you've arrived at the measure you want. This is done without indexes since the

dimension values serve as array pointers. Similarly, if you want to calculate the total sales to all

customers in Aug 08, you do the same thing as above, except that in Step 3 you total the

measures of the elements of the array without going to a specific cell.
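The navigation steps above can be sketched in Python as plain array arithmetic. This is an illustration under assumptions, not how any particular OLAP engine is implemented: the cube is held in one flat list, the sales figures are invented, and `list.index` stands in for the direct member-to-position mapping a real engine would keep.

```python
TIME = ["May", "Jun", "Jul", "Aug"]
CUSTOMER = ["Microsoft", "IBM", "Oracle", "HP"]
PRODUCT = ["Fiber", "Cat6e", "Cat5e", "Serial"]

# Flat storage for a 4 x 4 x 4 cube; every cell starts empty.
cells = [0.0] * (len(TIME) * len(PRODUCT) * len(CUSTOMER))


def offset(time, product, customer):
    """Compute the cell position arithmetically from the member positions."""
    t = TIME.index(time)
    p = PRODUCT.index(product)
    c = CUSTOMER.index(customer)
    return (t * len(PRODUCT) + p) * len(CUSTOMER) + c


cells[offset("Aug", "Cat5e", "Oracle")] = 1250.0  # invented sales figure
cells[offset("Aug", "Cat5e", "IBM")] = 750.0      # invented sales figure


def total_for(time, product):
    """Step 3 variant: total across all customers instead of picking one."""
    start = offset(time, product, CUSTOMER[0])
    return sum(cells[start:start + len(CUSTOMER)])
```

With this layout, all customers for one time and product sit next to each other, so the total is a sum over one contiguous run of cells.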

OLAP versus OLTP

Hari Mailvaganam, Slice, Dice and Drill! , available at

http://www.dwreview.com/OLAP/Introduction_OLAP.html

OLAP allows business users to slice and dice data at will. Normally, data in an organization is

distributed across multiple data sources that are incompatible with each other. A retail example:

point-of-sale data and sales made via call center or the Web are stored in different locations and

formats. It would be a time-consuming process for an executive to obtain OLAP reports such as:

What are the most popular products purchased by customers between the ages 15 to 30?

Part of the OLAP implementation process involves extracting data from the various data

repositories and making them compatible. Making data compatible involves ensuring that the

meaning of the data in one repository matches all other repositories. An example of incompatible

data: Customer ages can be stored as birth date for purchases made over the web and stored as

age categories (e.g. between 15 and 30) for in-store sales.
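As a rough illustration of making the two age representations compatible, the Python sketch below maps a birth date onto an age category. The category boundaries other than "15 to 30" and the function names are assumptions made for this example.

```python
from datetime import date

# Assumed age categories; the text only mentions "between 15 and 30".
AGE_BANDS = [(0, 14, "under 15"), (15, 30, "15-30"), (31, 200, "over 30")]


def age_on(birth_date, as_of):
    """Whole years between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def age_band(birth_date, as_of):
    """Map a web customer's birth date onto the in-store age categories."""
    age = age_on(birth_date, as_of)
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the known bands")
```

Converting toward the coarser representation is the only lossless direction here: an age category cannot be turned back into a birth date.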

It is not always necessary to create a data warehouse for OLAP analysis. Data stored by

operational systems, such as point-of-sale systems, reside in databases called OLTPs. OLTP

(Online Transaction Processing) databases do not differ structurally

from any other databases. The main, and only, difference is the way in which data is

stored.

Examples of OLTPs can include ERP, CRM, SCM, Point-of-Sale applications, Call

Center.

OLTPs are designed for optimal transaction speed. When a consumer makes a purchase

online, they expect the transactions to occur instantaneously. With a database design (called data

modeling) optimized for transactions, the record 'Consumer name, Address, Telephone, Order

Number, Order Name, Price, Payment Method' is created quickly on the database and the results

can be recalled by managers equally quickly if needed.

Figure 1. Data Model for OLTP

Data are not typically stored for an extended period on OLTPs for storage cost and

transaction speed reasons.

OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview

analysis of what happened. Hence the data storage (i.e. data modeling) has to be set up

differently. The most common method is called the star design.

Figure 2. Star Data Model for OLAP

The central table in an OLAP star data model is called the fact table. The surrounding

tables are called the dimensions.

Using the above data model, it is possible to build reports that answer questions such as:

The supervisor that gave the most discounts.

The quantity shipped on a particular date, month, year or quarter.

In which zip code did product A sell the most.

To obtain answers, such as the ones above, from a data model, OLAP cubes are created.

OLAP cubes are not strictly cuboids - it is the name given to the process of linking data from the

different dimensions. The cubes can be developed along business units such as sales or

marketing. Or a giant cube can be formed with all the dimensions.
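To make the star layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are all assumptions invented for this example (they are not taken from Figure 2); the query answers the zip code question listed above by joining the fact table to two dimension tables.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, zip_code TEXT);
-- The fact table sits in the middle and references each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity INTEGER
);
INSERT INTO dim_product VALUES (1, 'Product A'), (2, 'Product B');
INSERT INTO dim_customer VALUES (10, '33101'), (11, '33102');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 11, 8), (1, 10, 4), (2, 11, 20);
""")


def top_zip_for(product_name):
    """Zip code with the highest total quantity for one product."""
    return db.execute(
        """
        SELECT c.zip_code, SUM(f.quantity) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        WHERE p.name = ?
        GROUP BY c.zip_code
        ORDER BY total DESC
        LIMIT 1
        """,
        (product_name,),
    ).fetchone()
```

Every report of this kind follows the same pattern: filter and group on dimension attributes, aggregate a measure from the fact table.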

Figure 3. OLAP Cube with Time, Customer and Product Dimensions

OLAP can be a valuable and rewarding business tool. Aside from producing reports,

OLAP analysis can help an organization evaluate balanced scorecard targets.

Figure 4. Steps in the OLAP Creation Process

OLAP Storage

OLAP storage is one of the critical choices to be made when designing the solution.

OLAP storage comes in three forms:

MOLAP - Multidimensional OLAP. In MOLAP, both the source data and the

aggregations are stored in a multidimensional format. MOLAP is the fastest option for data

retrieval, but requires the most disk space. Disk space is less of a concern these days with

lowering storage and processing cost.

ROLAP - Relational OLAP. All data, including the aggregations are stored within the

source relational database. This will be a concern for larger data warehousing implementations

which have higher usage needs. ROLAP is the slowest for data retrieval. Whether an aggregation

exists or not, a ROLAP database must access the data warehouse itself. ROLAP is best suited for

smaller data warehousing implementations.

HOLAP - Hybrid OLAP. HOLAP is a combination of both the above storage

methodologies. HOLAP databases store the aggregations that exist within a multidimensional

structure, leaving the cell-level data itself in a relational form. Where the data is pre-aggregated,

HOLAP offers the performance of MOLAP; where the data must be fetched from the tables,

HOLAP is as slow as ROLAP.

Due to shrinking hardware and processing costs, MOLAP is most often used.

HOLAP is a better solution if the solution is accessing a stand-alone database. ROLAP is more

convenient to set up when the query demands are relatively low and also on a stand-alone

database.

http://businessintelligence.ittoolbox.com/documents/advantagesdisadvantages-of-molap-rolap-and-holap-15897

MOLAP

Excellent performance: this is the more traditional way of OLAP analysis. In MOLAP, data is

stored in a multidimensional cube. The storage is not in the relational database, but in proprietary

formats.

Advantages:

MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations.

They can also perform complex calculations. All calculations have been pre-generated when the

cube is created. Hence, complex calculations are not only doable, but they return quickly.

Disadvantages:

It is limited in the amount of data it can handle. Because all calculations are performed when the

cube is built, it is not possible to include a large amount of data in the cube itself. This is not to

say that the data in the cube cannot be derived from a large amount of data. Indeed, this is

possible. But in this case, only summary-level information will be included in the cube itself.

It requires an additional investment. Cube technology is often proprietary and does not

already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional

investments in human and capital resources are needed.

ROLAP

This methodology relies on manipulating the data stored in the relational database to give the

appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of

slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
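A small sketch with Python's built-in sqlite3 module shows the idea. The table, its columns, and the rows are invented for illustration, and the helper is a simplification of what a ROLAP tool would generate: each slicing or dicing filter the user applies becomes one more condition in the WHERE clause of the generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, product TEXT, sale_date TEXT, receipts REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EverMore", "Milk", "2008-04-05", 18.0),
        ("EverMore", "Bread", "2008-04-05", 10.5),
        ("EverMore", "Milk", "2008-04-06", 9.0),
        ("SmartMart", "Milk", "2008-04-05", 4.5),
    ],
)

ALLOWED_COLUMNS = {"store", "product", "sale_date"}


def receipts_by_product(filters):
    """Total receipts per product; each filter adds one WHERE condition."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    columns = list(filters)
    sql = "SELECT product, SUM(receipts) FROM sales"
    if columns:
        # Column names come from the whitelist; values are bound as parameters.
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in columns)
    sql += " GROUP BY product ORDER BY product"
    return conn.execute(sql, [filters[c] for c in columns]).fetchall()
```

Calling the helper with no filters reports on the whole table; adding `store` and then `sale_date` narrows the same query step by step, which is why ROLAP performance tracks the performance of the underlying SQL.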

Advantages:

It can handle large amounts of data. The data size limitation of ROLAP technology is the

limitation on data size of the underlying relational database. In other words, ROLAP itself places

no limitation on data amount.

It can leverage functionalities inherent in the relational database. Often, the relational

database already comes with a host of functionalities. ROLAP technologies, since they sit on top

of the relational database, can therefore leverage these functionalities.

Disadvantages:

Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple

SQL queries) in the relational database, the query time can be long if the underlying data size is

large.

It is limited by SQL functionalities. Because ROLAP technology mainly relies on generating

SQL statements to query the relational database, and SQL statements do not fit all needs (for

example, it is difficult to perform complex calculations using SQL), ROLAP technologies are

therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by

building into the tool out-of-the-box complex functions as well as the ability to allow users to

define their own functions.

HOLAP

HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For

summary-type information, HOLAP leverages cube technology for faster performance. When

detail information is needed, HOLAP can "drill through" from the cube into the underlying

relational data.
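The hybrid behaviour can be sketched in Python as two lookup paths. This is a toy model with invented data, where a dictionary plays the role of the cube and a list of tuples plays the role of the relational table: summary questions are answered from stored aggregates, and detail questions drill through to the underlying rows.

```python
# Hypothetical detail rows kept in relational form: (product, month, receipts)
detail_rows = [
    ("Milk", "2008-04", 18.0),
    ("Milk", "2008-04", 9.0),
    ("Bread", "2008-04", 10.5),
]

# Pre-aggregated summary held in the multidimensional part.
summary = {}
for product, month, receipts in detail_rows:
    summary[(product, month)] = summary.get((product, month), 0.0) + receipts


def total_receipts(product, month):
    """Summary question: answered directly from the stored aggregate."""
    return summary[(product, month)]


def drill_through(product, month):
    """Detail question: scan the underlying rows instead of the cube."""
    return [row for row in detail_rows if row[0] == product and row[1] == month]
```

The first path costs one lookup regardless of data volume, while the second grows with the number of detail rows, mirroring the MOLAP-like and ROLAP-like sides of HOLAP.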

Operations

The information in a multidimensional cube is very difficult for users to manage because

of its quantity, even if it is a concise version of the information stored in operational databases.

If, for example, a store chain includes 50 stores selling 1000 items, and a specific data warehouse

covers three-year-long transactions (approximately 1000 days), the number of potential events

totals 50 × 1000 × 1000 = 5 × 10^7. Assuming that each store can sell only 10 percent of all

the available items per day, the number of events totals 5 × 10^6. This is still too much data to

be analyzed by users without relying on automatic tools.
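The arithmetic in this example can be checked with a few lines of Python. The variable names are chosen for this sketch; the figures are the ones given in the text, and integer arithmetic is used so the counts stay exact.

```python
stores = 50
items = 1000
days = 1000  # roughly three years of transactions

# Every (store, item, day) combination is a potential event.
potential_events = stores * items * days

# Assumption from the text: a store sells about 10 percent of its items per day.
items_sold_per_store_per_day = items // 10
expected_events = stores * items_sold_per_store_per_day * days

# Fixing dimensions shrinks the count further.
events_one_store = items_sold_per_store_per_day * days
events_one_store_one_day = items_sold_per_store_per_day

print(f"potential events: {potential_events:,}")
print(f"expected events:  {expected_events:,}")
```

The last two values show how quickly the volume drops once one or two dimensions are fixed to a single value.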

You have essentially two ways to reduce the quantity of data and obtain useful

information: restriction and aggregation. The cube metaphor offers an easy-to-use and intuitive

way to understand both of these methods, as we will discuss in the following paragraphs.

Restriction

Restricting data means separating part of the data from a cube to mark out an analysis

field. In relational algebra terminology, this is called making selections and/or projections.

Selection has two forms: slicing and dicing.

Restriction:

- selections: slicing, dicing

- projections

Common operations include Slice and Dice, Drill-Down, Roll-Up, and Pivot:

Source: Multidimensional OLAP Cubes, available at: http://www.practicaldb.com/blog/cubes/

When you slice data, you decrease cube dimensionality by setting one or more

dimensions to a specific value. For example, if you set one of the sales cube dimensions to a

value, such as store='EverMore', this results in the set of events associated with the items sold in

the EverMore store.

According to the cube metaphor, this is simply a plane of cells, that is, a data slice that

can be easily displayed in spreadsheets.

In the store chain example given earlier, approximately 10^5 events still appear in your

result. If you set two dimensions to a value, such as store='EverMore' and date='4/5/2008', this

will result in all the different items sold in the EverMore store on April 5 (approximately 100

events). Graphically speaking, this information is stored at the intersection of two perpendicular

planes resulting in a line. If you set all the dimensions to a particular value, you will define just

one event that corresponds to a point in the three-dimensional space of sales.

Dicing is a generalization of slicing. It poses some constraints on dimensional attributes

to scale down the size of a cube. For example, you can select only the daily sales of the food

items in April 2008 in Florida. In this way, if five stores are located in Florida and 50 food

products are sold, the number of events to examine changes to 5 × 50 × 30 = 7500.

Finally, a projection can be referred to as a choice to keep just one subgroup of measures

for every event and reject other measures.
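The three restriction operations can be sketched in Python on a tiny cube. Everything here is illustrative: the data is invented, the function names are chosen for this example, and the cube is a dictionary keyed by (store, product, date). Slicing fixes dimensions to single values, dicing applies a more general condition per dimension, and projection keeps only selected measures.

```python
# Hypothetical cube: (store, product, date) -> measures
cube = {
    ("EverMore", "Milk", "2008-04-05"): {"quantity": 12, "receipts": 18.0},
    ("EverMore", "Soap", "2008-04-05"): {"quantity": 2, "receipts": 5.0},
    ("EverMore", "Milk", "2008-05-01"): {"quantity": 4, "receipts": 6.0},
    ("SmartMart", "Milk", "2008-04-07"): {"quantity": 3, "receipts": 4.5},
}
DIMENSIONS = ("store", "product", "date")


def dice_cube(cube, **predicates):
    """Keep cells whose coordinates pass the test given for each dimension."""
    positions = {dim: DIMENSIONS.index(dim) for dim in predicates}
    return {
        key: measures
        for key, measures in cube.items()
        if all(test(key[positions[dim]]) for dim, test in predicates.items())
    }


def slice_cube(cube, **fixed):
    """Slicing as a special case of dicing: each test is equality with one value.

    For simplicity the fixed dimensions are kept in the keys rather than dropped.
    """
    tests = {dim: (lambda v, want=want: v == want) for dim, want in fixed.items()}
    return dice_cube(cube, **tests)


def project(cube, *measure_names):
    """Keep only the named measures in every cell."""
    return {key: {m: cell[m] for m in measure_names} for key, cell in cube.items()}
```

For instance, `slice_cube(cube, store="EverMore")` keeps the EverMore plane, while `dice_cube(cube, product=lambda p: p == "Milk", date=lambda d: d.startswith("2008-04"))` selects milk sales in April 2008 across all stores.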

Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for

one or more members of the dimensions not in the subset.

Slice is any two-dimensional slice of the data cube. You slice a data cube to filter information.

For example, the figure below shows a data cube with following dimensions: Retailer, Date and

Product

If you are interested only in the data for a specific retailer you can slice off a single (two

dimensional) layer. In our example the slice contains information on date and product for

department stores.

Dice: The dice operation is a slice on more than two dimensions of a data cube (or more

than two consecutive slices).

Dice is the "rotation" of the cube to reveal another, different slice of data.

For exploring data from various perspectives, you can dice a data cube by exchanging the

dimension for other dimensions.

For example, after exploring the data by date and product for a specific retailer (orange slice on

the left cube), you want to get deeper information on date and retailer for a specific product.

Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user

navigates among levels of data ranging from the most summarized (up) to the most detailed

(down).

Drill Down is the exploration of data to subsequent levels of more detail along a

dimension.

For example, the dimension "Retailer" can be drilled-down to specific retailers, the

dimension "Date" can be drilled-down to months, and the dimension "Product" finally, can be

explored in more detail by single products.

Roll-up: (Aggregate, Consolidate) A roll-up involves computing all of the data

relationships for one or more dimensions. To do this, a computational relationship or formula

might be defined.

Roll-up is the aggregation of data to subsequent levels of summary, along a dimension. This

implies that dimensions are typically hierarchical in nature based on parent/child relationships

between dimension values.
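A roll-up along the date hierarchy can be sketched in Python as follows. The data and names are invented for illustration, and the parent/child relationship is expressed by functions that map a day to its month and a month to its year.

```python
from collections import defaultdict

# Hypothetical daily receipts: (product, "YYYY-MM-DD") -> receipts
daily = {
    ("Milk", "2008-04-05"): 18.0,
    ("Milk", "2008-04-06"): 9.0,
    ("Milk", "2008-05-01"): 6.0,
    ("Bread", "2008-04-05"): 10.5,
}


def to_month(day):
    """Parent of a day in the date hierarchy."""
    return day[:7]


def to_year(month):
    """Parent of a month in the date hierarchy."""
    return month[:4]


def roll_up(cells, parent):
    """Aggregate receipts to the parent level of the date dimension."""
    totals = defaultdict(float)
    for (product, period), receipts in cells.items():
        totals[(product, parent(period))] += receipts
    return dict(totals)


monthly = roll_up(daily, to_month)
yearly = roll_up(monthly, to_year)
```

Drilling down is the reverse navigation: starting from a yearly total, the user moves back to the `monthly` and then the `daily` cells that contributed to it.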

Pivot: This operation is also called the rotate operation. It rotates the data in order to provide

an alternative presentation of the data: the report or page display takes a different dimensional

orientation.
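As a minimal illustration in Python, a pivot on a two-dimensional report amounts to swapping which dimension labels the rows and which labels the columns. The report contents are invented for this sketch.

```python
# Hypothetical cross-tab: rows are products, columns are months.
report = {
    "Milk": {"Apr": 27.0, "May": 6.0},
    "Bread": {"Apr": 10.5, "May": 0.0},
}


def pivot(table):
    """Swap the row and column dimensions of a nested-dict cross-tab."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


by_month = pivot(report)
```

The values are unchanged; only the orientation differs, and pivoting twice returns the original layout.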

Conclusions

In summary, a multidimensional cube hinges on a fact relevant to decision-making.

It shows a set of events for which numeric measures provide a quantitative description.

Each cube axis shows a possible analysis dimension. Each dimension can be analyzed at

different detail levels specified by hierarchically structured attributes.

OLAP Benefits

Successful OLAP applications increase the productivity of business managers, developers, and whole organizations. The inherent flexibility of OLAP systems means business

users of OLAP applications can become more self-sufficient. Managers are no longer dependent

on IT to make schema changes, to create joins, or worse. Perhaps more importantly, OLAP

enables managers to model problems that would be impossible using less flexible systems with

lengthy and inconsistent response times. More control and timely access to strategic information

equal more effective decision-making.

IT developers also benefit from using the right OLAP software. Although it is possible to build an OLAP system using software designed for transaction processing or data collection, it is

certainly not a very efficient use of developer time. By using software specifically designed for

OLAP, developers can deliver applications to business users faster, providing better service.

Faster delivery of applications also reduces the applications backlog.

OLAP reduces the applications backlog still further by making business users self-sufficient enough to build their own models. However, unlike standalone departmental

applications running on PC networks, OLAP applications are dependent on Data Warehouses

and transaction processing systems to refresh their source level data. As a result, IT gains more

self-sufficient users without relinquishing control over the integrity of the data.

IT also realizes more efficient operations through OLAP. By using software designed for OLAP, IT reduces the query drag and network traffic on transaction systems or the Data

Warehouse.

Lastly, by providing the ability to model real business problems and a more efficient use of people resources, OLAP enables the organization as a whole to respond more quickly to

market demands. Market responsiveness, in turn, often yields improved revenue and

profitability.

OLAP functionality is:

Multidimensional -- OLAP services provide a wide variety of possible views or a

multidimensional conceptual view of the data by supporting a dimensional aggregation path

or hierarchies and/or multiple hierarchies.

Easy to understand -- The data mart designed for OLAP analysis should handle any

business logic and statistical analysis that is relevant to the application and the developer,

while at the same time keeping it easy enough for the target user.

Interactive -- OLAP helps the user synthesize business information through comparative,

personalized viewing, as well as thorough analysis of historical and projected data in various

"what-if" data model scenarios. The users are allowed to define new ad hoc calculations as

part of the analysis and can report on the data in any desired way.

Fast -- OLAP services are usually implemented in a multi-user client/server mode and offer

consistently rapid responses to queries, regardless of database size and complexity. The

consolidated business data can be pre-aggregated along with the hierarchies in all dimensions

to reduce the runtime calculation for building the OLAP reports.

Overview of the Dimensional Data Model

Available at:

http://docs.oracle.com/cd/B28359_01/olap.111/b28124/overview.htm#OLAUG9115

Dimensional objects are an integral part of OLAP. Because OLAP is on-line, it must

provide answers quickly; analysts pose iterative queries during interactive sessions, not in batch

jobs that run overnight. And because OLAP is also analytic, the queries are complex. The

dimensional objects and the OLAP engine are designed to solve complex queries in real time.

The dimensional objects include cubes, measures, dimensions, attributes, levels and

hierarchies.

The simplicity of the model is inherent because it defines objects that represent real-

world business entities.

Analysts know:

which business measures they are interested in examining

which dimensions and attributes make the data meaningful

how the dimensions of their business are organized into levels and hierarchies.

Figure 1. Diagram of the OLAP Dimensional Model

The dimensional data model is highly structured. Structure implies rules that govern the

relationships among the data and control how the data can be queried. Cubes are the physical

implementation of the dimensional model, and thus are highly optimized for dimensional

queries. The OLAP engine leverages this innate dimensionality in performing highly efficient

cross-cube joins for inter-row calculations, outer joins for time series analysis, and indexing.

Dimensions are pre-joined to the measures. The technology that underlies cubes is based on an

indexed multidimensional array model, which provides direct cell access.

The OLAP engine manipulates dimensional objects in the same way that the SQL engine

manipulates relational objects. However, because the OLAP engine is optimized to calculate

analytic functions, and dimensional objects are optimized for analysis, analytic and row

functions can be calculated much faster in OLAP than in SQL.

The dimensional model enables Oracle OLAP to support high-end business intelligence

tools and applications such as OracleBI Discoverer Plus OLAP, OracleBI Spreadsheet Add-In,

OracleBI Suite Enterprise Edition, BusinessObjects Enterprise, and Cognos ReportNet.

Cubes

Cubes provide a means of organizing measures that have the same shape, that is, they

have the exact same dimensions. Measures in the same cube can easily be analyzed and

displayed together. A cube usually corresponds to a single fact table or view.

Measures

Measures populate the cells of a cube with the facts collected about business operations.

Measures are organized by dimensions, which typically include a Time dimension.

An analytic database contains snapshots of historical data, derived from data in a

transactional database, legacy system, syndicated sources, or other data sources. Three years of

historical data is generally considered to be appropriate for analytic applications.

Measures are static and consistent while analysts are using them to inform their decisions.

They are updated in a batch window at regular intervals: weekly, daily, or periodically

throughout the day. Some administrators refresh their data by adding periods to the time

dimension of a measure, and may also roll off an equal number of the oldest time periods. Each

update provides a fixed historical record of a particular business activity for that interval. Other

administrators do a full rebuild of their data rather than performing incremental updates.
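
The incremental refresh described above can be sketched as a rolling window over time periods. This is only an illustration under simple assumptions: the measure has a single Time dimension, periods are kept in chronological order, and the function name refresh and the sample figures are invented.

```python
from collections import OrderedDict


def refresh(measure, new_periods, roll_off=True):
    """Add new time periods and optionally drop as many of the oldest ones."""
    added = 0
    for period, value in new_periods:
        if period not in measure:
            added += 1
        measure[period] = value
    if roll_off:
        for _ in range(added):
            measure.popitem(last=False)  # remove the oldest period
    return measure


# Hypothetical monthly sales, oldest period first.
sales = OrderedDict([("2006-01", 100.0), ("2006-02", 120.0), ("2006-03", 90.0)])
refresh(sales, [("2006-04", 110.0)])
print(list(sales))
```

A full rebuild would instead discard the measure and reload every period from the source.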

A critical decision in defining a measure is the lowest level of detail. Users may never

view this detail data, but it determines the types of analysis that can be performed. For example,

market analysts (unlike order entry personnel) do not need to know that Beth Miller in Ann

Arbor, Michigan, placed an order for a size 10 blue polka-dot dress on July 6, 2006, at 2:34 p.m.

But they might want to find out which color of dress was most popular in the summer of 2006 in

the Midwestern United States.

The base level determines whether analysts can get an answer to this question. For this

particular question, Time could be rolled up into months, Customer could be rolled up into

regions, and Product could be rolled up into items (such as dresses) with an attribute of color.

However, this level of aggregate data could not answer the question: At what time of day are

women most likely to place an order? An important decision is the extent to which the data has

been aggregated before being loaded into a data warehouse.
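
The effect of the base level can be sketched in a few lines. The example assumes hypothetical facts stored at the day and city level and rolls them up to months and regions. All data, the city_to_region mapping, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical base-level facts: (day, city, item, color, units sold).
facts = [
    ("2006-07-06", "Ann Arbor", "dress", "blue", 1),
    ("2006-07-09", "Chicago", "dress", "red", 2),
    ("2006-08-15", "Ann Arbor", "dress", "red", 1),
    ("2006-08-20", "Boston", "dress", "blue", 5),
]
city_to_region = {"Ann Arbor": "Midwest", "Chicago": "Midwest", "Boston": "Northeast"}


def roll_up(facts):
    """Aggregate days into months and cities into regions."""
    totals = defaultdict(int)
    for day, city, item, color, units in facts:
        month = day[:7]  # "2006-07-06" -> "2006-07"
        totals[(month, city_to_region[city], item, color)] += units
    return dict(totals)


def most_popular_color(totals, region, item, months):
    """Answer the color question from the aggregated data alone."""
    by_color = defaultdict(int)
    for (month, reg, it, color), units in totals.items():
        if reg == region and it == item and month in months:
            by_color[color] += units
    return max(by_color, key=by_color.get)


summer_2006 = {"2006-06", "2006-07", "2006-08"}
print(most_popular_color(roll_up(facts), "Midwest", "dress", summer_2006))
```

The rolled-up keys no longer carry a time of day, or even a day, so a question about ordering times cannot be answered from them. It could only be answered if the base level had kept that detail.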

Dimensions

Dimensions contain a set of unique values that identify and categorize data. They form

the edges of a cube, and thus of the measures within the cube.

Because measures are typically multidimensional, a single value in a measure must be

qualified by a member of each dimension to be meaningful. For example, the Sales measure has

four dimensions: Time, Customer, Product, and Channel. A particular Sales value (43,613.50)

only has meaning when it is qualified by a specific time period (Feb-06), a customer (Warren

Systems), a product (Portable PCs), and a channel (Catalog).
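
As a minimal sketch of this qualification rule, the cells of the Sales measure can be modeled as a mapping keyed by one member from each of the four dimensions. Only the 43,613.50 value comes from the example above. The other cells and the select function are hypothetical.

```python
# Cells keyed by (time, customer, product, channel).
sales = {
    ("Feb-06", "Warren Systems", "Portable PCs", "Catalog"): 43613.50,
    ("Feb-06", "Warren Systems", "Portable PCs", "Direct"): 1200.00,   # invented
    ("Mar-06", "Warren Systems", "Portable PCs", "Catalog"): 500.00,   # invented
}


def select(cells, time=None, customer=None, product=None, channel=None):
    """Return the cells that match every dimension member that was supplied."""
    wanted = (time, customer, product, channel)
    return {
        key: value
        for key, value in cells.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Fully qualified: exactly one value.
print(select(sales, "Feb-06", "Warren Systems", "Portable PCs", "Catalog"))
# Qualified by Time only: several values, none meaningful on its own.
print(select(sales, time="Feb-06"))
```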

Base-level dimension values correspond to the unique keys of a fact table.

Hierarchies and Levels

A hierarchy is a way to organize data at different levels of aggregation. In viewing data,

analysts use dimension hierarchies to recognize trends at one level, drill down to lower levels to

identify reasons for these trends, and roll up to higher levels to see what effect these trends have

on a larger sector of the business.

The elements of a dimension can be organized as a hierarchy, a set of parent-child

relationships, typically where a parent member summarizes its children. Parent elements can

further be aggregated as the children of another parent.

For example, May 2005's parent is Second Quarter 2005, which is in turn the child of Year

2005. Similarly, cities are the children of regions; products roll into product groups and

individual expense items into types of expenditure.

Level-Based Hierarchies

Each level represents a position in the hierarchy. Each level above the base (or most

detailed) level contains aggregate values for the levels below it. The members at different levels

have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of

2005, thus 2005 is the parent of Q1-05 and Q2-05.
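
A level-based hierarchy of this kind can be sketched with two small tables: one that lists the members of each level and one that records each member's parent. The sketch assumes a three-level Time hierarchy (month, quarter, year) and invented monthly values. The names levels, parent, children, and aggregate are hypothetical.

```python
# Members of each level, from the base level upward.
levels = {
    "month": ["Jan-05", "Feb-05", "Mar-05", "Apr-05", "May-05", "Jun-05"],
    "quarter": ["Q1-05", "Q2-05"],
    "year": ["2005"],
}

# One-to-many parent-child relation: each member has exactly one parent.
parent = {
    "Jan-05": "Q1-05", "Feb-05": "Q1-05", "Mar-05": "Q1-05",
    "Apr-05": "Q2-05", "May-05": "Q2-05", "Jun-05": "Q2-05",
    "Q1-05": "2005", "Q2-05": "2005",
}


def children(member):
    """Drill down: list the members one level below the given member."""
    return [m for m, p in parent.items() if p == member]


def aggregate(base_values):
    """Roll up: add each base-level value into all of its ancestors."""
    totals = dict(base_values)
    for member, value in base_values.items():
        ancestor = parent.get(member)
        while ancestor is not None:
            totals[ancestor] = totals.get(ancestor, 0) + value
            ancestor = parent.get(ancestor)
    return totals


# Invented base-level values for the six months.
monthly = dict(zip(levels["month"], [10, 20, 30, 5, 15, 25]))
totals = aggregate(monthly)
print(children("2005"), totals["Q1-05"], totals["2005"])
```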

Suppose a data warehouse contains snapshots of data taken three times a day, that is,

every 8 hours. Analysts might normally prefer to view the data that has been aggregated into

days, weeks, quarters, or years. Thus, the Time dimension needs a hierarchy with at least five

levels: the 8-hour base level plus one level for each of these four aggregations.

Hierarchies and levels have a many-to-many relationship. A hierarchy typically contains

several levels, and a single level can be included in more than one hierarchy.

Each level typically corresponds to a column in a dimension table or view. The base level

is the primary key.

Value-Based Hierarchies

Although hierarchies are typically composed of named levels, they do not have to be. The

parent-child relations among dimension members may not define meaningful levels. For

example, in an employee dimension, each manager has one or more reports, which forms a

parent-child relation. Creating levels based on these relations (such as individual contributors,

first-level managers, second-level managers, and so forth) may not be meaningful for analysis.

Likewise, the line item dimension of financial data does not have levels. This type of hierarchy is

called a value-based hierarchy.
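
A value-based hierarchy can be sketched with the parent-child relation alone, without any level names. The employee names, the figures, and the function names below are invented. The sketch assumes each employee reports to at most one manager.

```python
# Hypothetical reporting structure: employee -> manager (None for the top).
manager_of = {"Ana": None, "Ben": "Ana", "Cy": "Ana", "Dee": "Ben", "Eli": "Dee"}

# Hypothetical measure value recorded for each employee individually.
own_value = {"Ana": 10, "Ben": 20, "Cy": 30, "Dee": 40, "Eli": 50}


def children_of(manager_of):
    """Invert the employee -> manager mapping into manager -> reports."""
    children = {name: [] for name in manager_of}
    for name, boss in manager_of.items():
        if boss is not None:
            children[boss].append(name)
    return children


def subtree_total(member, children, own_value):
    """Aggregate a member with all direct and indirect reports."""
    return own_value[member] + sum(
        subtree_total(child, children, own_value) for child in children[member]
    )


reports = children_of(manager_of)
print(subtree_total("Ben", reports, own_value))
```

Cy and Eli both have no reports, yet Cy is one step below the top and Eli is three steps below it. Aggregation follows the parent-child links, so no common level is needed for either of them.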

Attributes

An attribute provides additional information about the data. Some attributes are used

for display. You might have attributes like colors, flavors, or sizes. This type of attribute can be

used for data selection and answering questions such as: Which colors were the most popular in

women's dresses in the summer of 2005? How does this compare with the previous summer?

Time attributes can provide information about the Time dimension that may be useful in

some types of analysis, such as identifying the last day or the number of days in each time

period.
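
As a small sketch of such time attributes, the last day and the number of days of a month can be derived with Python's standard calendar module. The attribute names end_date and time_span and the function time_attributes are hypothetical, and the sketch assumes that every time period is a calendar month.

```python
import calendar
from datetime import date


def time_attributes(year, month):
    """Return the last day and the length in days of a monthly period."""
    # monthrange returns (weekday of the first day, number of days in the month).
    days = calendar.monthrange(year, month)[1]
    return {"end_date": date(year, month, days), "time_span": days}


feb_06 = time_attributes(2006, 2)
# A per-day average makes months of different lengths comparable.
average_daily_sales = 43613.50 / feb_06["time_span"]
print(feb_06["end_date"], feb_06["time_span"])
```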

Each attribute typically corresponds to a column in a dimension table or view.