OLAP Course

description

OLAP in BI

Transcript of OLAP Course

    OLAP

    The best starting point for approaching the multidimensional model is the kind of query for which

    this model is best suited (Jarke et al., 2000):

    "What is the total amount of receipts recorded last year per state and per product

    category?"

    "What is the relationship between the trend of PC manufacturers' shares and quarter

    gains over the last five years?"

    "Which orders maximize receipts?"

    "Which one of two new treatments will result in a decrease in the average period of

    admission?"

    "What is the relationship between profit gained by the shipments consisting of less than

    10 items and the profit gained by the shipments of more than 10 items?"

    It is clear that using traditional languages, such as SQL, to express these types of queries can

    be a very difficult task for inexperienced users.

    It is also clear that running these types of queries against operational databases would result

    in an unacceptably long response time.

    The multidimensional model begins with the observation that the factors affecting decision-

    making processes are enterprise-specific facts, such as sales, shipments, hospital admissions,

    surgeries, and so on.

    Instances of a fact correspond to events that occurred. For example, every single sale or

    shipment carried out is an event.

    Each fact is described by the values of a set of relevant measures that provide a quantitative

    description of events. For example, sales receipts, amounts shipped, hospital admission costs,

    and surgery time are measures.

    Obviously, a huge number of events occur in typical enterprises, too many to analyze one

    by one. Imagine placing them all into an n-dimensional space to help us quickly select and sort

    them out.

    The n-dimensional space axes are called analysis dimensions, and they define different

    perspectives to single out events.

    Examples:

    - the sales in a store chain can be represented in a three-dimensional space whose dimensions

    are products, stores, and dates.

    - as far as shipments are concerned, products, shipment dates, orders, destinations, and terms

    & conditions can be used as dimensions.

    - hospital admissions can be defined by the department-date-patient combination, and you

    would need to add the type of operation to classify surgery operations.

    The concept of dimension gave life to the broadly used metaphor of cubes to represent

    multidimensional data. According to this metaphor, events are associated with cube cells and

    cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is

    called a hypercube. Each cube cell is given a value for each measure.

    For a sales cube, the analysis dimensions are store, product and date. An event stands for a specific item

    sold in a specific store on a specific date, and it is described by two measures: the quantity sold

    and the receipts. This figure highlights that the cube is sparse, which means that many events did

    not actually take place. Of course, you cannot sell every item every day in every store.
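
    The sparse sales cube described above can be sketched as a dictionary that stores only the events that actually occurred. This is a minimal illustration, not how any particular OLAP product stores data; the store names, products, dates, and figures are made-up sample values.

```python
from typing import Dict, NamedTuple, Tuple


class Measures(NamedTuple):
    quantity: int    # units sold
    receipts: float  # money taken in


# A cell coordinate is (store, product, date).
Cell = Tuple[str, str, str]

# Only events that took place get an entry; every other cell is simply absent.
sales_cube: Dict[Cell, Measures] = {
    ("EverMore", "milk", "2008-04-05"): Measures(quantity=12, receipts=18.0),
    ("EverMore", "bread", "2008-04-05"): Measures(quantity=7, receipts=10.5),
    ("SmartMart", "milk", "2008-04-06"): Measures(quantity=3, receipts=4.5),
}

stores = {"EverMore", "SmartMart"}
products = {"milk", "bread"}
dates = {"2008-04-05", "2008-04-06"}

# Number of cells the full cube could hold versus the share actually filled.
potential_cells = len(stores) * len(products) * len(dates)
density = len(sales_cube) / potential_cells
```

    With this sample data, three of the eight potential cells are filled, so most of the cube is empty, which is exactly what sparsity means.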

    History

    The first commercial multidimensional (OLAP) products appeared approximately 30

    years ago (Express). When Edgar Codd introduced the OLAP definition in his 1993 white paper,

    there were already dozens of OLAP products.

    After Codd's research appeared, the software industry began appreciating OLAP

    functionality and many companies have integrated OLAP features into their products (RDBMS,

    integrated business intelligence suites, reporting tools, portals, etc.). In addition, for the last

    decade, pure OLAP tools have considerably improved and become cheaper and more user-

    friendly. These developments brought OLAP functionality to a much broader range of users and

    organizations.

    Now OLAP is used not only for strategic decision-making in large corporations, but also

    to make daily tactical decisions about how to better streamline business operations in

    organizations of all sizes and shapes.

    However, the acceptance of OLAP is far from maximized. For example, one year ago,

    The OLAP Survey 2 found that only thirty percent of its participants actually used OLAP.

    Definitions

    An OLAP cube is an array of data understood in terms of its 0 or more dimensions.

    OLAP is an acronym for online analytical processing. OLAP is a computer-based technique for

    analyzing business data in the search for business intelligence.

    Source: Wikipedia.org

    An online analytical processing cube (OLAP cube) is a multidimensional array of data

    that serves as a database optimized for OLAP applications and data warehousing. It is a way of

    storing relevant data in a multidimensional form to make it appear more logical when used to

    generate reports and facilitate more efficient analytics.

    Source: http://www.techopedia.com/definition/21142/online-analytical-processing-cube-olap-cube

    An OLAP cube is a multidimensional database that is optimized for data warehouse

    and online analytical processing (OLAP) applications.

    An OLAP cube is a method of storing data in a multidimensional form, generally for

    reporting purposes. In OLAP cubes, data (measures) are categorized by dimensions. OLAP

    cubes are often pre-summarized across dimensions to drastically improve query time over

    relational databases. The query language used to interact and perform tasks with OLAP cubes is

    multidimensional expressions (MDX). The MDX language was originally developed by

    Microsoft in the late 1990s, and has been adopted by many other vendors of multidimensional

    databases.

    Although it stores data like a traditional database does, an OLAP cube is structured very

    differently. Databases, historically, are designed according to the requirements of the IT systems

    that use them. OLAP cubes, however, are used by business users for advanced analytics. Thus,

    OLAP cubes are designed using business logic and understanding. They are optimized for

    analytical purposes, so that they can report on millions of records at a time.

    Source: http://searchoracle.techtarget.com/tip/Why-OLAP-deserves-more-attention

    Stands for "Online Analytical Processing." OLAP allows users to analyze database

    information from multiple database systems at one time. While relational databases are

    considered to be two-dimensional, OLAP data is multidimensional, meaning the information

    can be compared in many different ways. For example, a company might compare their

    computer sales in June with sales in July, then compare those results with the sales from another

    location, which might be stored in a different database.

    Source: http://www.techterms.com/definition/olap

    Basically, OLAP is an awful name. Nigel Pendse, author of The OLAP Report, calls the

    same thing FASMI, which I think is a far better term:

    Fast - 90% of queries back in under 10 secs and no query takes longer than 30 secs.

    Analysis - Drill down, multiple aggregation techniques, sophisticated graphics, trends all form part of this.

    Shareable - Good security at the back end and available to a wide community of users. Also multi-currency and multilingual, to cope with the global economy.

    Multi-Dimensional - Excel pivot tables but more so. The ability to have any multiple dimensions of information on each axis of a cross-tab, with other dimensions being used to further filter the results returned.

    Information - Real-world KPIs rather than raw numbers.

    Source: Andrew Fryer, OLAP, Cubes and Multidimensional Analysis, available at:

    http://blogs.technet.com/b/andrew/archive/2007/08/22/olap-cubes-and-multidimensional-analysis.aspx

    In OLAP, the cube is the database structure that is queried. To get a handle on how this

    works, below is a simple three-dimensional cube.

    Example 1.

    The coordinate system in a cube not only has a reference to a point in multidimensional

    space, it also has an understanding of hierarchies. So the cube 'knows' that January 2007 has a

    parent called 2007 in the example above. This forms a key part of the OLAP concept: the

    results of calculations can be stored at the parent level rather than using on-the-fly aggregation of

    all the children, e.g. the sales total for 2007 is stored in the cube for bikes, components etc., as is

    the cost of sales.

    The profit margin % has to be worked out on the fly for bikes for 2007, but this is quick as

    the cost of sales and the sales that contribute to this calculation are pre-calculated. This gives

    OLAP its speed while allowing for rich calculations to be stored. As always in IT there is a

    catch, and in this case it is the complexity of the language used to query a cube, which is MDX,

    or multi-dimensional expressions.
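
    The idea of storing totals at the parent level and deriving the margin on the fly can be sketched as follows. This is only an illustration under assumed sample figures; the helper names (parent_of, profit_margin_pct) and the data layout are hypothetical and do not reflect how a real OLAP engine is implemented.

```python
# Leaf cells: (category, month) -> (sales, cost_of_sales). Figures are made up.
leaf = {
    ("Bikes", "2007-01"): (1000.0, 600.0),
    ("Bikes", "2007-02"): (1500.0, 900.0),
    ("Components", "2007-01"): (400.0, 300.0),
}


def parent_of(month: str) -> str:
    """The hierarchy: a month such as '2007-01' has the year '2007' as parent."""
    return month[:4]


# Cube build step: aggregate additive measures once and store them per parent.
stored = {}
for (category, month), (sales, cost) in leaf.items():
    key = (category, parent_of(month))
    total_sales, total_cost = stored.get(key, (0.0, 0.0))
    stored[key] = (total_sales + sales, total_cost + cost)


def profit_margin_pct(category: str, year: str) -> float:
    """Computed at query time, but only from two pre-aggregated numbers."""
    sales, cost = stored[(category, year)]
    return 100.0 * (sales - cost) / sales


margin = profit_margin_pct("Bikes", "2007")
```

    The margin is a ratio, so it cannot simply be summed from the children; it stays cheap because both of its inputs are already stored at the year level.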

    Example 2

    Years later, the technology has been sufficiently perfected to make OLAP against large data

    warehouses feasible, truly bringing the "intelligence" to business intelligence. A huge departure

    from traditional relational design, OLAP allows the data to be stored and accessed in the most

    efficient manner, allowing end-users to traverse the edges of a hypothetical "cube" of many

    dimensions. (See below for an example of such a data cube).

    The cube's dimensions are associated with facts (also called "measures"). In relational terms, the

    facts have a many-to-one relationship with the dimensions. For example, Acme Computer

    Supplies may have a database for sales. Dimensions are usually Customers, Products, and Time

    Element (month, quarter, etc.). The sales figure for a specific product (Cat5e cables) to a specific

    customer (Oracle Corp.) during a specific time period (Aug 2008) is one measure. The

    dimensions are stored in individual tables and so are the facts, i.e., the sales figure. So the fact

    table, in relational terminology, is a child table of the dimension tables.

    But that's where the analogy ends. The access to the measures in relational design would have

    been through indexes created on the customer, product, or time columns of the fact table. In the

    OLAP approach, specific cells (the measures) are accessed by traversing the cube: in this

    example, by going to the slice containing the time - Aug 08; then product - Cat5e; and finally the

    customer - Oracle.

    Oracle knows how to go to these slices by calculating the destination as in an array, not a table.

    For instance, suppose the dimensions are organized as shown below:

    Dimension Time := {'May','Jun','Jul','Aug'}

    Dimension Customer := {'Microsoft','IBM','Oracle','HP'}

    Dimension Product := {'Fiber','Cat6e','Cat5e','Serial'}

    To find the measure for Oracle + Aug + Cat5e, the OLAP engine performs the navigation like

    this:

    1. Aug 08 is the fourth element of the array called Time, so travel to the fourth cell along the time dimension of the cube.

    2. Cat5e is the third element of the Product array, so travel to the third element.

    3. Oracle is the third element of the Customer array, so travel to the third element.

    That's it; now you've arrived at the measure you want. This is done without indexes since the

    dimension values serve as array pointers. Similarly, if you want to calculate the total sales to all

    customers in Aug 08, you do the same thing as above, except that in Step 3 you total the

    measures of the elements of the array without going to a specific cell.
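
    The navigation steps above can be sketched with a flat array and computed offsets. This is a simplified illustration, assuming a dense cube laid out in time, product, customer order; the sales figures are made up, and the function names are hypothetical rather than part of any Oracle API.

```python
time_dim = ["May", "Jun", "Jul", "Aug"]
customer_dim = ["Microsoft", "IBM", "Oracle", "HP"]
product_dim = ["Fiber", "Cat6e", "Cat5e", "Serial"]

# Position lookups: a dimension value maps straight to its array position.
time_pos = {v: i for i, v in enumerate(time_dim)}
customer_pos = {v: i for i, v in enumerate(customer_dim)}
product_pos = {v: i for i, v in enumerate(product_dim)}

# One flat array holds every cell of the cube.
cells = [0.0] * (len(time_dim) * len(product_dim) * len(customer_dim))


def offset(t: str, p: str, c: str) -> int:
    """Compute the cell position arithmetically; no index is searched."""
    return (time_pos[t] * len(product_dim) + product_pos[p]) * len(customer_dim) + customer_pos[c]


def measure(t: str, p: str, c: str) -> float:
    return cells[offset(t, p, c)]


def total_for_all_customers(t: str, p: str) -> float:
    """Step 3 variant: total the customer array instead of picking one cell."""
    start = offset(t, p, customer_dim[0])
    return sum(cells[start:start + len(customer_dim)])


# Made-up sales figures for two cells.
cells[offset("Aug", "Cat5e", "Oracle")] = 1250.0
cells[offset("Aug", "Cat5e", "IBM")] = 750.0

oracle_sales = measure("Aug", "Cat5e", "Oracle")
aug_cat5e_total = total_for_all_customers("Aug", "Cat5e")
```

    Because the customer axis varies fastest in this layout, all customers for one time and product sit next to each other, so the total is a sum over one contiguous run of cells.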

    OLAP versus OLTP

    Hari Mailvaganam, Slice, Dice and Drill! , available at

    http://www.dwreview.com/OLAP/Introduction_OLAP.html

    OLAP allows business users to slice and dice data at will. Normally, data in an organization is

    distributed across multiple data sources that are incompatible with each other. A retail example:

    point-of-sale data and sales made via call center or the Web are stored in different locations and

    formats. It would be a time-consuming process for an executive to obtain OLAP reports such as:

    What are the most popular products purchased by customers between the ages of 15 and 30?

    Part of the OLAP implementation process involves extracting data from the various data

    repositories and making them compatible. Making data compatible involves ensuring that the

    meaning of the data in one repository matches all other repositories. An example of incompatible

    data: customer ages can be stored as birth dates for purchases made over the Web and stored as

    age categories (e.g., between 15 and 30) for in-store sales.

    It is not always necessary to create a data warehouse for OLAP analysis. Data stored by

    operational systems, such as point-of-sale systems, reside in databases called OLTPs. OLTP

    (Online Transaction Processing) databases do not differ structurally from any other

    database. The main, and only, difference is the way in which data is

    stored.

    Examples of OLTP systems include ERP, CRM, SCM, point-of-sale and call-center

    applications.

    OLTPs are designed for optimal transaction speed. When a consumer makes a purchase

    online, they expect the transactions to occur instantaneously. With a database design (called data

    modeling) optimized for transactions, the record 'Consumer name, Address, Telephone, Order

    Number, Order Name, Price, Payment Method' is created quickly on the database and the results

    can be recalled by managers equally quickly if needed.

    Figure 1. Data Model for OLTP

    Data are not typically stored for an extended period on OLTPs for storage cost and

    transaction speed reasons.

    OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview

    analysis of what happened. Hence the data storage (i.e. data modeling) has to be set up

    differently. The most common method is called the star design.

    Figure 2. Star Data Model for OLAP

    The central table in an OLAP star data model is called the fact table. The surrounding

    tables are called the dimensions.

    Using the above data model, it is possible to build reports that answer questions such as:

    The supervisor that gave the most discounts.

    The quantity shipped on a particular date, month, year or quarter.

    The zip code in which product A sold the most.

    To obtain answers such as the ones above from a data model, OLAP cubes are created.

    OLAP cubes are not strictly cuboids; "cube" is the name given to the process of linking data from the

    different dimensions. The cubes can be developed along business units such as sales or

    marketing, or a giant cube can be formed with all the dimensions.
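
    A star model of this kind can be sketched with an in-memory SQLite database: one fact table in the middle, joined to its dimension tables to answer the questions listed above. The table and column names (fact_sales, dim_date, dim_product, dim_customer) and all rows are assumptions made for illustration; they are not taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, day TEXT, quarter TEXT, year INTEGER);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, zip_code TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_id     INTEGER REFERENCES dim_date(date_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER
);
INSERT INTO dim_date VALUES (1, '2008-03-30', 'Q1', 2008), (2, '2008-04-05', 'Q2', 2008);
INSERT INTO dim_product VALUES (1, 'A'), (2, 'B');
INSERT INTO dim_customer VALUES (1, '33101'), (2, '90210');
INSERT INTO fact_sales VALUES (1, 1, 1, 5), (2, 1, 1, 8), (2, 1, 2, 4), (2, 2, 2, 9);
""")

# Quantity per quarter: join the fact table to the date dimension and group.
per_quarter = conn.execute("""
    SELECT d.year, d.quarter, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()

# Zip code in which product A sold the most.
best_zip = conn.execute("""
    SELECT c.zip_code, SUM(f.quantity) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    WHERE p.name = 'A'
    GROUP BY c.zip_code
    ORDER BY total DESC
    LIMIT 1
""").fetchone()
```

    Each question only changes which dimension tables are joined and which attributes are grouped or filtered; the fact table stays the same.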

    Figure 3. OLAP Cube with Time, Customer and Product Dimensions

    OLAP can be a valuable and rewarding business tool. Aside from producing reports,

    OLAP analysis can help an organization evaluate balanced scorecard targets.

    Figure 4. Steps in the OLAP Creation Process

    OLAP Storage

    OLAP storage is one of the critical choices to be made when designing the solution.

    OLAP storage comes in three forms:

    MOLAP - Multidimensional OLAP. In MOLAP, both the source data and the

    aggregations are stored in a multidimensional format. MOLAP is the fastest option for data

    retrieval, but requires the most disk space. Disk space is less of a concern these days with

    falling storage and processing costs.

    ROLAP - Relational OLAP. All data, including the aggregations, are stored within the

    source relational database. This will be a concern for larger data warehousing implementations

    which have higher usage needs. ROLAP is the slowest for data retrieval. Whether an aggregation

    exists or not, a ROLAP database must access the data warehouse itself. ROLAP is best suited for

    smaller data warehousing implementations.

    HOLAP - Hybrid OLAP. HOLAP is a combination of both the above storage

    methodologies. HOLAP databases store the aggregations that exist within a multidimensional

    structure, leaving the cell-level data itself in a relational form. Where the data is pre-aggregated,

    HOLAP offers the performance of MOLAP; where the data must be fetched from the tables,

    HOLAP is as slow as ROLAP.

    Due to shrinking hardware and processing costs, MOLAP is generally used most often.

    HOLAP is a better solution if the solution is accessing a stand-alone database. ROLAP is more

    convenient to set up when the query demands are relatively low and also on a stand-alone

    database.

    http://businessintelligence.ittoolbox.com/documents/advantagesdisadvantages-of-molap-rolap-and-holap-15897

    MOLAP

    Excellent performance. This is the more traditional way of OLAP analysis. In MOLAP, data is

    stored in a multidimensional cube. The storage is not in the relational database, but in proprietary

    formats.

    Advantages:

    MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations.

    They can also perform complex calculations. All calculations have been pre-generated when the

    cube is created. Hence, complex calculations are not only doable, but they return quickly.

    Disadvantages:

    It is limited in the amount of data it can handle. Because all calculations are performed when the

    cube is built, it is not possible to include a large amount of data in the cube itself. This is not to

    say that the data in the cube cannot be derived from a large amount of data. Indeed, this is

    possible. But in this case, only summary-level information will be included in the cube itself.

    It requires an additional investment. Cube technologies are often proprietary and do not

    already exist in the organization. Therefore, to adopt MOLAP technology, chances are additional

    investments in human and capital resources are needed.

    ROLAP

    This methodology relies on manipulating the data stored in the relational database to give the

    appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of

    slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
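
    The equivalence between slicing and a "WHERE" clause can be sketched with SQLite as follows. This is a minimal illustration, not the query generator of any ROLAP product; the sales table, its columns, and its rows are made-up assumptions, and total_receipts is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store TEXT, product TEXT, sale_date TEXT, receipts REAL);
INSERT INTO sales VALUES
    ('EverMore',  'milk',  '2008-04-05', 18.0),
    ('EverMore',  'bread', '2008-04-05', 10.5),
    ('EverMore',  'milk',  '2008-04-06',  6.0),
    ('SmartMart', 'milk',  '2008-04-05',  4.5);
""")

DIMENSIONS = ("store", "product", "sale_date")


def total_receipts(**fixed):
    """Each fixed dimension adds one 'column = ?' condition to the WHERE clause."""
    for column in fixed:
        if column not in DIMENSIONS:  # only known column names reach the SQL text
            raise ValueError(f"unknown dimension: {column}")
    sql = "SELECT SUM(receipts) FROM sales"
    if fixed:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in fixed)
    return conn.execute(sql, tuple(fixed.values())).fetchone()[0]


whole_cube = total_receipts()
one_store = total_receipts(store="EverMore")
one_store_one_day = total_receipts(store="EverMore", sale_date="2008-04-05")
```

    Every additional restriction is just one more condition in the generated statement, which is also why the response time depends on how well the relational database handles that query.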

    Advantages:

    It can handle large amounts of data. The data size limitation of ROLAP technology is the

    limitation on data size of the underlying relational database. In other words, ROLAP itself places

    no limitation on data amount.

    It can leverage functionalities inherent in the relational database. Often, a relational

    database already comes with a host of functionalities. ROLAP technologies, since they sit on top

    of the relational database, can therefore leverage these functionalities.

    Disadvantages:

    Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple

    SQL queries) in the relational database, the query time can be long if the underlying data size is

    large.

    It is limited by SQL functionalities. Because ROLAP technology mainly relies on generating

    SQL statements to query the relational database, and SQL statements do not fit all needs (for

    example, it is difficult to perform complex calculations using SQL), ROLAP technologies are

    therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by

    building into the tool out-of-the-box complex functions as well as the ability to allow users to

    define their own functions.

    HOLAP

    HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For

    summary-type information, HOLAP leverages cube technology for faster performance. When

    detail information is needed, HOLAP can "drill through" from the cube into the underlying

    relational data.

    Operations

    The information in a multidimensional cube is very difficult for users to manage because

    of its quantity, even if it is a concise version of the information stored in operational databases.

    If, for example, a store chain includes 50 stores selling 1000 items, and a specific data warehouse

    covers three-year-long transactions (approximately 1000 days), the number of potential events

    totals $50 \times 1000 \times 1000 = 5 \times 10^7$. Assuming that each store can sell only 10 percent of all

    the available items per day, the number of events totals $5 \times 10^6$. This is still too much data to

    be analyzed by users without relying on automatic tools.
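
    The event counts above can be checked with a few lines of arithmetic. The variable names are chosen for illustration only.

```python
stores, items, days = 50, 1000, 1000

# Every (store, item, day) combination is a potential event.
potential_events = stores * items * days

# If each store sells only 10 percent of the items on a given day,
# one tenth of the potential events actually occur.
actual_events = potential_events // 10
```

    Integer division is used so that the result stays an exact whole number of events.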

    You have essentially two ways to reduce the quantity of data and obtain useful

    information: restriction and aggregation. The cube metaphor offers an easy-to-use and intuitive

    way to understand both of these methods, as we will discuss in the following paragraphs.

    Restriction

    Restricting data means separating part of the data from a cube to mark out an analysis

    field. In relational algebra terminology, this is called making selections and/or projections.

    Selection has two forms: slicing and dicing.

    Restriction

    - selections

      - slicing

      - dicing

    - projections

    Common operations include Slice and Dice, Drill-Down, Roll-Up, and Pivot:

    Source: Multidimensional OLAP Cubes, available at: http://www.practicaldb.com/blog/cubes/

    When you slice data, you decrease cube dimensionality by setting one or more

    dimensions to a specific value. For example, if you set one of the sales cube dimensions to a

    value, such as store='EverMore', this results in the set of events associated with the items sold in

    the EverMore store.

    According to the cube metaphor, this is simply a plane of cells, that is, a data slice that

    can be easily displayed in spreadsheets.

    In the store chain example given earlier, approximately $10^5$ events still appear in your

    result. If you set two dimensions to a value, such as store='EverMore' and date='4/5/2008', this

    will result in all the different items sold in the EverMore store on April 5 (approximately 100

    events). Graphically speaking, this information is stored at the intersection of two perpendicular

    planes resulting in a line. If you set all the dimensions to a particular value, you will define just

    one event that corresponds to a point in the three-dimensional space of sales.

    Dicing is a generalization of slicing. It poses some constraints on dimensional attributes

    to scale down the size of a cube. For example, you can select only the daily sales of the food

    items in April 2008 in Florida. In this way, if five stores are located in Florida and 50 food

    products are sold, the number of events to examine changes to $5 \times 50 \times 30 = 7500$.

    Finally, a projection can be referred to as a choice to keep just one subgroup of measures

    for every event and reject other measures.
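
    Slicing, dicing, and projection as described above can be sketched on a small sparse cube. This is an illustrative simplification: the cube is a plain dictionary, the sample rows are made up, and the function names (slice_cube, dice_cube, project) are hypothetical. For simplicity the sliced result keeps the full three-part coordinate, although a strict slice would drop the fixed dimension.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[str, str, str]      # (store, product, date)
Measures = Dict[str, float]      # measure name -> value
Cube = Dict[Cell, Measures]

AXES = {"store": 0, "product": 1, "date": 2}

cube: Cube = {
    ("EverMore", "milk", "2008-04-05"): {"quantity": 12, "receipts": 18.0},
    ("EverMore", "bread", "2008-04-05"): {"quantity": 7, "receipts": 10.5},
    ("EverMore", "milk", "2008-05-01"): {"quantity": 4, "receipts": 6.0},
    ("SmartMart", "milk", "2008-04-05"): {"quantity": 3, "receipts": 4.5},
}


def slice_cube(c: Cube, **fixed: str) -> Cube:
    """Slicing: set one or more dimensions to a specific value."""
    return {k: v for k, v in c.items()
            if all(k[AXES[dim]] == value for dim, value in fixed.items())}


def dice_cube(c: Cube, **constraints: Callable[[str], bool]) -> Cube:
    """Dicing: constrain dimensions with arbitrary conditions, not just equality."""
    return {k: v for k, v in c.items()
            if all(test(k[AXES[dim]]) for dim, test in constraints.items())}


def project(c: Cube, *measures: str) -> Cube:
    """Projection: keep only a subgroup of measures for every event."""
    return {k: {m: v[m] for m in measures} for k, v in c.items()}


evermore = slice_cube(cube, store="EverMore")
evermore_one_day = slice_cube(cube, store="EverMore", date="2008-04-05")
april_milk = dice_cube(cube,
                       date=lambda d: d.startswith("2008-04"),
                       product=lambda p: p in {"milk"})
receipts_only = project(evermore, "receipts")
```

    Slicing is the special case of dicing in which every condition is an equality test, which is why dicing is called a generalization of slicing.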

    Slice: A slice is a subset of a multi-dimensional array corresponding to a single value for

    one or more members of the dimensions not in the subset.

    Slice is any two-dimensional slice of the data cube. You slice a data cube to filter information.

    For example, the figure below shows a data cube with the following dimensions: Retailer, Date and

    Product.

    If you are interested only in the data for a specific retailer you can slice off a single (two

    dimensional) layer. In our example the slice contains information on date and product for

    department stores.

    Dice: The dice operation is a slice on more than two dimensions of a data cube (or more

    than two consecutive slices).

    Dice is the "rotation" of the cube to reveal another, different slice of data.

    For exploring data from various perspectives, you can dice a data cube by exchanging the

    dimension for other dimensions.

    For example, after exploring the data by date and product for a specific retailer (orange slice on

    the left cube), you want to get deeper information on date and retailer for a specific product.

    Drill Down/Up: Drilling down or up is a specific analytical technique whereby the user

    navigates among levels of data ranging from the most summarized (up) to the most detailed

    (down).

    Drill Down is the exploration of data to subsequent levels of more detail along a

    dimension.

    For example, the dimension "Retailer" can be drilled-down to specific retailers, the

    dimension "Date" can be drilled-down to months, and the dimension "Product" finally, can be

    explored in more detail by single products.

    Roll-up: (Aggregate, Consolidate) A roll-up involves computing all of the data

    relationships for one or more dimensions. To do this, a computational relationship or formula

    might be defined.

    Roll-up is the aggregation of data to subsequent levels of summary, along a dimension. This

    implies that dimensions are typically hierarchical in nature based on parent/child relationships

    between dimension values.
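
    Roll-up and drill-down along a date hierarchy (day, month, year) can be sketched as follows. This is a minimal illustration that assumes ISO-formatted dates, so that a parent level is simply a prefix of its children; the receipts figures and the function names are made up.

```python
from collections import defaultdict

# Daily receipts per (store, date); sample values only.
daily = {
    ("EverMore", "2008-04-05"): 28.5,
    ("EverMore", "2008-04-06"): 6.0,
    ("EverMore", "2008-05-01"): 10.0,
    ("SmartMart", "2008-04-05"): 4.5,
}

# Parent/child relationships: 'YYYY' is the parent of 'YYYY-MM',
# which is the parent of 'YYYY-MM-DD'.
LEVEL_WIDTH = {"day": 10, "month": 7, "year": 4}


def roll_up(cells, level):
    """Aggregate receipts to a more summarized level of the date dimension."""
    width = LEVEL_WIDTH[level]
    totals = defaultdict(float)
    for (store, date), receipts in cells.items():
        totals[(store, date[:width])] += receipts
    return dict(totals)


def drill_down(cells, store, parent):
    """Return the more detailed cells that sit below one parent value."""
    return {k: v for k, v in cells.items()
            if k[0] == store and k[1].startswith(parent)}


by_month = roll_up(daily, "month")
by_year = roll_up(daily, "year")
april_days = drill_down(daily, "EverMore", "2008-04")
```

    Rolling up moves from days toward years; drilling down reverses the direction and reveals the daily cells behind a monthly total.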

    Pivot: This operation is also called the rotate operation. It rotates the data in order to provide

    an alternative presentation of data: the report or page display takes a different dimensional

    orientation.
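
    A pivot can be sketched as building the same cross-tab twice with the row and column dimensions exchanged. The crosstab helper and the sample receipts are hypothetical and only serve the illustration.

```python
# Receipts per (store, product); sample values only.
cells = {
    ("EverMore", "milk"): 24.0,
    ("EverMore", "bread"): 10.5,
    ("SmartMart", "milk"): 4.5,
}


def crosstab(data, row_axis, col_axis):
    """Lay out the cells as nested rows and columns for display."""
    table = {}
    for key, value in data.items():
        row, col = key[row_axis], key[col_axis]
        table.setdefault(row, {})
        table[row][col] = table[row].get(col, 0.0) + value
    return table


by_store = crosstab(cells, row_axis=0, col_axis=1)    # stores as rows
by_product = crosstab(cells, row_axis=1, col_axis=0)  # pivoted: products as rows
```

    The underlying values are identical in both tables; only the orientation of the display changes.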

    Conclusions

    In summary, a multidimensional cube hinges on a fact relevant to decision-making.

    It shows a set of events for which numeric measures provide a quantitative description.

    Each cube axis shows a possible analysis dimension. Each dimension can be analyzed at

    different detail levels specified by hierarchically structured attributes.

    OLAP Benefits

    Successful OLAP applications increase the productivity of business managers, developers, and whole organizations. The inherent flexibility of OLAP systems means business

    users of OLAP applications can become more self-sufficient. Managers are no longer dependent

    on IT to make schema changes, to create joins, or worse. Perhaps more importantly, OLAP

    enables managers to model problems that would be impossible using less flexible systems with

    lengthy and inconsistent response times. More control and timely access to strategic information

    equal more effective decision-making.

    IT developers also benefit from using the right OLAP software. Although it is possible to build an OLAP system using software designed for transaction processing or data collection, it is

    certainly not a very efficient use of developer time. By using software specifically designed for

    OLAP, developers can deliver applications to business users faster, providing better service.

    Faster delivery of applications also reduces the applications backlog.

    OLAP reduces the applications backlog still further by making business users self-sufficient enough to build their own models. However, unlike standalone departmental

    applications running on PC networks, OLAP applications are dependent on Data Warehouses

    and transaction processing systems to refresh their source level data. As a result, IT gains more

    self-sufficient users without relinquishing control over the integrity of the data.

    IT also realizes more efficient operations through OLAP. By using software designed for OLAP, IT reduces the query drag and network traffic on transaction systems or the Data

    Warehouse.

    Lastly, by providing the ability to model real business problems and a more efficient use of people resources, OLAP enables the organization as a whole to respond more quickly to

    market demands. Market responsiveness, in turn, often yields improved revenue and

    profitability.

    OLAP functionality is:

    Multidimensional -- OLAP services provide a multidimensional conceptual view of the data,

    with a wide variety of possible views, by supporting dimensional aggregation paths in the

    form of a single hierarchy or multiple hierarchies.

    Easy to understand -- The data mart designed for OLAP analysis should handle any

    business logic and statistical analysis that is relevant to the application and the developer,

    while at the same time remaining easy enough for the target user to work with.

    Interactive -- OLAP helps the user synthesize business information through comparative,

    personalized viewing, as well as thorough analysis of historical and projected data in various

    "what-if" data model scenarios. The users are allowed to define new ad hoc calculations as

    part of the analysis and can report on the data in any desired way.

    Fast -- OLAP services are usually implemented in a multi-user client/server mode and offer

    consistently rapid responses to queries, regardless of database size and complexity. The

    consolidated business data can be pre-aggregated along with the hierarchies in all dimensions

    to reduce the runtime calculation for building the OLAP reports.
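The pre-aggregation idea can be sketched as follows: every fact is added once to each combination of its ancestors across the dimensions, so a report at any level becomes a plain lookup. The hierarchies, member names, and amounts below are invented for illustration, and real OLAP servers typically materialize only some of these combinations.

```python
from collections import defaultdict
from itertools import product

# Hypothetical aggregation paths: base member -> (base, parent, grandparent).
store_paths = {
    "Turin": ("Turin", "North", "All stores"),
    "Milan": ("Milan", "North", "All stores"),
    "Rome": ("Rome", "Center", "All stores"),
}
time_paths = {
    "2005-01": ("2005-01", "Q1-05", "2005"),
    "2005-02": ("2005-02", "Q1-05", "2005"),
    "2005-04": ("2005-04", "Q2-05", "2005"),
}

# Hypothetical base-level facts: (store, month, amount).
facts = [
    ("Turin", "2005-01", 10),
    ("Milan", "2005-02", 20),
    ("Rome", "2005-01", 5),
    ("Turin", "2005-04", 7),
]


def pre_aggregate(facts):
    """Add each fact to every (store ancestor, time ancestor) combination."""
    totals = defaultdict(int)
    for store, month, amount in facts:
        for s, t in product(store_paths[store], time_paths[month]):
            totals[(s, t)] += amount
    return dict(totals)


aggregates = pre_aggregate(facts)
north_q1 = aggregates[("North", "Q1-05")]  # answered without scanning the facts
```

The trade-off is storage and refresh time: each fact here contributes to nine cells, and the number grows with the depth and count of the hierarchies.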

    Overview of the Dimensional Data Model

    Available at:

    http://docs.oracle.com/cd/B28359_01/olap.111/b28124/overview.htm#OLAUG9115

    Dimensional objects are an integral part of OLAP. Because OLAP is on-line, it must

    provide answers quickly; analysts pose iterative queries during interactive sessions, not in batch

    jobs that run overnight. And because OLAP is also analytic, the queries are complex. The

    dimensional objects and the OLAP engine are designed to solve complex queries in real time.

    The dimensional objects include cubes, measures, dimensions, attributes, levels and

    hierarchies.

    The model is inherently simple because it defines objects that represent real-world

    business entities.

    Analysts know:

    which business measures they are interested in examining

    which dimensions and attributes make the data meaningful

    how the dimensions of their business are organized into levels and hierarchies.

    Figure 1. Diagram of the OLAP Dimensional Model

    The dimensional data model is highly structured. Structure implies rules that govern the

    relationships among the data and control how the data can be queried. Cubes are the physical

    implementation of the dimensional model, and thus are highly optimized for dimensional

    queries. The OLAP engine leverages this innate dimensionality in performing highly efficient

    cross-cube joins for inter-row calculations, outer joins for time series analysis, and indexing.

    Dimensions are pre-joined to the measures. The technology that underlies cubes is based on an

    indexed multidimensional array model, which provides direct cell access.
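To make "indexed multidimensional array" and "direct cell access" concrete, here is a minimal sketch that stores cells in one flat list and computes each cell's position from the positions of its dimension members. The `ArrayCube` class and its data are assumptions made for this example; this is not a description of Oracle's actual storage format.

```python
class ArrayCube:
    """Dense cube stored as a flat list in row-major order."""

    def __init__(self, dimensions):
        # dimensions: mapping of dimension name -> ordered list of members
        self.names = list(dimensions)
        self.positions = {
            name: {member: i for i, member in enumerate(members)}
            for name, members in dimensions.items()
        }
        self.sizes = [len(dimensions[name]) for name in self.names]
        total = 1
        for size in self.sizes:
            total *= size
        self.cells = [None] * total

    def offset(self, **members):
        """Compute the flat index of a cell from one member per dimension."""
        result = 0
        for name, size in zip(self.names, self.sizes):
            result = result * size + self.positions[name][members[name]]
        return result

    def put(self, value, **members):
        self.cells[self.offset(**members)] = value

    def get(self, **members):
        return self.cells[self.offset(**members)]


cube = ArrayCube({
    "time": ["Jan", "Feb", "Mar"],
    "product": ["Laptop", "Phone"],
    "store": ["Turin", "Rome"],
})
cube.put(250, time="Feb", product="Phone", store="Turin")
```

Reading a cell needs no join and no scan: the member positions give the address directly, which is the property the text refers to as direct cell access.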

    The OLAP engine manipulates dimensional objects in the same way that the SQL engine

    manipulates relational objects. However, because the OLAP engine is optimized to calculate

    analytic functions, and dimensional objects are optimized for analysis, analytic and row

    functions can be calculated much faster in OLAP than in SQL.

    The dimensional model enables Oracle OLAP to support high-end business intelligence

    tools and applications such as OracleBI Discoverer Plus OLAP, OracleBI Spreadsheet Add-In,

    OracleBI Suite Enterprise Edition, BusinessObjects Enterprise, and Cognos ReportNet.

    Cubes

    Cubes provide a means of organizing measures that have the same shape, that is, they

    have the exact same dimensions. Measures in the same cube can easily be analyzed and

    displayed together. A cube usually corresponds to a single fact table or view.

    Measures

    Measures populate the cells of a cube with the facts collected about business operations.

    Measures are organized by dimensions, which typically include a Time dimension.

    An analytic database contains snapshots of historical data, derived from data in a

    transactional database, legacy system, syndicated sources, or other data sources. Three years of

    historical data is generally considered to be appropriate for analytic applications.

    Measures are static and consistent while analysts are using them to inform their decisions.

    They are updated in a batch window at regular intervals: weekly, daily, or periodically

    throughout the day. Some administrators refresh their data by adding periods to the time

    dimension of a measure, and may also roll off an equal number of the oldest time periods. Each

    update provides a fixed historical record of a particular business activity for that interval. Other

    administrators do a full rebuild of their data rather than performing incremental updates.
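The incremental strategy described above (add new periods, roll off an equal number of the oldest) behaves like a sliding window over the time dimension. The sketch below assumes a measure held as an ordered mapping from period to value; the `refresh` function, period labels, and values are hypothetical.

```python
from collections import OrderedDict


def refresh(measure, new_periods, window):
    """Append new periods, then roll off the oldest until `window` remain."""
    for period, value in new_periods:
        measure[period] = value
    while len(measure) > window:
        measure.popitem(last=False)  # drop the oldest period
    return measure


# Hypothetical three-period window of monthly sales.
sales = OrderedDict([("2006-01", 100), ("2006-02", 110), ("2006-03", 90)])
refresh(sales, [("2006-04", 120)], window=3)
```

A full rebuild, by contrast, would recreate `sales` from the source data instead of editing it in place.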

    A critical decision in defining a measure is the lowest level of detail. Users may never

    view this detail data, but it determines the types of analysis that can be performed. For example,

    market analysts (unlike order entry personnel) do not need to know that Beth Miller in Ann

    Arbor, Michigan, placed an order for a size 10 blue polka-dot dress on July 6, 2006, at 2:34 p.m.

    But they might want to find out which color of dress was most popular in the summer of 2006 in

    the Midwestern United States.

    The base level determines whether analysts can get an answer to this question. For this

    particular question, Time could be rolled up into months, Customer could be rolled up into

    regions, and Product could be rolled up into items (such as dresses) with an attribute of color.

    However, this level of aggregate data could not answer the question: At what time of day are

    women most likely to place an order? An important decision is the extent to which the data has

    been aggregated before being loaded into a data warehouse.
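The effect of the base level can be shown with a small sketch. The same hypothetical orders are loaded twice: once aggregated to month and color, and once with the full timestamp kept. Only the second form can answer the time-of-day question. All order data and names below are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical order events: (timestamp, dress color).
orders = [
    (datetime(2006, 7, 6, 14, 34), "blue"),
    (datetime(2006, 7, 9, 14, 5), "blue"),
    (datetime(2006, 7, 20, 9, 15), "red"),
    (datetime(2006, 8, 2, 14, 50), "blue"),
]

# Base level = month: the time of day is discarded before loading.
monthly = Counter((ts.strftime("%Y-%m"), color) for ts, color in orders)

# The monthly data can still answer "which color was most popular?"
color_totals = Counter()
for (month, color), count in monthly.items():
    color_totals[color] += count
top_color = color_totals.most_common(1)[0][0]

# Base level = timestamp: the hour is still available for analysis.
orders_per_hour = Counter(ts.hour for ts, _ in orders)
peak_hour = orders_per_hour.most_common(1)[0][0]
```

Nothing in `monthly` records the hour, so `peak_hour` cannot be derived from it; it has to come from data loaded at the finer base level.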

    Dimensions

    Dimensions contain a set of unique values that identify and categorize data. They form

    the edges of a cube, and thus of the measures within the cube.

    Because measures are typically multidimensional, a single value in a measure must be

    qualified by a member of each dimension to be meaningful. For example, the Sales measure has

    four dimensions: Time, Customer, Product, and Channel. A particular Sales value (43,613.50)

    only has meaning when it is qualified by a specific time period (Feb-06), a customer (Warren

    Systems), a product (Portable PCs), and a channel (Catalog).

    Base-level dimension values correspond to the unique keys of a fact table.
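A minimal sketch of this correspondence: each fact row is turned into a cube cell whose address is the tuple of its base-level dimension values, and a repeated tuple is rejected because it would not identify a single cell. The first row reuses the Sales example from the text; the second row, the `load_cube` function, and the column names are assumptions for illustration.

```python
DIMENSIONS = ("time", "customer", "product", "channel")

fact_rows = [
    {"time": "Feb-06", "customer": "Warren Systems",
     "product": "Portable PCs", "channel": "Catalog", "sales": 43613.50},
    # Hypothetical second row, differing only in channel.
    {"time": "Feb-06", "customer": "Warren Systems",
     "product": "Portable PCs", "channel": "Internet", "sales": 1250.00},
]


def load_cube(rows, dimensions, measure):
    """Map each row to a cell keyed by one member per dimension."""
    cells = {}
    for row in rows:
        key = tuple(row[name] for name in dimensions)
        if key in cells:
            raise ValueError(f"duplicate fact key: {key}")
        cells[key] = row[measure]
    return cells


cells = load_cube(fact_rows, DIMENSIONS, "sales")
value = cells[("Feb-06", "Warren Systems", "Portable PCs", "Catalog")]
```

Leaving any dimension out of the lookup key no longer names one cell; it names a set of cells that would have to be aggregated.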

    Hierarchies and Levels

    A hierarchy is a way to organize data at different levels of aggregation. In viewing data,

    analysts use dimension hierarchies to recognize trends at one level, drill down to lower levels to

    identify reasons for these trends, and roll up to higher levels to see what effect these trends have

    on a larger sector of the business.

    The elements of a dimension can be organized as a hierarchy, a set of parent-child

    relationships, typically where a parent member summarizes its children. Parent elements can

    further be aggregated as the children of another parent.

    For example, May 2005's parent is Second Quarter 2005, which is in turn the child of Year

    2005. Similarly, cities are the children of regions, products roll into product groups, and

    individual expense items roll into types of expenditure.

    Level-Based Hierarchies

    Each level represents a position in the hierarchy. Each level above the base (or most

    detailed) level contains aggregate values for the levels below it. The members at different levels

    have a one-to-many parent-child relation. For example, Q1-05 and Q2-05 are the children of

    2005, thus 2005 is the parent of Q1-05 and Q2-05.

    Suppose a data warehouse contains snapshots of data taken three times a day, that is,

    every 8 hours. Analysts might normally prefer to view the data that has been aggregated into

    days, weeks, quarters, or years. Thus, the Time dimension needs a hierarchy with at least five

    levels.

    Hierarchies and levels have a many-to-many relationship. A hierarchy typically contains

    several levels, and a single level can be included in more than one hierarchy.

    Each level typically corresponds to a column in a dimension table or view. The base level

    is the primary key.
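A level-based hierarchy stored this way can be sketched as a small dimension table: one row per base-level member, with one column per higher level. Rolling a measure up to a level then means grouping by that column. The table contents, sales figures, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical Time dimension table. The key (month) is the base level;
# the other columns give each month's ancestor at the higher levels.
time_dimension = {
    "Jan-05": {"quarter": "Q1-05", "year": "2005"},
    "Feb-05": {"quarter": "Q1-05", "year": "2005"},
    "May-05": {"quarter": "Q2-05", "year": "2005"},
}

# Hypothetical measure stored at the base level.
sales_by_month = {"Jan-05": 10, "Feb-05": 15, "May-05": 20}


def roll_up(measure, dimension_table, level):
    """Sum base-level values under their ancestor at the given level."""
    totals = defaultdict(int)
    for member, value in measure.items():
        totals[dimension_table[member][level]] += value
    return dict(totals)


by_quarter = roll_up(sales_by_month, time_dimension, "quarter")
by_year = roll_up(sales_by_month, time_dimension, "year")
```

Adding a level to the hierarchy amounts to adding a column to the table, which is why a level can be shared by more than one hierarchy.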

    Value-Based Hierarchies

    Although hierarchies are typically composed of named levels, they do not have to be. The

    parent-child relations among dimension members may not define meaningful levels. For

    example, in an employee dimension, each manager has one or more direct reports, which forms a

    parent-child relation. Creating levels based on these relations (such as individual contributors,

    first-level managers, second-level managers, and so forth) may not be meaningful for analysis.

    Likewise, the line item dimension of financial data does not have levels. This type of hierarchy is

    called a value-based hierarchy.
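In a value-based hierarchy there are no level columns to group by, only a parent for each member. A rough sketch, with invented employees and expense figures, walks each member's chain of parents and adds its value to every ancestor.

```python
# Hypothetical employee dimension: member -> parent (None marks the top).
parent_of = {"Ana": None, "Ben": "Ana", "Cy": "Ana", "Dee": "Ben"}

# Hypothetical measure: each employee's own expenses.
own_expenses = {"Ana": 100, "Ben": 50, "Cy": 30, "Dee": 20}


def roll_up_by_parent(parent_of, own_values):
    """Total for each member = own value plus all descendants' values."""
    totals = dict(own_values)
    for member, value in own_values.items():
        ancestor = parent_of[member]
        while ancestor is not None:
            totals[ancestor] = totals.get(ancestor, 0) + value
            ancestor = parent_of[ancestor]
    return totals


expense_totals = roll_up_by_parent(parent_of, own_expenses)
```

Members at the same depth (here Ben and Cy) are not treated as a named level; the aggregation depends only on the parent-child links.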

    Attributes

    An attribute provides additional information about the data. Some attributes are used

    for display. You might have attributes like colors, flavors, or sizes. This type of attribute can be

    used for data selection and answering questions such as: Which colors were the most popular in

    women's dresses in the summer of 2005? How does this compare with the previous summer?

    Time attributes can provide information about the Time dimension that may be useful in

    some types of analysis, such as identifying the last day or the number of days in each time

    period.

    Each attribute typically corresponds to a column in a dimension table or view.
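The two uses of attributes mentioned above can be sketched briefly: grouping a measure by a descriptive attribute such as color, and deriving a time attribute such as the number of days in a period. The product codes, unit counts, and helper names are hypothetical; `calendar.monthrange` is part of the Python standard library.

```python
import calendar
from collections import Counter

# Hypothetical Product dimension with a color attribute per member.
product_attributes = {
    "D-100": {"color": "blue"},
    "D-200": {"color": "red"},
    "D-300": {"color": "blue"},
}

# Hypothetical measure: units sold per product.
units_sold = {"D-100": 40, "D-200": 55, "D-300": 30}


def totals_by_attribute(measure, attributes, name):
    """Sum a measure over members sharing the same attribute value."""
    totals = Counter()
    for member, value in measure.items():
        totals[attributes[member][name]] += value
    return totals


color_totals = totals_by_attribute(units_sold, product_attributes, "color")
most_popular = color_totals.most_common(1)[0][0]


def days_in_month(year, month):
    """Time attribute: number of days in the given month."""
    return calendar.monthrange(year, month)[1]
```

Unlike a level, the color attribute does not define a parent-child path; it simply labels members so that they can be selected or compared.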
