A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 ·...

16
Journal of Computing and Information Technology - CIT 10, 2002, 2, 69–84 69 A Procedure of Conversion of Relational into Multidimensional Database Schema Mladen Varga Faculty of Economics, University of Zagreb, Croatia It is universally recognized that operational information systems lean on the relational model and data warehouses on the multidimensional model. The phrase On-Line Analytical Processing OLAP means summarizing, con- solidating, viewing, and synthesizing data according to multiple dimensions. The process of modeling data warehouse may start from operational system’s database. It may be helpful to convert a relational database schema into a multidimensional database schema in order to dis- cover dimensions that are hidden in a relational database. However, only a few efforts have been done investigat- ing the conversion of relational into multidimensional database schema. This paper proposes the general proce- dure of such a conversion. The procedure can be partly automated because some decisions of attribute type must be made during conversion. Keywords: relational database, relational schema, mul- tidimensional database, multidimensional schema, con- version. 1. Introduction Operational information systems that lean on the relational model may be a starting point for the process of modeling data warehouse. Most data warehouses rely on the multidimensional model and it may be helpful to convert a rela- tional database schema into a multidimensional database schema. In this section, some general facts of relational and dimensional model are mentioned. In Sec- tion 2 the procedure of conversion is presented and in Section 3 a small example of the conver- sion is shown. 1.1. Relational Model A relational database consists of a set of rela- tions. A relational schema which is used to de- scribe a relation r, denoted by RA 1 A 2 A n is made up of a relation named R and a list of attributes A 1 , A 2 , , A n . Each attribute A i is the name of a role played by some domain D in the relation R. A relation r of the rela- tional schema RA 1 A 2 A n is a set of tu- ples r t 1 t 2 t m . A relational database is described by a relational database schema that consists of a set of relational schemas. If for any two distinct tuples t 1 and t 2 in the relation r of R there exists attribute set of at- tributes K such that t 1 K t 2 K , then such attribute set of attributes is called key. It is common to designate one of possible keys as the primary key, which is used to identify tuples in the relation. A set of attributes FK in the relational schema R is a foreign key of R 1 if two rules are satisfied: The attributes in FK have the same domain as the primary key PK of another relational schema attributes FK are said to reference relation R 2 , A value of FK in a tuple t 1 of R 1 either oc- curs as a value of PK for some tuple t 2 in R 2 or is null. 1.2. Dimensional Model Let us suppose that a hospital database contains admission data such as number of days and

Transcript of A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 ·...

Page 1: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

Journal of Computing and Information Technology - CIT 10, 2002, 2, 69–84 69

A Procedure of Conversionof Relational into MultidimensionalDatabase Schema

Mladen VargaFaculty of Economics, University of Zagreb, Croatia

It is universally recognized that operational informationsystems lean on the relational model and data warehouseson the multidimensional model. The phrase On-LineAnalytical Processing �OLAP� means summarizing, con-solidating, viewing, and synthesizing data according tomultiple dimensions. The process of modeling datawarehouse may start from operational system’s database.It may be helpful to convert a relational database schemainto a multidimensional database schema in order to dis-cover dimensions that are hidden in a relational database.However, only a few efforts have been done investigat-ing the conversion of relational into multidimensionaldatabase schema. This paper proposes the general proce-dure of such a conversion. The procedure can be partlyautomated because some decisions of attribute type mustbe made during conversion.

Keywords: relational database, relational schema, mul-tidimensional database, multidimensional schema, con-version.

1. Introduction

Operational information systems that lean onthe relational model may be a starting point forthe process of modeling data warehouse. Mostdata warehouses rely on the multidimensionalmodel and it may be helpful to convert a rela-tional database schema into a multidimensionaldatabase schema.

In this section, some general facts of relationaland dimensional model are mentioned. In Sec-tion 2 the procedure of conversion is presentedand in Section 3 a small example of the conver-sion is shown.

1.1. Relational Model

A relational database consists of a set of rela-tions. A relational schema which is used to de-scribe a relation r, denoted byR�A1� A2� � � � � An�is made up of a relation named R and a listof attributes A1, A2,� � � , An. Each attribute Aiis the name of a role played by some domainD in the relation R. A relation r of the rela-tional schema R�A1� A2� � � � � An� is a set of tu-ples r � ft1� t2� � � � � tmg. A relational databaseis described by a relational database schema thatconsists of a set of relational schemas.

If for any two distinct tuples t1 and t2 in therelation r of R there exists attribute �set of at-tributes� K such that t1�K � �� t2�K �, then suchattribute �set of attributes� is called key. It iscommon to designate one of possible keys asthe primary key, which is used to identify tuplesin the relation.

A set of attributes FK in the relational schema Ris a foreign key of R1 if two rules are satisfied:

� The attributes in FK have the same domainas the primary key PK of another relationalschema attributes FK are said to referencerelation R2,

� A value of FK in a tuple t1 of R1 either oc-curs as a value of PK for some tuple t2 in R2or is null.

1.2. Dimensional Model

Let us suppose that a hospital database containsadmission data �such as number of days and

Page 2: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

70 A Procedure of Conversion of Relational into Multidimensional Database Schema

value� of the patient, the date of admission, andthe diagnosis. The admission data is determinedby three attributes Patient, Diagnosis and Time.They are referred to as dimensions while thedetermined admission attributes are referred toas measures or facts or fact attributes. The factsare mostly numerical, preferably continuouslyvalued and additive. They vary over time. In thestatistical database field �Shosani, 1997� the di-mension corresponds to category attribute, andthe measure to summary attribute. The cubemodel, shown in Fig. 1, is very useful in present-ing the concept of the multidimensional space.There is no a priori distinction between dimen-sions and measures while any attribute can playeither role �Gyssens, 1996�. There is no formalway to decide which attributes are dimensionsand which attributes are measures. This deci-sion has to be solved during database design.

Fig. 1. Cube model.

The structure of the dimensional model can berepresented by the star join schema �Kimball,1996� shown in Fig. 2. The center of the schemais the fact table, which is the only table inthe schema with multiple joins connecting it toother tables. The fact table is where the facts of

the business are stored �Kimball, 1996� such asNumberOfDays and Value in the hospital database.The other tables are the dimension tables. Di-mension attributes describe the item in the di-mension, and are virtually constant over time.The primary key of the fact table is compositeor concatenated key, which is the combinationof as many foreign keys as many dimensionsthere are in the schema. Each component ofthe composite key is a foreign key referencingthe primary key of a dimension table. In otherwords, every fact table represents a many-to-many relationship, that is, it contains as manyforeign keys as many dimensions there are inthe schema. A multidimensional database mayconsist of any number of star join schemas withsome dimension tables overlapping.

According to �Lehner, 1998� we shall intro-duce some definitions which will help placingand solving the problem of conversion of rela-tional database schema into multidimensionaldatabase schema.

The notion of functional dependency is of fun-damental importance: If A and B are �sets of�attributes in a database, B is functionally de-pendent on A or A functionally determines B,denoted by A � B, if and only if for each spe-cific value a � A, there exists at most one valueb � B.

Dimensions are usually organized into hierar-chies that specify aggregation level and hencegranularity of viewing data. A dimension isdefined over a dimensional schema. A dimen-sional schema is a set of dimensional attributesD � fD1� � � � � Dkg where for each Di � D thereexists a Dj � D�fDig such that either Di � Djor Dj � Di. A categorization C of a dimen-sional schema D is a sequence of functional de-

Fig. 2. Star join schema.

Page 3: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 71

pendencies D1 � D2,� � � , Dk�1 � Dk whereDi � D for each i � 1,� � � , k. The attribute Diis the ith categorization level. A dimensionalattribute Dt � D is called terminal, if there is noD � D� fDtg such that D � Dt.

Example:

A simple dimensional schema of the productdimension consists of the dimension attributesProduct � fProduct, ProductGroup, ProductTypeg.The hierarchy on the product dimension Product� ProductGroup � ProductType is a set of func-tionally interrelated dimensional attributes Prod-uct, ProductGroup and ProductType. The attributeProduct is terminal and belongs to 1st categorylevel, ProductGroup to 2nd and ProductType to 3rd

category level.

A multidimensional database is defined over aset of multidimensional schemas. A multidi-mensional schema M � �fD1� � � � � Dng� S� con-sists of a set of dimensional schemas fD1� � � �,Dng, measures or facts S and a functional depen-dency fD1� � � � � Dng � S. Dimension tables aredescribed by dimensional schemas and a facttable by the dependency fD1� � � � � Dng � S.

A dimension defined by a dimensional schemaD is in dimensional normal form �DNF� if thefollowing conditions are satisfied:

� There exists exactly one terminal dimensionattribute Dt � D

� The values of Dt are complete, i.e. each ob-ject has to be assigned to some value of Dt�see explanation of the completeness condi-tion in Section on Summarizability�

Amultidimensional schema M � �fD1� � � � � Dng,S� is in multidimensional normal form if the fol-lowing conditions are satisfied:

� Each Di is in dimensional normal form

� The dimensions are orthogonal to each other,i.e. there exists no functional dependencyC � D where C � Di and D � Dj and i �� j

� The fact attribute S is fully functionally de-termined by the set of the terminal dimensionattributes of the dimensions

Summarizability of Fact Attributes

Multidimensional databases are mostly used toperform statistical analysis. Summarizability of

fact attributes is an extremely important prop-erty of a multidimensional schema. In gen-eral, summarizability �Lehner, 1998� is neededto ensure correct automatic aggregation oper-ations that are important in OLAP drill-downand roll-up operations. An incorrect summa-rization that is not obviously seen can lead toerroneous conclusions. The following example�Lenz, Shoshani, 1997� illustrates the problem.

Fig. 3. Summarizability example.

Data in Fig. 3 provide a count of the numberof university students organized by departmentand year. Suppose we wish to find out the num-ber of students that attended each departmentover the whole period of 3 years. If studentsattended the department for more than one year,then adding the counts for each year is incorrect.The total formath department 14�20�18 � 52is incorrect. If the math department started in1997 there were 14 students in 1997 who alsoattended it in 1998. Consequently, there wereonly 6 new students in 1998. These 6 studentscontinued in 1999, and 12 new students joinedfor a total of 18. Thus, the total number ofstudents who attended the math dept. over 3-year period is 14 � 6 � 12 � 32. Likewise,for chemistry dept. it is: 10 � 2 � 7 � 19.The problem is more complicated and, in gen-eral, it is not possible to compute summaries fora non-summarizable fact because the semanticrule may not be known. In the example above,it is assumed that all students attend a 2-yearprogram but in reality this is not true. Some stu-dents attend more than 2 years some drop afterthe first year. Thus, for non-summarizable factthe totals can be extracted only from the basedata, which includes the information about eachindividual student. This non-summarizabilityof the number of students is not absolute forboth dimensions. The number of students issummarizable to the department dimension. Inthe year 1997 there were 24 students, 14 in themath department and 10 in the chemistry de-partment.

Page 4: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

72 A Procedure of Conversion of Relational into Multidimensional Database Schema

The summarizability problem is more com-plicated than shown in the previous example.�Lenz, Shoshani, 1997� discuss problem morethoroughly and argue that three conditions aresufficient for solving summarizability problem.The necessary conditions for summarizabilityare:

� Disjointness of dimensions �dimension at-tributes�

� Completeness

� Type compatibility of function and fact at-tribute

Disjointness of dimension attributes states thatthe dimension attributes must form disjoint sub-sets. This condition is satisfied if categorizationof a dimensional schema D is a sequence of func-tional dependencies D1 � D2� � � � � Dk�1 � Dkwhere Di � D for each i � 1� � � � � k. The ex-ample is the hierarchy on the Time dimensionDay� Month�Year. The year 1999 consists ofa disjoint set of month attribute: Jan 1999, Feb1999,� � � , Dec 1999. The month Jan 1999 con-sists of a disjoint set of day attribute: 1st day ofJan 1999, 2nd day of Jan 1999 etc.

Once the disjointness condition is satisfied, itis necessary to test whether the grouping of ob-jects into clusters is complete. First, all ele-ments of the cluster must exist, i.e. the unionof all clusters must constitute the entire set ofobjects. There must be no missing object, i.e.the object that does not belong to a cluster. Inother words, completeness states that each ob-ject must be assigned to some category. Forexample, if some product is not assigned to aproduct group, the condition is violated. Sec-ond, allmembers of subset valuesmust exist and

their union must constitute the entire set. Fig. 4shows examples of violation of the disjointnessand the completeness conditions.

Type compatibility condition depends on thetype of fact attribute as well as on the statisticalfunction applied. For every fact attribute and forevery dimension it must be determined whethersummarization is allowed or not. Fact attributes�Kimball, 1996; Lenz, Shoshani, 1997� may beclassified as:

� additive, if it is additive along all dimen-sions �such as flow or rate�: annual income,weekly number of rainy days

� semiadditive, if it is not additive along oneor more dimensions, usually temporal di-mension �such as level or stock�: number ofstudents in a class, inventory of goods

� nonadditive, if it is additive along no dimen-sion �such as value-per-unit�: item price

Flow refers to periods and is recorded at theend of each period. It records some cumula-tive effect over a period. Level is measured andrecorded at a particular point of time. It recordsthe state at the specific point in time. Value-per-unit is usually the current value of the measuredunit.

The most important function is sum. Sum isprohibited to value-per-unit data in general andlikewise to level data along temporal dimen-sions. Therefore, summarization is performeddifferently to temporal dimensions as opposedto non-temporal dimensions. Next table �Fig.5� �Lenz, Shoshani, 1997� shows how the typeof attribute affects the sum function along tem-poral and non-temporal dimensions.

Fig. 4. Examples of violation of the disjointness and the completeness conditions.

Page 5: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 73

Fig. 5. Sum function along temporal and non-temporal dimensions.

In the design phase of amultidimensional schemathe type of the dimension �temporal, non-tem-poral�, the type of fact attribute �additive: flowor rate, semiadditive: level or stock, nonad-ditive: value-per-unit� has to be determined.�Lehner et al., 1998� suggest that in the graphi-cal notation every temporal dimension ismarkedby a dot.

If we look at the definition of the multidimen-sional normal form, we can see that it ensuresdisjointness of dimension attributes and com-pleteness property. However, it does not auto-matically ensure type compatibility because itis dynamic property. Therefore, type compati-bility cannot be checked in the static conditionsof multidimensional normal form but in the dy-namic conditionswhen statistical operations areperformed.

2. Procedure of Conversion

The conversion of a relational into a multidi-mensional database schema has the followinggoals:

� Find all dimensions that exist but are hiddenin a relational database schema.

� Get the multidimensional database schemathat is equally expressive as its initial re-lational schema, i.e. the multidimensionalschema must reflect the same applicationknowledge.

� The resultingmultidimensional schemamustbe in the multidimensional normal form toensure summarizability �Lehner et al., 1998�.

Input relational database schema must be nor-malized. Since this procedure is based on thenotion of functional dependencies the neces-sary normal form is Boyce-Codd normal form�BCNF�. A relation R is in BCNF if whenevera functional dependency X � A holds in R, Xis a superkey of R.

For the sake of clarity in the following procedurewe shall use the term relation in the relationalmodel and the term table for the same entityin the dimensional model. We assume that allprimary and foreign keys of relational schemaare known.

The procedure of conversion of a relational intoa multidimensional database schema consists ofthese steps:

A. Discovering potential dimensions

B. Defining fact and dimension attributes

C. Defining fact and dimension tables

D. Defining attributes in fact tables

E. Defining attributes in dimension tables

F. Checking multidimensional schema�s�

A. Discovering Potential Dimensions

Each row in a relation represents an object in-stance, which is an instance of a simple object�such as PATIENTobject� or an instance of a com-position of simple objects �such as RESERVA-TION which is the relationship between PERSON,HOTEL and DATE object instances�. The objectinstance’s properties are represented by object’sattributes. Primary key uniquely identifies theobject instance while the other attributes referto mapping instances that the object instancehas with instances of the other objects �Martin,Odell, 1995�. Foreign keys refer to mappinginstances that the object instance has with in-stances of existing objects. Nonkey attributesrefer to mapping instances that the object in-stance has with nonexisting objects instancesand play the role of property attributes of theobject. For example, the Date attribute in theOPERATION relation represents the date of oper-ation performed on the patient. It represents themapping of the OPERATION instance object to

Page 6: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

74 A Procedure of Conversion of Relational into Multidimensional Database Schema

the DATE object instance. Usually, in the opera-tional �transactional� databases the DATE objectdoes not exist so the DATE relation does not ex-ist either. Therefore, the Date attribute becomesthe attribute of the OPERATION object. In datawarehouses the Date is an existing object, calleddimension, having the property attributes suchas Week, WeekendFlag, HolidayFlag, Season, Monthand Year in the dimension relation.

Looking for nonkey attributes in a relationalschema, i.e. attributes that are not part of pri-mary or foreign keys, we may discover potentialforeign keys to hidden dimensions, i.e. dimen-sions that have no dimension relation. We mayopen such a hidden dimension by creating a newdimension relation. The potential foreign keybecomes the real foreign key referencing theprimary key of newly created dimension rela-tion.

Example:

Let us assume a relation described by its rela-tional schema

ADMISSION�Admission#, Ward#�WARD.Ward#�,Doctor#�DOCTOR.Doctor#�, Date,NumberOfDays, Value, Result�

where the primary key is underlined and foreignkeys are italicized and followed by the refer-enced relation and its primary key �referentialintegrity of foreign key to its primary key�.

In the ADMISSION relation we have discovered apotential foreign key Date to the time dimension.We may open this dimension by creating thenew TIME relation with properties �attributes�Date, Month, Quarter and Year. The attribute Datein the ADMISSION relation becomes a foreignkey to the new TIME relation. The result is

ADMISSION�Admission#, Ward#�WARD.Ward#�,Doctor#�DOCTOR.Doctor#�, Date�TIME.Date�,NumberOfDays, Value, Result�

TIME�Date, Month, Quarter, Year�

At the same time we may define hierarchy onthe time dimension by functional dependenciesDate�Month�Quarter�Year.

B. Defining Fact and Dimension Attributes

By inspecting all nonkey attributes in every rela-tional schema, i.e. the attributes that are neither

primary nor foreign keys, we shall define thenonkey attributes that are fact attributes and thenonkey attributes that are dimension attributes.This is the key decision while there is no for-mal way to decide which attributes are fact at-tributes. If an attribute value shows some fact ofthe business, varies over time, and is preferablycontinuosly valued, it is a strong candidate fora fact attribute.

Example:

Let us assume relations

ORDER-ITEM�Order#, Product#, Quantity�PRODUCT�Product#, ProductName�

There are two nonkey attributes: the Quantity inthe ORDER-ITEM relation and the ProductName inthe PRODUCT relation. At this step we must de-cide whether these attributes are fact attributesor dimension attributes. The Quantity attributeshows a fact or measure of the business, thenumber of products ordered. The attribute valueis continuosly valued and summarizable, there-fore the Quantity is a fact attribute. On the otherhand, the ProductName attribute is not a measure.It is a property attribute of the product, and thedimension attribute of the product dimension.

Nonkey attributes in a normalized relation areall fact or all dimension attributes. A normal-ized relation represents either an object whereall nonkey attributes are property attributes ofthe object or a composition of two or more ob-jects where nonkey attributes are property at-tributes of the relationship of these objects. Theformer attributes are dimension attributes, thelatter are fact attributes.

C. Defining Fact and Dimension Tables

Each relation must be considered individually:

� If there exists any fact attribute in the rela-tion, the fact table will be created.

� If there exists no fact attribute in the rela-tion and the relation has more than one for-eign key corresponding to many-to-many re-lationship, a fact table will be created. Thisis a factless fact table, i.e. a fact table thathas no fact attribute.

� If there exists any dimension attribute in therelation, the dimension table will be created.

Page 7: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 75

D. Defining Attributes in Fact Tables

For each relation that gives a fact table all for-eign keys, primary keys and nonkey attributesmust be particularly considered in the followingprocedure.

1. Foreign keys of the relationEither step 1.1 or step 1.2 is executed:

1.1 If a foreign key of the relation referencesthe primary key of the relation that gives adimension table, the following options arepossible:

a� The foreign key becomes the primarykey of the fact table. At the same timeit references the primary key of the cor-responding dimension table.

b� The foreign key is dropped from thefact table. In this way the dimensionis dropped. The corresponding dimen-sion table must be dropped if at the endof the conversion process there are noforeign keys referencing this dimensiontable.

1.2 If the foreign key of the relation referencesthe primary key of the relation that gives afact table, the foreign key becomes the pri-mary key of the fact table. But its foreign

This rule defines existing dimensionwithexisting dimension table.

In the following example relationsof the relational schema are in the firstrow, and tables in the dimensional sche-ma are in the second row. We are con-sidering primarily the fact table in thefirst column of the figure and the trans-formation of its foreign keys. Rela-tionships between the relations or tablesare shown in UML cardinality notation.Here, one admission is required by oneand only one doctor. The foreign keyof the ADMISSION relation is Doctor#. Itbecomes part of the primary key in theADMISSION fact table and remains theforeign key referencing the dimensiontable DOCTOR.

This rule defines dropping existing di-mension and its corresponding dimen-sion table.

Here is the previous example withthe existing dimension dropped.

key role must be canceled because the pri-mary key it references to is changed afterstep D of the procedure is applied to theprimary key’s fact table.

Page 8: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

76 A Procedure of Conversion of Relational into Multidimensional Database Schema

This rule defines finding and defining a de-generate dimension.

In the following example the Admission#

foreign key in the OPERATION relation arechanged to primary key of the OPERATIONfact table. Its foreign key role is canceled.

1.3 After step 1.1 or 1.2 is executed all for-eign keys of the referenced relation mustbe removed to the considered fact table andtreated as foreign keys described in this step�step 1.�.

This rule defines finding and definingnew dimensions.

In the following example after the Ad-mission# foreign key in the OPERATION re-

The explanation of foreign keys transition to theconsidered fact table is based on applying thetransition rule among functional dependencieson considered keys before the foreign key tran-sition is applied. In the example mentioned,functional dependencies are as follows: OPER-ATION.Operation#� OPERATION.Admission# �pri-mary key functionally determines nonkey at-tribute in the same relation because the rela-tion is in 3NF�, OPERATION.Admission# � AD-MISSION.Admission# �referential integrity of the

lation has become the primary key in theOPERATION fact table all foreign keys of theADMISSION relation are removed to the OP-ERATION fact table. Here the only foreignkey removed is the Diagnosis#. It referencesthe primary key of the DIAGNOSIS dimen-sion table, so the new DIAGNOSIS dimen-sion table is defined and connected to theOPERATION fact table.

Admission# foreign key in the OPERATION rela-tion to the Admission# primary key in the AD-MISSION relation�, ADMISSION.Admission � AD-MISSION.Diagnosis# �primary key functionallydetermines nonkey attribute in the same re-lation�, and ADMISSION.Diagnosis# � DIAGNO-SIS.Diagnosis# �referential integrity of the Diag-nosis# foreign key in the ADMISSION relationto the Diagnosis# primary key in the DIAGNO-SIS relation�. Consequently, by applying tran-sition rules: OPERATION.Operation# � DIAGNO-

Page 9: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 77

SIS.Diagnosis#. The same functional dependencyexists after the Diagnosis# foreign key is removedfrom the ADMISSION relation into the OPERA-TION fact table. Now the diagnosis dimension isa dimensionwhich is achieved from OPERATIONfact table directly and not transitively as was thecase with foreign key.

2. The primary key, i.e. every attribute that ispart of the primary key and is not part of theforeign key of the relation becomes the primarykey of the fact table without referencing to anydimension table.

This rule defines finding and defining a degen-erate dimension �Kimball, 1996�.

Identifiers of many documents are of such a na-ture, such as Order# and Ticket#. It is importantnot to drop these attributes in order to retain theexpressiveness of dimensional schema equal tothe relational schema.

3. Nonkey attributes may be:

3.1 dropped from further processing, i.e. willnot be included into the fact table.

3.2 included into the fact table as fact attributes.

E. Defining Attributes in Dimension Tables

All foreign keys may be dropped from the di-mension tables because they are not necessaryafter step D.1.3 is applied. In this step all pos-sible foreign keys from referenced relations areremoved to the considered fact table. Therefore,all necessary foreign keys are after step D.1.3settled in each fact table and there is no need touse foreign keys of dimension tables.

The attributes that belong to the primary key orare nonkey attributes retain their roles.

F. Checking Multidimensional Schema(s)

Every multidimensional schema must be con-sidered for its expressiveness, normal form andsummarizability.

Expressiveness of Multidimensional Schema

The expressiveness ofmultidimensional schemais retained if steps D.1.1b and D.3.1 are not ap-plied while in these steps some information isdropped.

Multidimensional Normal Form

The procedure does not automatically guaranteethat multidimensional database schema is in themultidimensional normal form. We mustmanu-ally check every dimensional schema to ensurethat it satisfies dimensional normal form, i.e.that there exists exactly one terminal attribute,and that its values are complete.

Summarizability

The type of every fact attribute �flow, level orvalue-per-unit� must be specified and declaredin the multidimensional schema. Also, in thegraphical notation all temporal dimensionsmustbe specified and marked by a dot.

3. Example

The entity-relationship model of a simplifiedhospital example, similar to the example inGolfarelli �1998a�, is shown in Fig. 6. Ad-mission �entity ADMISSION� of a patient �en-tity PATIENT and TOWN� to a hospital is de-scribed by diagnosis �DIAGNOSIS, ADMISSION-SECONDARY-DIAGNOSIS�, a ward in the hospital�WARD� where a patient is transfered, operation�OPERATION, OP-THEATRE� that was performedon a patient and doctors �DOCTOR� who diag-nose, cure and operate on a patient. Primarykeys are underlined and foreign keys are itali-cized. In such a diagram foreign keys are notentity attributes. They are the relational imple-mentation of entity relationships and they arehere only for the sake of clarity of the problem.

Page 10: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

78 A Procedure of Conversion of Relational into Multidimensional Database Schema

Fig. 6. Hospital example.

Its relational database schema is:

DOCTOR�Doctor#, DoctorName)OPERATION�Operation#, Date, OperationType,

Admission#�ADMISSION.Admission#�,OpTheatre#�OP-THEATRE.OpTheatre#�,Doctor#Operator�DOCTOR.Doctor#��

PATIENT�Patient#, Name, Sex, Age,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

ADMISSION�Admission#, Date, NumberOfDays,Value, Result, Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#��

WARD�Ward#, WardName�OP-THEATRE�OpTheatre#, OpTheatreType,

Ward#�WARD.Ward#��DIAGNOSIS�Diagnosis#, DiagnosisName�TOWN�Town, TownName, Region�ADMISSION-SECONDARY-DIAGNOSIS

�Admission#�ADMISSION.Admission#�,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#��

The procedure of conversion of its relationaldatabase into amultidimensional database schemafollows.

A. Discovering Potential Dimensions

In some relations we may find few foreign keysto potential dimensions:

The new relations are created:

TIME�Date, Month, Quarter, Year�with Date�Month�Quarter�Year hierarchy

SEX�Sex, Name�AGE�Age, Age10�RESULT�Result, ResultDescription�

Page 11: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 79

Now, the relational schema consists of the fol-lowing relations:

DOCTOR�Doctor#, DoctorName�OPERATION�Operation#, Date�TIME.Date�,

OperationType,Admission#�ADMISSION.Admission#�,OpTheatre#�OP-THEATRE.OpTheatre#�,Doctor#Operator�DOCTOR.Doctor#��

PATIENT�Patient#, Name, Sex�SEX.Sex�,Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

ADMISSION�Admission#, Date�TIME.Date�,NumberOfDays, Value,Result�RESULT.Result�,Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#��

WARD�Ward#, WardName�OP-THEATRE�OpTheatre#, OpTheatreType,

Ward#�WARD.Ward#��DIAGNOSIS�Diagnosis#, DiagnosisName�TOWN�Town, TownName, Region�ADMISSION-SECONDARY-DIAGNOSIS

�Admission#�ADMISSION.Admission#�,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#��

TIME�Date, Month, Quarter, Year�SEX�Sex, Name�AGE�Age, Age10�RESULT�Result, ResultDescription�

B. Defining Fact and Dimension Attributes

There is no formal way to decide which at-tributes are fact attributes and which attributesare dimension attributes because nonkey at-tributes in the relational schema can play eitherrole. Inspecting all nonkey attributes in the rela-tional schemawe shouldmanually decidewhichattributes are fact attributes and which attributesare dimension attributes . The fact attributes showmeasures of the business. They are mostly nu-merical, continuously valued and additive. Inthe example they are depicted in light gray. Thesummarizability type �flow, level or value-per-unit� of the fact attribute is also determined.Dimension attributes are property attributes ofdimensions. In the example they are depictedin dark gray.

DOCTOR�Doctor#, DoctorName �OPERATION�Operation#, Date�TIME.Date�,

OperationType :value-per-unit,Admission#�ADMISSION.Admission#�,OpTheatre#�OP-THEATRE.OpTheatre#�,Doctor#Operator�DOCTOR.Doctor#��

PATIENT�Patient#, Name , Sex�SEX.Sex�,Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

ADMISSION�Admission#, Date�TIME.Date�,NumberOfDays :level, Value :level,Result�RESULT.Result�,Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#��

WARD�Ward#, WardName �OP-THEATRE�OpTheatre#, OpTheatreType ,

Ward#�WARD.Ward#��DIAGNOSIS�Diagnosis#, DiagnosisName �TOWN�Town, TownName , Region �ADMISSION-SECONDARY-DIAGNOSIS

�Admission#�ADMISSION.Admission#�,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#��

Obviously the attributes of new dimensions aredimension attributes:

TIME�Date, Week , Quarter , Year �SEX�Sex, Name �AGE�Age, Age10 �RESULT�Result, ResultDescription �

C. Defining Fact and Dimension Tables

Because of the fact attributes the OPERATIONand ADMISSION fact tables are created. In theADMISSION-SECONDARY-DIAGNOSIS relation thereis no fact attribute, but there are two foreignkeys that correspond to many-to-many relation-ship. Therefore, the fact table with no fact at-tribute is created. Relations DOCTOR, PATIENT,WARD, OP-THEATRE, DIAGNOSIS, TOWN, TIME,SEX, AGE and RESULT give dimension tables.

D. Defining Attributes in Fact Tables

In the relation

OPERATION�Operation#, Date�TIME.Date�,OperationType:value-per-unit,

Page 12: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

80 A Procedure of Conversion of Relational into Multidimensional Database Schema

Admission#�ADMISSION.Admission#�,OpTheatre#�OP-THEATRE.OpTheatre#�,Doctor#Operator�DOCTOR.Doctor#��

the attributes are as follows, with the followingrules applied:

— By rule 2 the primary key Operation#becomesthe primary key of the degenerate operationdimension.

— By rule 1.1a foreign key Date becomes theprimary key of the fact table and foreign keyreferencing TIME.Date. It is renamed in Oper-ationDate.

— By rule 3.2 OperationType is the fact attributeof value-per-unit type.

— By rule 1.2 foreign key Admission# becomesthe primary key of the fact table and foreignkey to degenerate admission dimension.

— By rule 1.1a foreign key OpTheatre# becomesthe primary key of the fact table and foreignkey referencing OP-THEATRE.OpTheatre#.

— By rule 1.1a foreign key Doctor#Operator be-comes the primary key of the fact table andforeign key referencing DOCTOR.Doctor#.

— By rule 1.3 all foreign keys of the referencedrelations are removed to the OPERATION facttable. From the ADMISSION relation Date,Result, Doctor#Admission, Patient#, Ward# andDiagnosis# are removed. By rule 1.1a Datebecomes the primary key of the fact tableand foreign key referencing TIME.Date. Itis renamed to AdmissionDate. By the samerule Result becomes the primary key andforeign key referencing RESULT.Result, Doc-tor#Admission �doctor who admits the patientto hospital� becomes the primary key andforeign key referencing DOCTOR.Doctor#, Pa-tient# becomes the primary key and foreignkey referencing PATIENT.Patient#, Ward# �thehospital ward where the patient is trans-ferred� becomes the primary key and for-eign key referencing WARD.Ward# and Diag-nosis# �main diagnosis� becomes the pri-mary key and foreign key referencing DIAG-NOSIS.Diagnosis#. Ward# �the ward wherethe operation on the patient is undertaken�is removed from the OP-THEATRE. Ward# isrenamed to OperationWard# which becomesprimary key and foreign key to WARD.Ward#.From the PATIENT relation removed are Sex,

Age, Doctor#InCure and Town. By rule 1.1a Sexbecomes the primary key and foreign keyreferencing SEX.Sex. By the same rule Agebecomes primary key and foreign key refer-encing AGE.Age, Doctor#InCure becomes pri-mary key and foreign key referencing DOC-TOR.Doctor# and Town primary key and for-eign key referencing to TOWN.Town.

The fact table OPERATION is now:

OPERATION�Operation#, OperationDate�TIME.Date�,OperationType:value-per-unit, Admission#,OpTheatre#�OP-THEATRE.OpTheatre#�,Doctor#Operator�DOCTOR.Doctor#�,AdmissionDate�TIME.Date�,Result�RESULT.Result�,Doctor#Admission�DOCTOR.Doctor#�,Patient#�PATIENT.Patient�,Ward#�WARD.Ward#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,OperationWard#�WARD.Ward#�,Sex�SEX.Sex�, Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town�,�

In the relation

ADMISSION�Admission#, Date�TIME.Date�,NumberOfDays:level, Value:level,Result�RESULT.Result�,Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#��

the attributes are as follows, with the followingrules applied:

— By rule 2 the primary key Admission# be-comes the primary key of the degenerate ad-mission dimension.

— By rule 1.1a foreign key Date becomes theprimary key of the fact table and foreign keyreferencing TIME.Date. It is renamed in Ad-missionDate.

— By rule 3.2 NumberOfDays and Value are factattributes, both of type level.

— By rule 1.1a foreign key Result becomes theprimary key of the fact table and foreign keyreferencing RESULT.Result.

Page 13: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 81

— By rule 1.1a foreign key Patient# becomesthe primary key of the fact table and foreignkey referencing PATIENT.Patient#.

— By rule 1.1a foreign key Diagnosis# becomesthe primary key of the fact table and foreignkey referencing DIAGNOSIS.Diagnosis#.

— By rule 1.1a foreign key Doctor#Admission be-comes the primary key of the fact table andforeign key referencing DOCTOR.Doctor#.

— By rule 1.1a foreign key Ward# becomes theprimary key of the fact table and foreign keyreferencing WARD.Ward#.

— By rule 1.3 all foreign keys of the referencedrelations are removed to the ADMISSION facttable. From the PATIENT relation Sex, Age,Doctor#InCure and Town are removed. By rule1.1a Sex becomes the primary key and for-eign key referencing SEX.Sex. By the samerule Age becomes primary key and foreignkey referencing AGE.Age, Doctor#InCure be-comes primary key and foreign key referenc-ing DOCTOR.Doctor# and Town primary keyand foreign key referencing TOWN.Town.

Now the ADMISSION table is

ADMISSION�Admission#, AdmissionDate�TIME.Date�,NumberOfDays:level, Value:level,Result�RESULT.Result�,Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#�,Sex�SEX.Sex�, Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

In the relation

ADMISSION-SECONDARY-DIAGNOSIS�Admission#�ADMISSION.Admission#�,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#��

the attributes are as follows, with the followingrules applied:

— By rule 1.2 foreign key Admission# becomesthe primary key of the fact table and foreignkey to degenerate admission dimension.

— By rule 1.1a foreign key Diagnosis#Secondarybecomes the primary key of the fact ta-ble and foreign key referencing DIAGNO-SIS.Diagnosis#.

— By rule 1.3 all foreign keys of the referencedrelations are removed to the ADMISSION-SE-CONDARY-DIAGNOSIS fact table. From theADMISSION relation Date, Result, Doctor#Ad-mission, Patient#, Ward# and Diagnosis# areremoved. By rule 1.1a Date becomes theprimary key of the fact table and foreignkey referencing TIME.Date. It is renamed inAdmissionDate. By the same rule Result be-comes the primary key and foreign key refer-encing RESULT.Result, Doctor#Admission �doc-tor who admits the patient to hospital� be-comes the primary key and foreign key ref-erencing DOCTOR.Doctor#, Patient# becomesthe primary key and foreign key referencingPATIENT.Patient#, Ward# �the hospital wardwhere the patient is transferred� becomesthe primary key and foreign key referenc-ing WARD.Ward# and Diagnosis# �main diag-nosis� becomes the primary key and foreignkey referencing DIAGNOSIS.Diagnosis#. Fromthe PATIENT relation Sex, Age, Doctor#InCureand Town are removed. By rule 1.1a Sexbecomes the primary key and foreign keyreferencing SEX.Sex. By the same rule Agebecomes primary key and foreign key refer-encing AGE.Age, Doctor#InCure becomes pri-mary key and foreign key referencing DOC-TOR.Doctor# and Town primary key and for-eign key referencing TOWN.Town.

Now the ADMISSION-SECONDARY-DIAGNOSIStable is

ADMISSION-SECONDARY-DIAGNOSIS�Admission#,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#�,AdmissionDate�TIME.Date�,Result�RESULT.Result�,Doctor#Admission�DOCTOR.Doctor#�,Patient#�PATIENT.Patient#�,Ward#�WARD.Ward#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Sex�SEX.Sex�, Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

E. Defining Attributes in Dimension Tables

All foreign keys may be dropped from the di-mension tables since they are not necessary.Now the dimension tables are:

Page 14: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

82 A Procedure of Conversion of Relational into Multidimensional Database Schema

DOCTOR �Doctor#, DoctorName�PATIENT�Patient#, Name�WARD�Ward#, WardName�OP-THEATRE�OpTheatre#, OpTheatreType�DIAGNOSIS�Diagnosis#, Name�TIME�Date, Month, Quarter, Year� with

Date�Month�Quarter�Year hierarchyTOWN�Town, TownName, Region� with

Town�Region hierarchySEX�Sex, Name�AGE�Age, Age10� with Age�Age10 hierarchyRESULT�Result, ResultDescription�

The fact tables are:

OPERATION�Operation#, OperationDate�TIME.Date�,OperationType:value-per-unit, Admission#,OpTheatre# �OP-THEATRE.OpTheatre#�,Doctor#Operator �DOCTOR.Doctor#�,AdmissionDate�TIME.Date�,Result�RESULT.Result�,Doctor#Admission�DOCTOR.Doctor#�,Patient#�PATIENT.Patient�,Ward#�WARD.Ward#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,OperationWard#�WARD.Ward#�,Sex�SEX.Sex), Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

ADMISSION�Admission#, AdmissionDate�TIME.Date�,NumberOfDays:level, Value:level,Result�RESULT.Result�,Patient#�PATIENT.Patient#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Doctor#Admission�DOCTOR.Doctor#�,Ward#�WARD.Ward#�,Sex�SEX.Sex�, Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

ADMISSION-SECONDARY-DIAGNOSIS�Admission#,Diagnosis#Secondary�DIAGNOSIS.Diagnosis#�,AdmissionDate�TIME.Date�,Result�RESULT.Result�,Doctor#Admission�DOCTOR.Doctor#�,Patient#�PATIENT.Patient#�,Ward#�WARD.Ward#�,Diagnosis#�DIAGNOSIS.Diagnosis#�,Sex�SEX.Sex�, Age�AGE.Age�,Doctor#InCure�DOCTOR.Doctor#�,Town�TOWN.Town��

F. Checking Multidimensional Schema(s)

Expressiveness of Multidimensional Schema

The expressiveness ofmultidimensional schemais retained while steps 1.1b and 3.1 in D are notdone.

Multidimensional Normal Form

Wemay check that every dimensional schemaofthe example satisfies dimensional normal form,while in each dimensional schema there existsexactly one terminal attribute and its values arecomplete.

Summarizability

The type of every fact attribute �flow, level orvalue-per-unit� is specified and declared in themultidimensional schema. Also, in the graph-ical notation the temporal dimension time ismarked by a dot.

The resulting entity-relationship-like model isshown in Fig. 7. showing fact tables in gray.Various arrows are used as pointers from facttables to their dimension tables or as pointersshowing dimension hierarchies. Dotted arrowshows degenerate dimension.

4. Validation of the Procedure

How do we validate that the output multidi-mensional schema is valid? The easiest wayis to validate that all primary functional de-pendencies of the relational schema are con-tained in the multidimensional schema. Pri-mary functional dependency is an interentityfunctional dependency that exists among for-eign key of an entitity and primary key of an-other entity,which is referred to as referential in-tegrity. Primary functional dependencies imple-ment relationships among entities of an entity-relationship model. If the mentioned conditionis satisfied, the multidimensional schema pro-duced by the procedure is at same time equallyexpressive as its initial relation schema.

In the multidimensional schema all primaryfunctional dependencies result from fact tables

Page 15: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

A Procedure of Conversion of Relational into Multidimensional Database Schema 83

Fig. 7. Multidimensional model of the hospital example.

to dimension tables. The step D.1.3 of the pro-cedure ensures by the foreign key transition thatall primary functional dependencies that exist inthe relational schema are transferred into func-tional dependencies resulting from fact tables inthe multidimensional schema.

For example, in the example the following func-tional dependencies of foreign keys to primarykeys in the relational schema

OPERATION.Admission#�ADMISSION.Admission#ADMISSION.Patient#�PATIENT.PatientPATIENT.Town�TOWN.Town

are substituted by direct functional dependency

OPERATION.Town�TOWN.Town

in the multidimensional schema.

5. Conclusion

In the paper, the general procedure of conver-sion of relational intomultidimensional databaseschema is presented. It consists of the followingsteps: discovering potential dimensions, defin-ing fact and dimension attributes, defining factand dimension tables and defining attributesin tables. The presented procedure is semi-automated while in the course of conversionsome decisions must be made.

Acknowledgement

Many thanks to the anonymous referees for theirhelpful and insightful comments.

Page 16: A Procedure of Conversion of Relational into Multidimensional Database … · 2017-05-03 · tidimensional database, multidimensional schema, con-version. 1. Introduction Operational

84 A Procedure of Conversion of Relational into Multidimensional Database Schema

References

�1� M.GOLFARELLI, S. RIZZI, A MethodologicalFrame-work for Data Warehouse Design. Proc. 1st Int.ACM Workshop on Data Warehousing and OLAP,�1998�, Washington.

�2� M.GOLFARELLI, D. MAIO, S. RIZZI, Conceptual De-sign of Data Warehouses from E/R Schemes. Proc.of the Hawaii Int. Conf. on System Science, �1998a�,Kona.

�3� M. GYSSENS, L. LAKSHMANAN, A Foundation forMulti-Dimensional Databases. Proc. 22nd VLDBConference, �1996�, Mumbai �Bombay�.

�4� R. KIMBALL, The Data Warehouse Toolkit. JohnWiley, �1996�, New York.

�5� W. LEHNER, H. ALBRECHT, H. WEDEKIND, Nor-mal Forms for Multidimensional Databases. Proc.10th Int. Conf. on Scient. and Statistical DatabaseManagement, �1998�, pp. 63-72, Capri.

�6� J. MARTIN, J. ODELL, Object Oriented Methods;A Foundation. Prentice-Hall, �1995�, EnglewoodCliffs.

�7� H-J. LENZ, A. SHOSHANI, Summarizability in OLAPand Statistical Data Bases. 9th International Con-ference on Statistical and Scientific Database Man-agement (SSDBM), �1997�.

�8� A. SHOSHANI, OLAP and Statistical Databases:Similarities and Differences. 16th ACK SIGACT-SIGMOD-SIGARTSymp. on Principles of DatabaseSystems, �1997�, pp. 185-196.

Received: November, 1999Revised: March, 2002Accepted: May, 2002

Contact address:

Mladen VargaGraduate School of Economics & Business

University of ZagrebTrg J. F. Kennedyja 6

10000 ZagrebCroatia

Phone: �385 1 2383279Fax: �385 1 2335633

e-mail: mladen�varga�efzg�hr

MLADEN VARGA obtained his B. Sc. in Electronics, M. Sc. and Ph. D. inComputer Science from the Faculty of Electrical Engineering and Com-puter Science at the University of Zagreb. Currently he is a professorat the Department of Computing at the Graduate School of Economics& Business, University of Zagreb. His main research interests are datamodelling, data base systems, data warehousing systems and softwareengineering methods. He has published over 40 papers and severalbooks.