Research Article


description

Research Article: A Suggested Model based on the open standards in Data Warehouse, presented at 12th NRC.

Transcript of Research Article

Page 1: Research Article

Advanced Databases

Presentation on

“A Suggested Model based on the open standards in Data Warehouse”

Presented by: Shamama Tul Umber Parwaiz (0772122)

Page 2: Research Article

Contents

• Abstract
• Introduction
• Literature Review
• Research Methodology
• Conclusion
• Future Work
• Acknowledgement
• References

Page 3: Research Article

Abstract

Current data warehouse technology is not based on open standards. Several proprietary standards exist, but unified, agreed-upon standards for data warehousing are still lacking, which gives rise to issues such as security, interoperability, and integration. In this research we present a model that describes the core layers of a data warehouse that should be based on open standards, and we discuss some of those layers in detail.

Page 4: Research Article

Introduction

In recent years, data warehousing has become a very useful technology for integrating operational data sources in a way that gives decision-making capabilities to the top-level management of an organization. While designing and developing a data warehouse, IT experts have to choose an appropriate approach, but unfortunately the available approaches are not based on open standards.

Continued on next slide …

Page 5: Research Article

Introduction Contd…

Data warehouse technology faces many challenges related to design, security, performance, data cleaning, storage, integration, extraction, transformation, loading, data refreshing, schemas, roll-up, drill-down, and interoperability. Having previously presented a framework that gives the meta-requirements for data warehouse design, we now move towards a design model for data warehouses. This model will help in designing good data warehouses.

Page 6: Research Article

Introduction Contd…

This research primarily aims to introduce a data warehouse design model that is based on open standards and meets the meta-requirements presented in our previous work. Developing data warehouses based on open standards will lead to harmony and compatibility among several industry products.

Page 7: Research Article

Introduction Contd…

A data warehouse is not just a collection of data from several operational data sources; rather, we can consider the data warehouse as a defined process containing three major steps:

• Extracting data from the distributed operational sources; most of the time it is extracted from legacy systems.

• Transforming and aggregating the data consistently into the warehouse.

• Accessing the data in an efficient and flexible manner.

The main contribution of the data warehouse is its power to convert data into information that can be used in strategic decision making within organizations [4].
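The three steps of the data warehouse process can be illustrated with a minimal Python sketch. The source records, field names, and the per-city aggregation are hypothetical and chosen only for illustration; the slides do not prescribe any implementation.

```python
from collections import defaultdict

# Hypothetical operational sources (e.g., two legacy systems).
SOURCE_A = [{"city": "Lahore", "amount": "100"}, {"city": "Karachi", "amount": "50"}]
SOURCE_B = [{"city": "lahore ", "amount": "25"}]


def extract(sources):
    """Step 1: pull raw records from every operational source."""
    for source in sources:
        yield from source


def transform(records):
    """Step 2: clean the records and aggregate them consistently."""
    totals = defaultdict(int)
    for record in records:
        city = record["city"].strip().title()  # normalise inconsistent spellings
        totals[city] += int(record["amount"])  # aggregate sales per city
    return dict(totals)


def access(warehouse, city):
    """Step 3: query the warehouse; unknown cities yield 0."""
    return warehouse.get(city, 0)


warehouse = transform(extract([SOURCE_A, SOURCE_B]))
print(access(warehouse, "Lahore"))  # 125
```

The inconsistent spelling "lahore " in the second source is merged with "Lahore" during the transform step, which is the kind of consistency the second bullet asks for.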

Page 8: Research Article

Literature Review

Lack of standards

• Industry and researchers have not yet agreed on a unified standard. Moreover, no standards for modeling data warehouse security exist as yet. Data warehouse design is also not mining-aware: it generally fulfills the OLAP requirements but does not address the data mining requirements [1].

Page 9: Research Article


Page 10: Research Article

Literature Review Contd…

Integration problem

• Integration issues related to data warehouses have also gained vital importance. Some organizations are settling for data marts, which are departmental subsets focused on selected subjects; e.g., a marketing data mart may include customer, product, and sales information. These data marts enable faster roll-out, since they do not require enterprise-wide processing, but they lead to complex integration problems in the long run [2].

Page 11: Research Article

Literature Review Contd…

Security Issues in data warehouse

• Security issues in data warehousing have also gained vital importance. Data from different systems with different security policies is integrated, and the users of the operational systems are not the same as the users of the data warehouse. Access control schemes for operational database objects (e.g., tables) cannot be mapped easily to data warehouse items such as dimensions and hierarchies; therefore, the need for a proper OLAP security design arises [3].

Page 12: Research Article

Methodology

The ETL Filter

• We have proposed the ETL Filter in our design model. The function of this filter is to allow only data with agreed-upon data types into the data warehouse. To do this, the ETL Filter will have a repository of standards that will be populated with the data types.
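A minimal Python sketch of how such a filter might behave is given here. The repository contents, field names, and the accept/reject split are assumptions made for illustration; the slides only state that the filter admits data with agreed-upon data types.

```python
# Hypothetical repository of standards: field name -> agreed-upon data type.
STANDARDS_REPOSITORY = {"product_id": int, "city": str, "amount": float}


def etl_filter(records, repository=STANDARDS_REPOSITORY):
    """Split records into those that conform to the repository and those that do not."""
    accepted, rejected = [], []
    for record in records:
        # A record conforms when it has exactly the agreed fields with the agreed types.
        conforms = record.keys() == repository.keys() and all(
            type(record[field]) is expected for field, expected in repository.items()
        )
        (accepted if conforms else rejected).append(record)
    return accepted, rejected


incoming = [
    {"product_id": 1, "city": "Lahore", "amount": 99.5},
    {"product_id": "2", "city": "Karachi", "amount": 10.0},  # wrong type for product_id
]
accepted, rejected = etl_filter(incoming)
print(len(accepted), len(rejected))  # 1 1
```

Keeping the rejected records, rather than silently dropping them, is a design choice of this sketch so that non-conforming data can be inspected or converted later.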

Page 13: Research Article

Methodology Contd…

The ETL Filter

Page 14: Research Article

Methodology Contd…

Platform Independent APIs

• Our research leads us to conclude that there is a strong need to develop APIs for accessing data from a warehouse. These APIs should be platform independent so that any programming language can be used to connect to the data warehouse. They also help prevent changes to applications when the underlying data warehouse is changed, and vice versa.
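The decoupling described here can be sketched in Python as an abstract interface with interchangeable backends. The names `WarehouseAPI`, `InMemoryWarehouse`, and `sales_report` are hypothetical; a truly platform-independent API would be specified independently of any one language, so this only illustrates how applications stay unchanged when the backend is swapped.

```python
from abc import ABC, abstractmethod


class WarehouseAPI(ABC):
    """Hypothetical platform-independent interface that applications code against."""

    @abstractmethod
    def query(self, measure, dimensions):
        """Return the value of `measure` for the given dimension values."""


class InMemoryWarehouse(WarehouseAPI):
    """One interchangeable backend; a vendor-specific adapter would be another."""

    def __init__(self, facts):
        self._facts = facts  # maps (measure, sorted dimension items) -> value

    def query(self, measure, dimensions):
        key = (measure, tuple(sorted(dimensions.items())))
        return self._facts.get(key)


def sales_report(api: WarehouseAPI, city):
    """Application code depends only on WarehouseAPI, not on a concrete backend."""
    return api.query("sales", {"city": city})


backend = InMemoryWarehouse({("sales", (("city", "Lahore"),)): 125})
print(sales_report(backend, "Lahore"))  # 125
```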

Page 15: Research Article

Methodology Contd…

Platform Independent APIs

Page 16: Research Article

Methodology Contd…

Dimensional Security Management

• Our approach to security is relatively simple: we have introduced a layer known as the “Dimensional Security Management Layer”, which manages dimension-level security. There can be several dimensions in a data warehouse, e.g., sales, cities, profit, products, etc. Users will be allowed to query only the dimensions for which they have been permitted by the dimensional security management layer.

Page 17: Research Article

Methodology Contd…

Dimensional Security Management

Page 18: Research Article

Methodology Contd…

Dimensional Security Management

• In the initial stages, data warehouses were used and queried by executive management and business analysts only. Nowadays the range of users with data warehouse access is increasing; the assumption that only a limited set of users will access the data warehouse is no longer appropriate, and the need for proper security and access control mechanisms is becoming more and more important. Data warehouses have become open systems; OLAP analysis in particular requires this open nature [3].

Page 19: Research Article

[6]

Page 20: Research Article

[6]

Page 21: Research Article

[6]

Page 22: Research Article

Page 23: Research Article

Methodology Contd…

Dimensional Security Management

• Table 1 is proposed to store the security information in a very simple way. The table contains two parts: the first part contains the header information, and the second contains the security information.

• The header information section contains “Attribute and Value” pairs holding attributes such as the OLAP server name, version, etc.

Page 24: Research Article

Methodology Contd…

Dimensional Security Management

• The security information section of the table lists the users in the rows and the dimensions in the columns. The intersections of rows and columns, i.e., the ‘cells’, contain the access rights of the specific user over the particular dimension.

• If a particular user wants information regarding the sales of a particular product in different cities over a specified time period, then the user must have access rights for the three dimensions, i.e., product, city and time. A user who lacks access rights to any of them will not be able to view the specified report.
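The two-part table can be pictured as a small Python data structure. All values here are hypothetical: the server name, the user names, and the use of "R" as the stored access right are assumptions for illustration, since the slides only name the two sections.

```python
# Hypothetical contents; the slides only name the two sections of the table.
security_table = {
    "header": {"OLAP Server": "example-server", "Version": "1.0"},
    "security": {
        # rows: users; columns: dimensions; cells: access rights
        "alice": {"product": "R", "city": "R", "time": "R"},
        "bob": {"product": "R", "city": "", "time": "R"},
    },
}


def cell(table, user, dimension):
    """Return the access right stored at the (user, dimension) intersection."""
    return table["security"].get(user, {}).get(dimension, "")


# The sales report from the example needs all three dimensions.
report_dimensions = ["product", "city", "time"]
for user in ("alice", "bob"):
    can_view = all(cell(security_table, user, d) == "R" for d in report_dimensions)
    print(user, can_view)
```

In this sketch the second user lacks the right on the city dimension and therefore cannot view the report, even though the other two cells are filled in.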


Page 25: Research Article

Methodology Contd…

Dimensional Security Management

• The same table can be used to define roles, which simplify access control management; a role is defined once and can be assigned to multiple users, and at the same time one user may possess multiple roles.

• We now describe an algorithm that determines the access of a particular user over a dimension set. The algorithm takes the user and the dimension list as input and returns, as true or false, whether the user has the access rights to perform any operation on the given dimension set.
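The many-to-many relation between users and roles can be sketched as follows. The role names, user names, and the rule that a user's rights are the union of the rights of all held roles are assumptions of this sketch, not details taken from the slides.

```python
# Hypothetical role definitions: role -> dimensions the role may read.
ROLE_DIMENSIONS = {
    "sales_analyst": {"product", "city"},
    "planner": {"time"},
}

# A role can be assigned to many users, and a user can hold many roles.
USER_ROLES = {
    "alice": {"sales_analyst", "planner"},
    "bob": {"sales_analyst"},
}


def permitted_dimensions(user):
    """Union of the dimensions granted by every role the user holds."""
    granted = set()
    for role in USER_ROLES.get(user, set()):
        granted |= ROLE_DIMENSIONS.get(role, set())
    return granted


def has_access(user, dimensions):
    """True only if every requested dimension is granted to the user."""
    return set(dimensions) <= permitted_dimensions(user)


print(has_access("alice", ["product", "city", "time"]))  # True
print(has_access("bob", ["product", "time"]))            # False
```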

Page 26: Research Article

Page 27: Research Article

Page 28: Research Article

Flow of Algorithm

Start
Input: user ‘u’; dimensions (d1, d2, …, dn)
i = index of user in table
j = index of next dimension ‘d’ in table
j <= n ? (No: go to A)
table[i][j] <> R ? (Yes: return ‘Failure’; No: take the next dimension)
A: Return ‘Success’
End
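One possible reading of the flowchart, written as runnable Python, is sketched here. The branch directions are an interpretation of the flattened diagram, "R" is assumed to denote the required access right, and the list-of-lists table layout with separate `users` and `dimension_names` index lists is a hypothetical choice.

```python
def check_access(table, users, dimension_names, user, dimensions):
    """Return 'Success' if `user` holds 'R' on every requested dimension, else 'Failure'."""
    i = users.index(user)             # i = index of user in table
    for d in dimensions:              # continue while dimensions remain (j <= n)
        j = dimension_names.index(d)  # j = index of next dimension in table
        if table[i][j] != "R":        # table[i][j] <> R
            return "Failure"
    return "Success"                  # connector A: all dimensions passed


users = ["alice", "bob"]
dimension_names = ["product", "city", "time"]
table = [
    ["R", "R", "R"],  # alice
    ["R", "", "R"],   # bob has no right on city
]

print(check_access(table, users, dimension_names, "alice", ["product", "time"]))  # Success
print(check_access(table, users, dimension_names, "bob", ["product", "city"]))    # Failure
```

A user or dimension that is missing from the table raises `ValueError` in this sketch; the flowchart does not say how that case should be handled.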

Page 29: Research Article

Methodology Contd…

Meta-data Definition Language (MDL)

• Our study shows that meta-data is stored in different ways in different data warehouses. Here we introduce the concept of MDL, i.e., the Meta-data Definition Language. If every data warehouse follows this language, the integration problems can be resolved. For existing data warehouse solutions, we have introduced an MDL translator layer that can work with the existing system without requiring many changes.
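The idea of a translator layer can be sketched in Python as a set of adapters that convert vendor-specific meta-data into one common form. The slides do not define MDL's syntax, so the common dictionary layout, the two vendor formats, and all function names here are purely hypothetical.

```python
# The slides do not define MDL's syntax; this common form is purely illustrative.
def translate_vendor_a(meta):
    """Hypothetical vendor A stores column meta-data as (name, type) tuples."""
    return {
        "table": meta["tbl"],
        "columns": [{"name": n, "type": t} for n, t in meta["cols"]],
    }


def translate_vendor_b(meta):
    """Hypothetical vendor B stores column meta-data as a name -> type mapping."""
    return {
        "table": meta["table_name"],
        "columns": [{"name": n, "type": t} for n, t in meta["fields"].items()],
    }


TRANSLATORS = {"vendor_a": translate_vendor_a, "vendor_b": translate_vendor_b}


def to_mdl(vendor, meta):
    """Translator layer: convert native meta-data without changing the source system."""
    return TRANSLATORS[vendor](meta)


a = to_mdl("vendor_a", {"tbl": "sales", "cols": [("city", "string"), ("amount", "decimal")]})
b = to_mdl("vendor_b", {"table_name": "sales", "fields": {"city": "string", "amount": "decimal"}})
print(a == b)  # True
```

Because both warehouses end up described in the same form, a tool that integrates them only has to understand that one form, which is the integration benefit the slide claims for MDL.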

Page 30: Research Article

Page 31: Research Article

Conclusion

In our research we have described the substantial technical challenges in developing and deploying data warehouses. While many commercial products and services exist, there is at the same time a lack of standards; there are still several interesting areas for research in developing open standards for data warehouses.

After going through the literature, we have concluded that some efforts have already been made regarding platform-independent design and the respective implementations, but, unlike the efforts to define open standards for web services, not much work has been done in defining open standards for data warehouses.

Page 32: Research Article

Future Work

We have presented a big picture for designing data warehouses based on open standards that do not exist today; however, there is a need to explore and materialize each of these.

Page 33: Research Article

ACKNOWLEDGMENT

The authors of this paper give special thanks to their most respected instructor and supervisor for this work, Mr. Aslam Parvez, for his contributions and guidance throughout the work.

Page 34: Research Article

References

[1] Stefano Rizzi, “Research in Data Warehouse Modeling and Design: Dead or Alive?”, DOLAP’06, November 10, 2006, Arlington, Virginia, USA.

[2] Surajit Chaudhuri and Umeshwar Dayal, “An Overview of Data Warehousing and OLAP Technology”, ACM Digital Library, research.microsoft.com.

[3] Torsten Priebe, “Towards OLAP Security Design – Survey and Research Issues”, DOLAP 2000, McLean, VA, USA, ACM ISBN 1-58113-323-5.

[4] Fabio Rilston, Jaelson Freire, “DWARF: An Approach for Requirements Definition and Management of Data Warehouse Systems”, 11th IEEE International Requirements Engineering Conference, 1090-705X/03, 2003.

[5] Emilio Soler, Juan Trujillo, “A framework for developing secure data warehouses based on MDA and QVT”, 2nd International Conference on Availability, Reliability and Security (ARES 07), 0-7695-2775-2/07, 2007 IEEE.

Page 35: Research Article

References

[6] Roger Fang, Sama Tuladhar, “Teaching Data Warehousing & Data Mining in a Graduate Program of Information Technology”, Mid-South Conference, JCSC 21, 5 (May 2006).

Page 36: Research Article

Thanks …