Applied Big Data and Visualization - University of...

Data Warehousing Applied Big Data and Visualization P. Healy CS1-08 Computer Science Bldg. tel: 202727 [email protected] Spring 2019–2020 P. Healy (University of Limerick) CS6502 Spring 2019–2020 1 / 20

Data Warehousing

Applied Big Data and Visualization

P. Healy

CS1-08 Computer Science Bldg.

tel: 202727 [email protected]

Spring 2019–2020

P. Healy (University of Limerick) CS6502 Spring 2019–2020 1 / 20

Outline

1 Data Warehousing
    Origins
    Issues

The Evolution of Data Warehousing

- In the 1970s organisations accumulated growing amounts of data stored in their operational databases
- The very large collections of data in more recent times create the possibility of using the data for decision making
- Challenges:
  - operational systems were not designed for decision making
  - data from various operational systems within the same organisation may need to be integrated/consolidated

Some Terminology

- Database: a collection of information that
  - exists over a long period of time
  - is stored on secondary storage in a structured way
  - is managed by a computer program called a Database Management System (DBMS)
  (Ullman and Widom)
- Operational System: a system that is used to process the day-to-day transactions of an organization efficiently while preserving the integrity of the transactional data
  (Wikipedia)

Database System

Operational System

Data Warehouse: Definition

Data Warehouse: A subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management’s decision-making process. (Bill Inmon, 1993)

- subject-oriented: organised around the major subjects of the enterprise such as customers, products, sales
- integrated: data coming from different sources
- time-variant: data is associated with a specific time period
- nonvolatile: data is not updated in real time but refreshed from operational systems on a regular basis
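
The four properties can be illustrated with a minimal Python sketch. The source names, fields and values below are hypothetical, chosen only for illustration: each refresh appends a dated snapshot built from two sources and never modifies rows that are already in the warehouse.

```python
from datetime import date

# Hypothetical operational sources; names, fields and values are illustrative.
crm_customers = [{"cust_id": 1, "name": "Acme", "city": "Limerick"}]
billing_customers = [{"customer": 1, "balance": 250.0}]

# Subject-oriented: one warehouse table per subject (here: customer).
warehouse_customer = []


def refresh(snapshot_date):
    """Append one integrated, dated row per customer.

    Existing rows are never modified (nonvolatile); each refresh adds a new
    snapshot (time-variant) that combines both sources (integrated).
    """
    balances = {row["customer"]: row["balance"] for row in billing_customers}
    for cust in crm_customers:
        warehouse_customer.append({
            "cust_id": cust["cust_id"],
            "name": cust["name"],
            "city": cust["city"],
            "balance": balances.get(cust["cust_id"]),
            "snapshot_date": snapshot_date,
        })


refresh(date(2020, 1, 31))
billing_customers[0]["balance"] = 100.0  # the operational system updates in place
refresh(date(2020, 2, 29))               # the warehouse keeps both versions
```

After the second refresh the warehouse holds two rows for the same customer, one per period, whereas the operational source only knows the latest balance.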

Data Warehouse: Alternative Definition

- A data warehouse is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. It may pass through an operational data store for additional operations before it is used in the data warehouse for reporting. (Wikipedia)
  - Operational Data Store (ODS): a database designed to integrate data from multiple sources for additional operations on the data. The data is then passed back to operational systems for further operations and to the data warehouse for reporting.
- A data warehouse is a copy of transaction data specifically structured for query and analysis. (Ralph Kimball)
- A data warehouse is a copy of transaction data specifically structured for querying and reporting. (DW Info Center)

Data Warehouse: Beyond Definitions

- A data warehouse maintains its functions in three layers: staging, integration, and access.
  - Staging is used to store raw data for use by developers
  - The integration layer is used to integrate data and to provide a level of abstraction from users
  - The access layer is for getting data out for users
- Further point about data warehouses: they can be subdivided into data marts. A data mart stores a subset of the data from a warehouse, focusing on a specific aspect of a company such as sales or a marketing process.
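
One way to picture the three layers and a data mart is the small sketch below, which uses Python's built-in sqlite3 module. All table, view and column names (stg_sales, int_sales, v_sales_by_dept, mart_sales) are assumptions made for this example and do not come from any particular product.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Staging layer: raw rows exactly as extracted (untyped text).
    CREATE TABLE stg_sales (sale_date TEXT, dept TEXT, amount TEXT);
    -- Integration layer: cleaned and typed data.
    CREATE TABLE int_sales (sale_date TEXT, dept TEXT, amount REAL);
""")
con.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [("2020-01-05", " sales ", "100.50"),
     ("2020-01-06", "Marketing", "40"),
     ("2020-01-07", "SALES", "9.50")],
)

# Staging -> integration: normalise department names and cast amounts.
con.execute("""
    INSERT INTO int_sales
    SELECT sale_date, lower(trim(dept)), CAST(amount AS REAL) FROM stg_sales
""")

# Access layer: a view that hides the underlying tables from users.
con.execute("""
    CREATE VIEW v_sales_by_dept AS
    SELECT dept, SUM(amount) AS total FROM int_sales GROUP BY dept
""")

# Data mart: a subset of the warehouse focused on one department.
con.execute("""
    CREATE TABLE mart_sales AS
    SELECT sale_date, amount FROM int_sales WHERE dept = 'sales'
""")

totals = dict(con.execute("SELECT dept, total FROM v_sales_by_dept"))
mart_rows = con.execute("SELECT COUNT(*) FROM mart_sales").fetchone()[0]
```

Users only touch the view and the mart; the messy staging rows stay out of sight.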

Data Warehouse Architecture

Data Warehouse Architecture (contd.)

- Operational Data Store:
  - Repository of current and integrated operational data used for analysis
  - Acts as a staging area for data to be moved into the DW
  - Often employed when legacy operational systems are incapable of achieving reporting requirements
- Managers:
  - ETL Manager: performs all operations associated with extraction of the data from data sources, transformation of the data and loading it into the DW
  - Warehouse Manager:
    - analysis of data to ensure consistency
    - transformation and merging of source data from temporary storage into data warehouse tables
    - creation of indexes and views on base tables
    - generation of denormalizations
    - generation of aggregations
    - backing up and archiving data
  - Query Manager:
    - performs all operations associated with the management of user queries
    - in particular, it schedules the queries for execution
- DBMS / DW Database (see over)
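
The division of labour between the three managers can be sketched roughly as follows. This is a toy illustration using sqlite3, not a description of any real DW product: the two sources, the fact_orders and agg_by_source tables, and the first-come-first-served query schedule are all assumptions made for the example.

```python
import sqlite3
from collections import deque

# Two hypothetical operational sources with different conventions.
source_a = [{"id": 1, "amount_eur": 10.0}, {"id": 2, "amount_eur": 5.5}]
source_b = [{"order_no": "3", "amount_cents": "1250"}]


def extract():
    """ETL manager, step 1: pull raw records, tagged with their origin."""
    return [("a", r) for r in source_a] + [("b", r) for r in source_b]


def transform(tagged):
    """ETL manager, step 2: map both layouts onto (id, amount in euro, source)."""
    rows = []
    for origin, r in tagged:
        if origin == "a":
            rows.append((int(r["id"]), float(r["amount_eur"]), origin))
        else:
            rows.append((int(r["order_no"]), int(r["amount_cents"]) / 100, origin))
    return rows


def load(con, rows):
    """ETL manager, step 3: write the unified rows into a base table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER, amount REAL, source TEXT)"
    )
    con.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)


def manage_warehouse(con):
    """Warehouse manager: build an index and an aggregation on the base table."""
    con.execute("CREATE INDEX IF NOT EXISTS ix_orders_id ON fact_orders (id)")
    con.execute(
        "CREATE TABLE agg_by_source AS "
        "SELECT source, SUM(amount) AS total FROM fact_orders GROUP BY source"
    )


def run_queries(con, pending):
    """Query manager: execute queued user queries in arrival order."""
    results = []
    while pending:
        results.append(con.execute(pending.popleft()).fetchall())
    return results


con = sqlite3.connect(":memory:")
load(con, transform(extract()))
manage_warehouse(con)
queue = deque([
    "SELECT COUNT(*) FROM fact_orders",
    "SELECT total FROM agg_by_source WHERE source = 'b'",
])
results = run_queries(con, queue)
```

A real query manager would schedule by priority or resource use; the queue here only shows where that decision sits.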

Data Warehouse Architecture

DW Database

- Metadata: data about data
  - Management of metadata is extremely complex and difficult
  - Major purpose: to show the pathway back to where the data began, the history of any item in the DW
  - Metadata associated with data transformation and loading must describe the source of the data and any changes made
  - Metadata associated with data management describes the data as it is stored in the DW
- Detailed data
  - the integrated data that comes from the operational systems
- Lightly and highly summarized data
  - summarizing: aggregating, sorting, grouping
  - speeds up the performance of queries
  - subject to change on an ongoing basis
- Archive/Backup data
  - detailed data
  - summarized data
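
As a rough illustration of how detailed data, the two summary levels and lineage metadata relate, consider the sketch below. The rows, the choice of month as the summary period, and the layout of the metadata record are assumptions made for this example.

```python
from collections import defaultdict

# Detailed data: integrated rows from the operational systems (made-up values).
detailed = [
    {"day": "2020-01-05", "product": "tea", "amount": 4.0},
    {"day": "2020-01-05", "product": "coffee", "amount": 6.0},
    {"day": "2020-02-10", "product": "tea", "amount": 5.0},
]


def summarise(rows, key):
    """Group rows by key(row) and sum their amounts."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row["amount"]
    return dict(totals)


# Lightly summarised: totals per month and product.
lightly = summarise(detailed, lambda r: (r["day"][:7], r["product"]))

# Highly summarised: totals per month only.
highly = summarise(detailed, lambda r: r["day"][:7])

# Metadata: records where each summary came from and how it was derived,
# so any figure can be traced back to the detailed data.
metadata = {
    "lightly": {"source": "detailed", "operation": "sum(amount) by month, product"},
    "highly": {"source": "detailed", "operation": "sum(amount) by month"},
}
```

A query for monthly totals can read the two-entry `highly` summary instead of scanning every detailed row, which is where the speed-up comes from.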

Data Warehouse Architecture

End-User Access Tools

- Reporting and query tools
  - Accept or generate SQL statements to query a relational DW
- Application development tools
- Online analytical processing (OLAP) tools:
  - allow user to analyse data using complex multidimensional views
  - retrospective models
- Data mining
  - predictive models
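
To make "multidimensional views" a little more concrete, here is a minimal sketch of two common OLAP operations, roll-up and slice, over a tiny fact table. The dimensions (year, city, product), the figures and the helper names are hypothetical; real OLAP tools work on far larger cubes with precomputed aggregates.

```python
from collections import defaultdict

# Each fact: (year, city, product, units sold). Values are made up.
facts = [
    ("2019", "Limerick", "tea", 10),
    ("2019", "Cork", "tea", 7),
    ("2020", "Limerick", "coffee", 4),
    ("2020", "Limerick", "tea", 3),
]
DIMS = ("year", "city", "product")


def roll_up(rows, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)


def slice_rows(rows, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


by_year_city = roll_up(facts, ("year", "city"))
tea_by_year = roll_up(slice_rows(facts, "product", "tea"), ("year",))
```

Both results describe what has already happened, which is the sense in which OLAP models are retrospective.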

Data Marts

A DB that contains a subset of corporate data to support:

- the analytical requirements of a particular business unit (such as the Sales department), or
- users who share the same requirements to analyse a particular business process (such as property sales)

Two methodologies for building data marts:

- Data mart receives data from the DW (Inmon, 2001)
- DW as an integration of data marts for analysing different business processes (Kimball, 2006)
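
The difference in direction between the two methodologies can be shown with a deliberately simplified sketch. The rows and function names are hypothetical, and the plain concatenation in `warehouse_from_marts` merely stands in for the real integration work that the Kimball approach requires.

```python
# Hypothetical warehouse rows, each tagged with its business process.
warehouse = [
    {"process": "sales", "region": "west", "amount": 120},
    {"process": "sales", "region": "east", "amount": 80},
    {"process": "marketing", "region": "west", "amount": 30},
]


def mart_from_warehouse(dw, process):
    """Top-down (Inmon): the mart is a subset drawn from the existing DW."""
    return [row for row in dw if row["process"] == process]


def warehouse_from_marts(marts):
    """Bottom-up (Kimball): the DW is assembled from the individual marts.

    Simple concatenation is an assumption of this sketch; it presumes the
    marts already share the same row layout.
    """
    return [row for mart in marts.values() for row in mart]


sales_mart = mart_from_warehouse(warehouse, "sales")
marketing_mart = mart_from_warehouse(warehouse, "marketing")
rebuilt = warehouse_from_marts({"sales": sales_mart, "marketing": marketing_mart})
```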

Real-Time Data Warehousing

- Initially DWs were recognised as systems that held historical data
- In recent years, DW technology has been developed to allow for closer synchronization between operational data and warehouse data, giving real-time (RT) or near-real-time (NRT) DWs
- Problem: the problems that caused operational systems to be separated from DW systems in the first place are brought back into the DW
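
One common way to approximate near-real-time behaviour is to refresh in small increments, copying only rows changed since the last refresh. The sketch below assumes each operational row carries an `updated_at` counter. The class name and fields are hypothetical, and this is only one possible technique, not necessarily the one the slide has in mind.

```python
class IncrementalLoader:
    """Copies only rows changed since the previous refresh (a 'watermark')."""

    def __init__(self):
        self.watermark = 0   # highest updated_at value seen so far
        self.warehouse = []  # rows copied into the warehouse

    def refresh(self, operational_rows):
        """Copy rows newer than the watermark; return how many were copied."""
        changed = [r for r in operational_rows if r["updated_at"] > self.watermark]
        self.warehouse.extend(dict(r) for r in changed)
        if changed:
            self.watermark = max(r["updated_at"] for r in changed)
        return len(changed)


operational = [
    {"id": 1, "updated_at": 10, "status": "new"},
    {"id": 2, "updated_at": 12, "status": "new"},
]
loader = IncrementalLoader()
first = loader.refresh(operational)   # initial load
operational.append({"id": 3, "updated_at": 15, "status": "new"})
second = loader.refresh(operational)  # picks up only the new row
third = loader.refresh(operational)   # nothing has changed
```

Every call to `refresh` reads from the operational data, so the shorter the refresh interval, the more the warehouse load competes with day-to-day transaction processing.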

Benefits of Data Warehousing

- Potential high returns on investment
- Competitive advantage
- Increased productivity of corporate decision makers
- Useful links:
  - DW Tools
