datawarehousing chap01
8/7/2019 datawarehousing chap01
Data Warehousing
-
Outline
What is data warehousing
The benefits of data warehousing
Differences between OLTP and data warehousing
The architecture of a data warehouse
The main components
Data flows
Tools and technologies
Integration
The importance of managing meta-data
Data marts
-
What is data warehousing?
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
Data warehousing combines data management and data analysis.
A data webhouse is a distributed data warehouse that is implemented over the web with no central data repository.
Goal: to integrate enterprise-wide corporate data into a single repository from which users can easily run queries.
-
The benefits of data warehousing
The potential benefits of data warehousing are:
high returns on investment
substantial competitive advantage
increased productivity of corporate decision-makers
-
The difference between OLTP and data warehousing
A DBMS built for online transaction processing (OLTP) is generally regarded as unsuitable for data warehousing because each system is designed with a differing set of requirements in mind.
Example: OLTP systems are designed to maximize transaction processing capacity, while data warehouses are designed to support ad hoc query processing.
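The contrast can be sketched with Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and not taken from the slides. The first function stands in for a short, predictable OLTP transaction; the second for an ad hoc analytical query of the kind a warehouse is meant to support.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
        "year INTEGER, amount REAL, status TEXT)"
    )
    rows = [
        (1, "north", 2018, 120.0, "open"),
        (2, "north", 2019, 80.0, "open"),
        (3, "south", 2019, 200.0, "open"),
        (4, "south", 2019, 50.0, "open"),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def ship_order(conn, order_id):
    """OLTP-style work: a short transaction that touches one current row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE orders SET status = 'shipped' WHERE id = ?", (order_id,)
        )

def sales_by_region_and_year(conn):
    """Warehouse-style work: an ad hoc query that scans and aggregates history."""
    cur = conn.execute(
        "SELECT region, year, SUM(amount) FROM orders "
        "GROUP BY region, year ORDER BY region, year"
    )
    return cur.fetchall()

conn = make_db()
ship_order(conn, 3)
summary = sales_by_region_and_year(conn)
```

In a real deployment the two workloads would normally run on separate systems; a single SQLite file is used here only to keep the sketch self-contained.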
-
Comparison of OLTP systems and data warehousing systems
OLTP systems                                      | Data warehousing systems
Holds current data                                | Holds historical data
Stores detailed data                              | Stores detailed, lightly, and highly summarized data
Data is dynamic                                   | Data is largely static
Repetitive processing                             | Ad hoc, unstructured, and heuristic processing
High level of transaction throughput              | Medium to low level of transaction throughput
Predictable pattern of usage                      | Unpredictable pattern of usage
Transaction-driven                                | Analysis-driven
Application-oriented                              | Subject-oriented
Supports day-to-day decisions                     | Supports strategic decisions
Serves large number of clerical/operational users | Serves relatively low number of managerial users
-
Problems
Underestimation of resources for data loading
Hidden problems with source systems
Required data not captured
Increased end-user demands
Data homogenization
High demand for resources
Data ownership
High maintenance
Long-duration projects
Complexity of integration
-
[Figure: Typical architecture of a data warehouse. Labeled components: operational data sources 1 to n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS; meta-data; detailed data; lightly summarized data; highly summarized data; archive/backup data; end-user access tools (reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools).]
-
The main components
Operational data sources: data for the DW is supplied from mainframe operational data held in first-generation hierarchical and network databases, departmental data held in proprietary file systems, private data held on workstations and private servers, and external systems such as the Internet, commercially available DBs, or DBs associated with an organization's suppliers or customers.
Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.
-
The main components
Load manager: also called the frontend component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare the data for entry into the warehouse.
Warehouse manager: performs all the operations associated with the management of the data in the warehouse. The operations performed by this component include analysis of data to ensure consistency, transformation and merging of source data, creation of indexes and views, generation of denormalizations and aggregations, and archiving and backing up data.
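The split of responsibilities between the two components can be sketched as follows. This is a minimal illustration using sqlite3, not how any particular product works: `SOURCE_ROWS`, the table names, and both function names are hypothetical. The load manager extracts, applies simple transformations, and loads; the warehouse manager then creates an index and an aggregation over the detailed data.

```python
import sqlite3

# Hypothetical source records, standing in for an operational system extract.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "10.50", "day": "2019-08-01"},
    {"customer": "bob", "amount": "4.00", "day": "2019-08-01"},
    {"customer": "Alice", "amount": "2.50", "day": "2019-08-02"},
]

def load_manager(conn, source_rows):
    """Extract rows, apply simple transformations, and load them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_detail "
        "(customer TEXT, amount REAL, day TEXT)"
    )
    # Simple transformations: trim and lower-case names, convert amounts.
    prepared = [
        (row["customer"].strip().lower(), float(row["amount"]), row["day"])
        for row in source_rows
    ]
    conn.executemany("INSERT INTO sales_detail VALUES (?, ?, ?)", prepared)
    conn.commit()
    return len(prepared)

def warehouse_manager(conn):
    """Create an index and a summary (aggregation) over the detailed data."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sales_customer "
        "ON sales_detail (customer)"
    )
    conn.execute("DROP TABLE IF EXISTS sales_by_customer")
    conn.execute(
        "CREATE TABLE sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total, COUNT(*) AS n_sales "
        "FROM sales_detail GROUP BY customer"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
loaded = load_manager(conn, SOURCE_ROWS)
warehouse_manager(conn)
totals = conn.execute(
    "SELECT customer, total, n_sales FROM sales_by_customer ORDER BY customer"
).fetchall()
```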
-
The main components
Query manager: also called the backend component, it performs all the operations associated with the management of user queries. The operations performed by this component include directing queries to the appropriate tables and scheduling the execution of queries.
Detailed, lightly and highly summarized data, archive/backup data
Meta-data
End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
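The two query manager duties named above, directing queries to the appropriate tables and scheduling their execution, might look like this in miniature. The table names, the notion of a "grain", and the priority scheme are all assumptions made for illustration; the slides do not specify how routing or scheduling is done.

```python
import heapq

# Hypothetical table names for the three levels of data kept in the warehouse.
TABLE_FOR_GRAIN = {
    "day": "sales_detail",
    "month": "sales_lightly_summarized",
    "year": "sales_highly_summarized",
}

def route_query(grain):
    """Direct a query to the most summarized table that can answer it."""
    try:
        return TABLE_FOR_GRAIN[grain]
    except KeyError:
        raise ValueError(f"unsupported grain: {grain!r}") from None

class QueryScheduler:
    """Run queued queries in priority order (lower number runs first)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority, grain):
        heapq.heappush(self._heap, (priority, self._counter, grain))
        self._counter += 1

    def run_all(self):
        """Pop queries by priority and return the tables they were routed to."""
        order = []
        while self._heap:
            _, _, grain = heapq.heappop(self._heap)
            order.append(route_query(grain))
        return order

scheduler = QueryScheduler()
scheduler.submit(2, "day")
scheduler.submit(1, "year")
scheduler.submit(2, "month")
execution_order = scheduler.run_all()
```

Routing a yearly question to the highly summarized table avoids scanning detailed rows, which is the reason the warehouse keeps several summary levels.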
-
Data flows
Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
Downflow: the processes associated with archiving and backing up the data in the warehouse.
Outflow: the processes associated with making the data available to the end-users.
Meta-flow: the processes associated with the management of the meta-data.
-
[Figure: Information flows of a data warehouse. Labeled components: operational data sources 1 to n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS; meta-data; detailed data; lightly summarized data; highly summarized data; archive/backup data; end-user access tools (reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools). Labeled flows: inflow, upflow, downflow, outflow, meta-flow.]
-
Tools and Technologies
The critical steps in the construction of a data warehouse:
a. Extraction
b. Cleansing
c. Transformation
After the critical steps, loading the results into the target system can be carried out either by separate products or by a single product. Categories:
code generators
database data replication tools
dynamic transformation engines
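The three critical steps can be sketched as three small functions chained together. The comma-separated input format, the field names, and the cleansing rules (drop malformed rows, rows with missing values, and exact duplicates) are assumptions chosen for illustration, not rules from the slides.

```python
def extract(raw_lines):
    """Extraction: parse comma-separated lines from a hypothetical source export."""
    return [line.split(",") for line in raw_lines if line.strip()]

def cleanse(records):
    """Cleansing: drop malformed rows, rows with missing values, and duplicates."""
    seen = set()
    clean = []
    for rec in records:
        fields = tuple(f.strip() for f in rec)
        if len(fields) != 3 or not all(fields):
            continue  # wrong shape or an empty field
        if fields in seen:
            continue  # exact duplicate
        seen.add(fields)
        clean.append(fields)
    return clean

def transform(records):
    """Transformation: convert to the target types and naming."""
    return [
        {"product": name.upper(), "quantity": int(qty), "country": country.upper()}
        for name, qty, country in records
    ]

RAW = [
    "widget, 3, us",
    "widget, 3, us",   # duplicate
    "gadget, , de",    # missing quantity
    "gizmo, 7, fr",
    "",
]
target_rows = transform(cleanse(extract(RAW)))
```

Keeping the steps as separate functions mirrors the "separate products" option; a single tool would bundle all three behind one interface.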
-
Data Warehouse DBMS (integration)
Due to the maturity of such products, most relational databases will integrate predictably with other types of software.
The requirements for a data warehouse RDBMS:
Load performance
Load processing
Data quality management
Query performance
Terabyte scalability
Mass user scalability
Networked data warehouse
Warehouse administration
Integrated dimensional analysis
Advanced query functionality
-
The importance of managing meta-data (integration)
The integration of meta-data, that is, data about data.
Meta-data is used for a variety of purposes, and its management is a critical issue in achieving a fully integrated data warehouse.
The major purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
The meta-data associated with data management describes the data as it is stored in the warehouse.
Meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of those queries.
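The "pathway back to where the data began" can be pictured as a small lineage record kept for each warehouse item. The `LineageRecord` class, its fields, and the example names are hypothetical; real meta-data repositories store far more, but the idea of recording the source and every change is the same.

```python
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    """Hypothetical meta-data entry describing where a warehouse item came from."""
    item: str    # warehouse table or column
    source: str  # originating operational system
    changes: list = field(default_factory=list)  # transformations, in order applied

    def record_change(self, description):
        """Note one transformation made during loading."""
        self.changes.append(description)

    def pathway(self):
        """Return the path from the warehouse item back to its source."""
        return [self.item] + list(reversed(self.changes)) + [self.source]

rec = LineageRecord(
    item="warehouse.sales.amount_usd",
    source="orders_db.orders.amount",
)
rec.record_change("converted currency to USD")
rec.record_change("rounded to 2 decimals")
path = rec.pathway()
```

Reading `path` from left to right walks from the stored item, through the most recent change, back to the original source field.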
-
The major integration issue is how to synchronize the various types of meta-data used throughout the data warehouse. The challenge is to synchronize meta-data between different products from different vendors using different meta-data stores.
Two major standards for meta-data and modeling in the areas of data warehousing and component-based development: MDC (Meta Data Coalition) and OMG (Object Management Group).
-
Administration and Management Tools
A data warehouse requires tools to support the administration and management of such a complex environment.
For the various types of meta-data and the day-to-day operations of the data warehouse, the administration and management tools must be capable of supporting the following tasks:
monitoring data loading from multiple sources
data quality and integrity checks
managing and updating meta-data
monitoring database performance to ensure efficient query response times and resource utilization
-
auditing data warehouse usage to provide user chargeback information
replicating, subsetting, and distributing data
maintaining efficient data storage management
purging data
archiving and backing up data
implementing recovery following failure
security management
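One of the administration tasks listed, data quality and integrity checks, can be sketched as a small report over a loaded table. The `quality_report` function, the `customer` table, and the choice of checks (row count, duplicate keys, NULLs in required columns) are assumptions for illustration only.

```python
import sqlite3

def quality_report(conn, table, key_column, required_columns):
    """Count rows, duplicate keys, and NULLs in required columns.

    Table and column names are interpolated into SQL, so they must come
    from trusted configuration, never from user input.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # Number of key values that occur more than once.
    duplicates = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key_column} FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    nulls = {
        col: conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL"
        ).fetchone()[0]
        for col in required_columns
    }
    return {"rows": total, "duplicate_keys": duplicates, "nulls": nulls}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "US"), (2, None, "DE"), (2, "Bo", None), (3, "Cy", "FR")],
)
report = quality_report(conn, "customer", "id", ["name", "country"])
```

A monitoring tool could run such a report after every load and raise an alert when any count is non-zero.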
-
Data mart
Data mart: a subset of a data warehouse that supports the requirements of a particular department or business function.
The characteristics that differentiate data marts and data warehouses include:
a data mart focuses on only the requirements of users associated with one department or business function
-
data marts do not normally contain detailed operational data, unlike data warehouses
as data marts contain less data compared with data warehouses, data marts are more easily understood and navigated
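A data mart as a departmental, summarized subset of the warehouse can be sketched as below. The `warehouse_sales` table, its columns, and the `build_data_mart` helper are hypothetical; the point is only that the mart holds one department's data and no detailed rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse fact table holding detailed rows for every department.
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(department TEXT, product TEXT, month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "ads", "2019-07", 100.0),
        ("marketing", "ads", "2019-07", 50.0),
        ("marketing", "events", "2019-08", 70.0),
        ("finance", "audit", "2019-07", 300.0),
    ],
)

def build_data_mart(conn, department):
    """Materialize a summarized subset of the warehouse for one department."""
    mart = f"mart_{department}"  # department must be a trusted identifier
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (product TEXT, month TEXT, total REAL)")
    conn.execute(
        f"INSERT INTO {mart} "
        "SELECT product, month, SUM(amount) FROM warehouse_sales "
        "WHERE department = ? GROUP BY product, month",
        (department,),
    )
    conn.commit()
    return mart

mart_name = build_data_mart(conn, "marketing")
mart_rows = conn.execute(
    f"SELECT product, month, total FROM {mart_name} ORDER BY product, month"
).fetchall()
```

The mart ends up with two summary rows where the warehouse held three detailed marketing rows, and the finance data is not present at all.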
-
[Figure: Typical data warehouse and data mart architecture. Labeled components: operational data sources 1 to n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS; meta-data; detailed data; lightly summarized data; highly summarized data; archive/backup data; data mart holding summarized data (relational database) and summarized data (multi-dimensional database); end-user access tools (reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining tools). Tiers labeled: first tier, second tier, third tier.]
-
Reasons for creating a data mart
To give users access to the data they need to analyze most often
To provide data in a form that matches the collective view of the data by a group of users in a department or business function
To improve end-user response time due to the reduction in the volume of data to be accessed
To provide appropriately structured data as dictated by the requirements of end-user access tools
Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier, and hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse
-
The cost of implementing data marts is normally less than that required to establish a data warehouse
The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project
-
Data mart issues
Data mart functionality: the capabilities of data marts have increased with the growth in their popularity.
Data mart size: performance deteriorates as data marts grow in size, so the size of data marts needs to be reduced to gain improvements in performance.
Data mart load performance: two critical components are end-user response time and data loading performance. To improve loading, DB updates are applied incrementally, so that only cells affected by the change are updated and not the entire MDDB (multi-dimensional database) structure.
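The incremental update idea can be illustrated with a toy multi-dimensional store in which each cell is a running total. The `Cube` class, its (product, month) dimensions, and the write counter are assumptions for illustration; real MDDB engines are far more elaborate.

```python
from collections import defaultdict

class Cube:
    """Toy multi-dimensional store: one summed cell per (product, month) pair."""

    def __init__(self):
        self.cells = defaultdict(float)
        self.cells_touched = 0  # counts cell writes, to show the work done

    def full_rebuild(self, facts):
        """Recompute every cell from all facts."""
        self.cells.clear()
        for product, month, amount in facts:
            self.cells[(product, month)] += amount
            self.cells_touched += 1

    def apply_increment(self, new_facts):
        """Update only the cells affected by newly arrived facts."""
        for product, month, amount in new_facts:
            self.cells[(product, month)] += amount
            self.cells_touched += 1

facts = [
    ("ads", "2019-07", 100.0),
    ("ads", "2019-08", 40.0),
    ("events", "2019-08", 70.0),
]
cube = Cube()
cube.full_rebuild(facts)
before = cube.cells_touched
cube.apply_increment([("ads", "2019-08", 10.0)])
writes_for_increment = cube.cells_touched - before
```

One new fact costs one cell write here, whereas a full rebuild would touch every fact again; that difference grows with the size of the mart.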
-
Users' access to data in multiple marts: one approach is to replicate data between different data marts or, alternatively, to build virtual data marts, which are views of several physical data marts or the corporate data warehouse tailored to meet the requirements of specific groups of users.
Data mart Internet/intranet access: these products sit between a web server and the data analysis product. Internet/intranet access offers users low-cost access to data marts and the data warehouse using web browsers.
Data mart administration: an organization cannot easily perform administration of multiple data marts, giving rise to issues such as data mart versioning, data and meta-data consistency and integrity, enterprise-wide security, and performance tuning. Data mart administrative tools are commercially available.
Data mart installation: data marts are becoming increasingly complex to build. Vendors are offering products referred to as "data mart in a box" that provide a low-cost source of data mart tools.
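The virtual data mart approach mentioned among the issues above, a view over several physical data marts, can be sketched with an SQL view in sqlite3. The two mart tables, their columns, and the view name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical physical data marts.
conn.execute("CREATE TABLE mart_sales (region TEXT, revenue REAL)")
conn.execute("CREATE TABLE mart_support (region TEXT, tickets INTEGER)")
conn.executemany(
    "INSERT INTO mart_sales VALUES (?, ?)",
    [("north", 500.0), ("south", 300.0)],
)
conn.executemany(
    "INSERT INTO mart_support VALUES (?, ?)",
    [("north", 12), ("south", 30)],
)

# The virtual data mart is a view: it stores no data of its own.
conn.execute(
    "CREATE VIEW virtual_mart_regional AS "
    "SELECT s.region, s.revenue, t.tickets "
    "FROM mart_sales AS s JOIN mart_support AS t ON s.region = t.region"
)

def regional_overview(conn):
    """Query the virtual mart as if it were a single physical table."""
    return conn.execute(
        "SELECT region, revenue, tickets FROM virtual_mart_regional "
        "ORDER BY region"
    ).fetchall()

overview = regional_overview(conn)
# Changes in a physical mart are visible through the view immediately.
conn.execute("UPDATE mart_support SET tickets = 31 WHERE region = 'south'")
updated = regional_overview(conn)
```

Compared with replicating data between marts, the view avoids keeping copies in sync, at the cost of running the join each time it is queried.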