DataWarehouse-OLAP-1
-
7/29/2019 DataWarehouse-OLAP-1
1/91
1
OLAP and Data Mining
Transparencies
-
Objectives
Purpose of online analytical processing (OLAP) and how OLAP differs from data warehousing.
Key features of OLAP applications.
Potential benefits associated with successful OLAP applications.
Rules for OLAP tools and main types of tools including: multi-dimensional OLAP (MOLAP), relational OLAP (ROLAP), and managed query environment (MQE).
-
Objectives
OLAP extensions to SQL.
Concepts associated with data mining.
Main data mining operations including predictive modeling, database segmentation, link analysis, and deviation detection.
Relationship between data mining and data warehousing.
-
Data warehousing and end-user access tools
Accompanying the growth in data warehouses is an increasing demand for more powerful access tools that provide advanced analytical capabilities.
Key developments include:
Online analytical processing (OLAP)
SQL extensions for complex data analysis
Data mining tools.
-
Introducing OLAP
The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993).
Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.
-
Introducing OLAP
Enables users to gain a deeper understanding
and knowledge about various aspects of their
corporate data through fast, consistent,
interactive access to a wide variety of possible views of the data.
Allows users to view corporate data in such a
way that it is a better model of the true
dimensionality of the enterprise.
-
Introducing OLAP
Can easily answer "who?" and "what?" questions; however, the ability to answer "what if?" and "why?" type questions distinguishes OLAP from general-purpose query tools.
Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
-
OLAP Benchmarks
OLAP Council published an analytical processing benchmark referred to as APB-1 (OLAP Council, 1998).
Aim is to measure a server's overall OLAP performance rather than the performance of individual tasks.
-
OLAP Benchmarks
APB-1 assesses the most common business operations, including:
bulk loading of data from internal/external data sources;
incremental loading of data from operational systems;
aggregation of input level data along hierarchies;
calculation of new data based on business models;
time series analysis;
queries with a high degree of complexity;
drill-down through hierarchies;
ad hoc queries;
multiple online sessions.
-
OLAP Benchmarks
OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement of supporting effective decision-making.
Assessing a server's ability to satisfy this requirement involves more than measuring processing performance; it also covers the server's ability to model complex business relationships and to respond to changing business requirements.
-
OLAP Benchmarks
APB-1 uses a standard benchmark metric
called AQM (Analytical Queries per Minute).
AQM represents the number of analytical queries processed per minute, including data loading and computation time. Thus, AQM incorporates data loading performance, calculation performance, and query performance into a single metric.
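As a rough illustration of how one throughput figure can fold in loading and calculation time, the sketch below divides a query count by the total elapsed minutes. The function name and the timings are hypothetical, and the APB-1 specification itself defines the exact measurement procedure, which this sketch does not claim to reproduce.

```python
def analytical_queries_per_minute(num_queries, load_secs, compute_secs, query_secs):
    """Queries per minute over the whole run, not just the query phase."""
    total_minutes = (load_secs + compute_secs + query_secs) / 60.0
    return num_queries / total_minutes

# Hypothetical run: 3000 queries, 10 min loading, 20 min calculation, 30 min querying.
aqm = analytical_queries_per_minute(3000, 600, 1200, 1800)
print(aqm)  # 50.0
```

A server that answers queries quickly but loads slowly scores lower under this kind of metric than its query speed alone would suggest.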
-
OLAP Benchmarks
Publication of APB-1 benchmark results must
include both the database schema and all code
required for executing the benchmark.
An essential requirement of all OLAP applications is the ability to provide users with JIT information, which is needed to make effective decisions about an organization's strategic direction.
-
OLAP Applications
JIT information is computed data that usually
reflects complex relationships and is often
calculated on the fly.
Also, as data relationships may not be known in advance, the data model must be flexible.
-
Examples of OLAP applications in various
functional areas
-
OLAP Applications
Although OLAP applications are found in widely divergent functional areas, all have the following key features:
multi-dimensional views of data
support for complex calculations
time intelligence.
-
OLAP Applications - multi-dimensional
views of data
Core requirement of building a realistic business model.
Provides basis for analytical processing through flexible access to corporate data.
The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.
-
OLAP Applications - support for complex
calculations
Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.
Mechanisms for implementing computational methods should be clear and non-procedural.
-
OLAP Applications - time intelligence
Key feature of almost any analytical application, as performance is almost always judged over time.
Time hierarchy is not always used in the same manner as other hierarchies.
Concepts such as year-to-date and period-over-period comparisons should be easily defined.
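The two time-intelligence concepts named above can be sketched in a few lines. The monthly figures are hypothetical, and percentage change against the previous month is only one possible reading of a period-over-period comparison.

```python
from itertools import accumulate

# Hypothetical monthly revenue for one year (January first).
monthly = [100, 120, 90, 110, 130, 150]

# Year-to-date: running total from the start of the year.
year_to_date = list(accumulate(monthly))

# Period-over-period: percentage change against the previous month.
period_over_period = [
    round((cur - prev) / prev * 100, 1)
    for prev, cur in zip(monthly, monthly[1:])
]

print(year_to_date)        # [100, 220, 310, 420, 550, 700]
print(period_over_period)  # [20.0, -25.0, 22.2, 18.2, 15.4]
```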
-
OLAP Benefits
Increased productivity of end-users.
Reduced backlog of applications development
for IT staff.
Retention of organizational control over the
integrity of corporate data.
Reduced query drag and network traffic on OLTP systems or on the data warehouse.
Improved potential revenue and profitability.
-
Representing Multi-dimensional Data
Example of two-dimensional query.
What is the total revenue generated by property
sales in each city, in each quarter of 1997?
Choice of representation is based on the types of queries the end-user may ask.
Compare representations: three-field relational table versus two-dimensional matrix.
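To make the comparison concrete, the sketch below pivots a three-field table into a two-dimensional matrix. The city names and revenue values are invented for illustration and do not come from the slides' figure.

```python
# Three-field relational form: one (city, quarter, revenue) row per fact.
rows = [
    ("Glasgow", "Q1", 10), ("Glasgow", "Q2", 20),
    ("London", "Q1", 30), ("London", "Q2", 40),
]
cities = sorted({c for c, _, _ in rows})
quarters = sorted({q for _, q, _ in rows})

# Two-dimensional matrix: one row per city, one column per quarter.
matrix = [[0] * len(quarters) for _ in cities]
for city, quarter, revenue in rows:
    matrix[cities.index(city)][quarters.index(quarter)] += revenue

print(quarters)        # ['Q1', 'Q2']
for city, line in zip(cities, matrix):
    print(city, line)  # Glasgow [10, 20] then London [30, 40]
```

Both forms hold the same facts; the matrix simply makes the "each city, each quarter" question a matter of reading a row and a column.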
-
Multi-dimensional Data as Three-field table
versus Two-dimensional Matrix
-
Representing Multi-dimensional Data
Example of three-dimensional query.
What is the total revenue generated by property
sales for each type of property (Flat or House) in
each city, in each quarter of 1997?
Compare representation - four-field relational
table versus three-dimensional cube.
-
Multi-dimensional Data as Four-field Table
versus Three-dimensional Cube
-
Representing Multi-dimensional Data
Cube represents data as cells in an array.
Relational table only represents multi-dimensional data in two dimensions.
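A minimal sketch of the cube idea follows, assuming a dictionary keyed by one coordinate per dimension stands in for a true array. All names and values are hypothetical.

```python
# Four-field relational form: (property type, city, quarter, revenue).
rows = [
    ("Flat", "Glasgow", "Q1", 10), ("House", "Glasgow", "Q1", 15),
    ("Flat", "London", "Q1", 30), ("House", "London", "Q2", 45),
]

# Cube as cells addressed by one coordinate per dimension.
cube = {}
for ptype, city, quarter, revenue in rows:
    cell = (ptype, city, quarter)
    cube[cell] = cube.get(cell, 0) + revenue

# Direct cell lookup by coordinates.
print(cube[("House", "London", "Q2")])  # 45

# Fixing one dimension yields a two-dimensional slice.
flat_slice = {(c, q): v for (t, c, q), v in cube.items() if t == "Flat"}
print(flat_slice)  # {('Glasgow', 'Q1'): 10, ('London', 'Q1'): 30}
```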
-
Multi-dimensional OLAP Servers
Use multi-dimensional structures to store data
and relationships between data.
Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
A cube can be expanded to include other dimensions.
-
Multi-dimensional OLAP Servers
A cube supports matrix arithmetic.
Multi-dimensional query response time depends on how many cells have to be added on the fly.
As the number of dimensions increases, the number of the cube's cells increases exponentially.
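The growth is easy to see by multiplying dimension sizes. The sketch below assumes a dense cube and a hypothetical 10 members per dimension.

```python
from math import prod

def cell_count(cardinalities):
    """Number of cells in a dense cube with the given dimension sizes."""
    return prod(cardinalities)

# Hypothetical cube where every dimension has 10 members.
for n_dims in range(1, 6):
    print(n_dims, cell_count([10] * n_dims))
# prints 10, 100, 1000, 10000, 100000 for 1 to 5 dimensions
```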
-
Multi-dimensional OLAP Servers
However, the majority of multi-dimensional queries use summarized, high-level data.
Solution is to pre-aggregate (consolidate) all logical subtotals and totals along all dimensions.
Pre-aggregation is valuable, as typical dimensions are hierarchical in nature (e.g. Time dimension hierarchy: years, quarters, months, weeks, and days).
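Pre-aggregation along a time hierarchy can be sketched as computing the subtotals once, so that later high-level queries become lookups. The month-level figures and the three-level hierarchy used here are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical base-level facts: (year, quarter, month) -> revenue.
monthly = {
    (1997, "Q1", "Jan"): 5, (1997, "Q1", "Feb"): 7, (1997, "Q1", "Mar"): 8,
    (1997, "Q2", "Apr"): 6, (1997, "Q2", "May"): 9, (1997, "Q2", "Jun"): 10,
}

# Pre-compute subtotals once at each level of the hierarchy.
by_quarter = defaultdict(int)
by_year = defaultdict(int)
for (year, quarter, _month), revenue in monthly.items():
    by_quarter[(year, quarter)] += revenue
    by_year[year] += revenue

# High-level queries now read a stored total instead of summing cells.
print(by_quarter[(1997, "Q1")])  # 20
print(by_year[1997])             # 45
```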
-
Multi-dimensional OLAP Servers
Predefined hierarchy allows logical pre-aggregation and, conversely, allows for a logical drill-down.
Supports common analytical operations:
Consolidation
Drill-down
Slicing and dicing.
-
Multi-dimensional OLAP Servers
Consolidation - aggregation of data, such as simple roll-ups or complex expressions involving inter-related data.
Drill-down - the reverse of consolidation; involves displaying the detailed data that comprises the consolidated data.
Slicing and dicing (also called pivoting) - the ability to look at the data from different viewpoints.
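The three operations can be sketched on a small cube held as a dictionary. The helper names `consolidate` and `drill_down` and the cell values are hypothetical, and real OLAP servers expose these operations through their own interfaces.

```python
# Hypothetical cube cells keyed by (city, quarter, month).
cube = {
    ("Glasgow", "Q1", "Jan"): 4, ("Glasgow", "Q1", "Feb"): 6,
    ("London", "Q1", "Jan"): 9, ("London", "Q2", "Apr"): 11,
}

def consolidate(cells, keep):
    """Roll up: sum cells over every dimension position not listed in keep."""
    totals = {}
    for key, value in cells.items():
        reduced = tuple(key[i] for i in keep)
        totals[reduced] = totals.get(reduced, 0) + value
    return totals

def drill_down(cells, position, member):
    """Show the detailed cells that make up one consolidated member."""
    return {k: v for k, v in cells.items() if k[position] == member}

by_city = consolidate(cube, keep=[0])
print(by_city)  # {('Glasgow',): 10, ('London',): 20}

glasgow_detail = drill_down(cube, 0, "Glasgow")
print(glasgow_detail)  # the two Glasgow month-level cells

# Another viewpoint on the same data: quarter first, then city.
by_quarter_city = consolidate(cube, keep=[1, 0])
print(by_quarter_city)
# {('Q1', 'Glasgow'): 10, ('Q1', 'London'): 9, ('Q2', 'London'): 11}
```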
-
Multi-dimensional OLAP servers
Can store data in a compressed form by
dynamically selecting physical storage
organizations and compression techniques that
maximize space utilization.
Dense data (i.e., data that exists for a high percentage of cells) can be stored separately from sparse data (i.e., data for which a significant percentage of cells are empty).
-
Multi-dimensional OLAP Servers
Ability to omit empty or repetitive cells can
greatly reduce the size of the cube and the
amount of processing.
Allows analysis of exceptionally large amounts
of data.
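The saving from omitting empty cells can be sketched by storing only populated coordinates. This is a simple coordinate dictionary with made-up values, not the storage scheme of any particular MOLAP product.

```python
# Hypothetical 100 x 100 x 100 cube in which only three cells hold data.
dims = (100, 100, 100)
sparse_cells = {(0, 0, 0): 5.0, (10, 20, 30): 7.5, (99, 99, 99): 1.0}

dense_cell_count = dims[0] * dims[1] * dims[2]
stored_cell_count = len(sparse_cells)

def read_cell(cells, coords, empty=0.0):
    """Cells that were never stored read back as the empty value."""
    return cells.get(coords, empty)

print(dense_cell_count, stored_cell_count)    # 1000000 3
print(read_cell(sparse_cells, (10, 20, 30)))  # 7.5
print(read_cell(sparse_cells, (1, 2, 3)))     # 0.0
```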
-
Multi-dimensional OLAP Servers
In summary, pre-aggregation, dimensional
hierarchy, and sparse data management can
significantly reduce the size of the cube and the
need to calculate values on-the-fly.
Removes need for multi-table joins and
provides quick and direct access to arrays of
data, thus significantly speeding up execution
of multi-dimensional queries.
-
Codd's Rules for OLAP Systems
In 1993, E.F. Codd formulated twelve rules as
the basis for selecting OLAP tools.
Multi-dimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client-server architecture
Generic dimensionality
-
Codd's Rules for OLAP Systems
Dynamic sparse matrix handling
Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels.
-
Codd's Rules for OLAP Systems
There are proposals to redefine or extend the rules. For example, to also include:
Comprehensive database management tools
Ability to drill down to detail (source record) level
Incremental database refresh
SQL interface to the existing enterprise environment
-
Categories of OLAP Tools
OLAP tools are categorized according to the
architecture of the underlying database.
Three main categories of OLAP tools include:
Multi-dimensional OLAP (MOLAP or MD-OLAP)
Relational OLAP (ROLAP), also called multi-relational OLAP
Managed query environment (MQE)
-
Multi-dimensional OLAP (MOLAP)
Use specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.
Data is typically aggregated and stored
according to predicted usage to enhance query
performance.
-
Multi-dimensional OLAP (MOLAP)
Use array technology and efficient storage
techniques that minimize the disk space
requirements through sparse data
management.
Provides excellent performance when data is
used as designed, and the focus is on data for a
specific decision-support application.
-
Multi-dimensional OLAP (MOLAP)
Traditionally, require a tight coupling with the
application layer and presentation layer.
Recent trends segregate the OLAP from the
data structures through the use of published
application programming interfaces (APIs).
-
Typical Architecture for MOLAP Tools
-
MOLAP Tools - Development Issues
Underlying data structures are limited in their
ability to support multiple subject areas and to
provide access to detailed data.
Navigation and analysis of data are limited because the data is designed according to previously determined requirements.
-
MOLAP Tools - Development Issues
MOLAP products require a different set of
skills and tools to build and maintain the
database, thus increasing the cost and
complexity of support.
-
Relational OLAP (ROLAP)
Fastest-growing style of OLAP technology.
Supports RDBMS products using a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.
-
Relational OLAP (ROLAP)
To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
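A star schema and the kind of query a ROLAP layer might generate against it can be sketched with Python's built-in sqlite3 module. The table names, column names, and figures are hypothetical, and SQLite stands in here for whichever RDBMS a real ROLAP tool would target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE dim_city (city_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE fact_sales (city_id INTEGER, time_id INTEGER, revenue INTEGER);
    INSERT INTO dim_city VALUES (1, 'Glasgow'), (2, 'London');
    INSERT INTO dim_time VALUES (1, 1997, 'Q1'), (2, 1997, 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 1, 5);
""")

# A multi-dimensional question answered by joining facts to dimensions.
result = conn.execute("""
    SELECT c.city, t.quarter, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_city c ON c.city_id = f.city_id
    JOIN dim_time t ON t.time_id = f.time_id
    WHERE t.year = 1997
    GROUP BY c.city, t.quarter
    ORDER BY c.city, t.quarter
""").fetchall()
print(result)  # [('Glasgow', 'Q1', 10), ('Glasgow', 'Q2', 20), ('London', 'Q1', 35)]
```

The multi-dimensional view exists only in the query result; the stored data remains ordinary two-dimensional relations.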
-
Typical Architecture for ROLAP Tools
-
ROLAP Tools - Development Issues
Development of middleware to facilitate the building of multi-dimensional applications (software that converts the two-dimensional relation into a multi-dimensional structure).
Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.
-
Managed Query Environment (MQE)
Relatively new development.
Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
-
Managed Query Environment (MQE)
Deliver selected data directly from DBMS or
via a MOLAP server to desktop (or local
server) in form of a datacube, where it is
stored, analyzed, and maintained locally.
Promoted as being relatively simple to install
and administer with reduced cost and
maintenance.
-
Typical Architecture for MQE Tools
-
MQE Tools - Development Issues
Architecture results in significant data redundancy and may cause problems for networks that support many users.
Ability of each user to build a custom datacube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
-
OLAP Extensions to SQL
SQL is promoted as easy-to-learn, nonprocedural, free-format, DBMS-independent, and an international standard.
However, a major disadvantage has been its inability to represent many of the questions most commonly asked by business analysts.
IBM and Oracle jointly proposed OLAP extensions to SQL early in 1999, adopted as an amendment to SQL.
-
OLAP Extensions to SQL
Many database vendors including IBM,
Oracle, Informix, and Red Brick Systems have
already implemented portions of specifications
in their DBMSs.
Red Brick Systems was first to implement
many essential OLAP functions (as Red Brick
Intelligent SQL (RISQL)), albeit in advance of
the standard.
-
OLAP Extensions to SQL - RISQL
Designed for business analysts.
Set of extensions that augments SQL with a variety of powerful operations appropriate to data analysis and decision-support applications, such as ranking, moving averages, comparisons, market share, and this year versus last year.
-
Use of the RISQL CUME Function
Show the quarterly sales for branch office B003, along with the year-to-date figures.
SELECT quarter, quarterlySales, CUME(quarterlySales) AS Year-to-Date
FROM BranchSales
WHERE branchNo = 'B003';
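For readers without access to RISQL, a running total of this kind can be approximated with a standard SQL window function. The sketch below uses Python's sqlite3 module and assumes SQLite 3.25 or later (the first release with window functions); the sales figures and the `yearToDate` alias are hypothetical, and the query is an analogue of CUME rather than RISQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BranchSales (branchNo TEXT, quarter INTEGER, quarterlySales INTEGER);
    -- Hypothetical figures for illustration only.
    INSERT INTO BranchSales VALUES
        ('B003', 1, 100), ('B003', 2, 150), ('B003', 3, 120), ('B003', 4, 180),
        ('B005', 1, 999);
""")

# Running total over quarters, restricted to one branch.
rows = conn.execute("""
    SELECT quarter, quarterlySales,
           SUM(quarterlySales) OVER (ORDER BY quarter) AS yearToDate
    FROM BranchSales
    WHERE branchNo = 'B003'
    ORDER BY quarter
""").fetchall()
print(rows)  # [(1, 100, 100), (2, 150, 250), (3, 120, 370), (4, 180, 550)]
```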
-
Use of the RISQL MOVINGAVG / MOVINGSUM
Function
Show the first six monthly sales for branch office
B003 without the effect of seasonality.
SELECT month, monthlySales,
MOVINGAVG(monthlySales) AS 3-MonthMovingAvg,
MOVINGSUM(monthlySales) AS 3-MonthMovingSum
FROM BranchSales
WHERE branchNo = 'B003';
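The smoothing effect of a three-month window can be sketched in plain Python. The sales figures are hypothetical, and the treatment of the first two months (using a shorter window) is an assumption of this sketch, not a statement about how RISQL handles them.

```python
def moving_window(values, size, reducer):
    """Apply reducer to each trailing window; windows are shorter at the start."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - size + 1): i + 1]
        out.append(reducer(window))
    return out

# Hypothetical monthly sales for six months.
monthly_sales = [30, 60, 90, 60, 30, 60]

moving_sum = moving_window(monthly_sales, 3, sum)
moving_avg = moving_window(monthly_sales, 3, lambda w: sum(w) / len(w))

print(moving_sum)  # [30, 90, 180, 210, 180, 150]
print(moving_avg)  # [30.0, 45.0, 60.0, 70.0, 60.0, 50.0]
```

The averaged series swings between 45 and 70 once the window fills, while the raw series swings between 30 and 90.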
-
Data Mining
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions (Simoudis, 1996).
Involves analysis of data and use of software
techniques for finding hidden and unexpected
patterns and relationships in sets of data.
-
Data Mining
Reveals information that is hidden and unexpected, as there is little value in finding patterns and relationships that are already intuitive.
Patterns and relationships are identified by examining the underlying rules and features in the data.
Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.
-
Data Mining
Starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired and extended to larger sets of data.
Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing.
Relatively new technology; however, it is already used in a number of industries.
-
Examples of Applications of Data Mining
Retail / Marketing
Identifying buying patterns of customers
Finding associations among customer
demographic characteristics
Predicting response to mailing campaigns
Market basket analysis
-
Examples of Applications of Data Mining
Banking
Detecting patterns of fraudulent credit card use
Identifying loyal customers
Predicting customers likely to change their credit card affiliation
Determining credit card spending by customer
groups
-
Examples of Applications of Data Mining
Insurance
Claims analysis
Predicting which customers will buy new policies.
Medicine
Characterizing patient behavior to predict surgery
visits
Identifying successful medical therapies for different illnesses.
-
Data Mining Operations
Four main operations include:
Predictive modeling
Database segmentation
Link analysis
Deviation detection.
There are recognized associations between the applications and the corresponding operations.
e.g. Direct marketing strategies use database segmentation.
-
Data Mining Techniques
Techniques are specific implementations of the
data mining operations.
Each operation has its own strengths and
weaknesses.
Data mining tools sometimes offer a choice of
operations to implement a technique.
-
Data Mining Techniques
Criteria for selection of a tool include:
Suitability for certain input data types
Transparency of the mining output
Tolerance of missing variable values
Level of accuracy possible
Ability to handle large volumes of data.
-
Data Mining Operations and Associated
Techniques
-
Predictive Modeling
Similar to the human learning experience in that it uses observations to form a model of the important characteristics of some phenomenon.
Uses generalizations of the real world and the ability to fit new data into a general framework.
Can analyze a database to determine essential
characteristics (model) about the data set.
-
Predictive Modeling
Model is developed using a supervised learning
approach, which has two phases: training and
testing.
Training builds a model using a large sample of historical data called a training set.
Testing involves trying out the model on new,
previously unseen data to determine its accuracy
and physical performance characteristics.
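The two phases can be sketched with a deliberately simple model: a single income threshold chosen on a training set and then scored on held-out records. The records, the threshold model, and the 8/4 split are all hypothetical choices made for illustration.

```python
# Hypothetical labelled records: (income in thousands, bought_property).
records = [(12, 0), (18, 0), (22, 0), (25, 0), (31, 1), (35, 1),
           (40, 1), (48, 1), (15, 0), (28, 0), (33, 1), (45, 1)]

training_set, test_set = records[:8], records[8:]

def accuracy(threshold, data):
    """Fraction of records where (income > threshold) matches the label."""
    hits = sum((income > threshold) == bool(label) for income, label in data)
    return hits / len(data)

# Training: pick the candidate threshold that best fits the training set.
candidates = sorted(income for income, _ in training_set)
best_threshold = max(candidates, key=lambda t: accuracy(t, training_set))

# Testing: measure accuracy on records the model has not seen.
test_accuracy = accuracy(best_threshold, test_set)
print(best_threshold)  # 25
print(test_accuracy)   # 0.75
```

The model fits the training set perfectly yet misclassifies one unseen record, which is why accuracy is measured on the test set rather than the training set.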
-
Predictive Modeling
Applications of predictive modeling include
customer retention management, credit
approval, cross selling, and direct marketing.
Two techniques associated with predictive
modeling: classification and value prediction,
distinguished by nature of the variable being
predicted.
-
Predictive Modeling - Classification
Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
Two specializations of classification: tree
induction and neural induction.
-
Example of Classification using Tree Induction
-
Example of Classification using Neural
Induction
-
Predictive Modeling - Value Prediction
Used to estimate a continuous numeric value
that is associated with a database record.
Uses the traditional statistical techniques of linear regression and nonlinear regression.
Relatively easy to use and understand.
-
Predictive Modeling - Value Prediction
Linear regression attempts to fit a straight line
through a plot of the data, such that the line is
the best representation of the average of all
observations at that point in the plot.
Problem is that the technique only works well with linear data and is sensitive to the presence of outliers (i.e., data values that do not conform to the expected norm).
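The sensitivity to outliers can be shown with an ordinary least squares fit written out by hand. The data points are hypothetical and chosen so the arithmetic is exact.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical points lying exactly on y = 2x + 1.
clean = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]
print(fit_line(clean))  # (2.0, 1.0)

# Replacing the last observation with an outlier pulls the whole line.
with_outlier = clean[:-1] + [(5, 31)]
print(fit_line(with_outlier))  # (6.0, -7.0)
```

One unusual record out of five triples the fitted slope, so predictions for every other record change as well.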
-
Predictive Modeling - Value Prediction
Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
Statistical measurements are fine for building linear models that describe predictable data points; however, most data is not linear in nature.
-
Predictive Modeling - Value Prediction
Data mining requires statistical methods that
can accommodate non-linearity, outliers, and
non-numeric data.
Applications of value prediction include credit
card fraud detection or target mailing list
identification.
-
Database Segmentation
Aim is to partition a database into an unknown
number of segments, or clusters, of similar
records.
Uses unsupervised learning to discover
homogeneous sub-populations in a database to
improve the accuracy of the profiles.
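As one possible illustration of unsupervised segmentation, the sketch below runs plain k-means on a single numeric attribute. Note the swap: k-means needs the number of segments fixed in advance, whereas the slide describes an unknown number, and it is not one of the demographic or neural clustering techniques the slides associate with this operation. The ages and starting centers are hypothetical.

```python
def k_means_1d(values, centers, rounds=10):
    """Plain k-means on numbers: assign to nearest center, then re-average."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages with two visible groups.
ages = [21, 23, 25, 27, 61, 63, 65, 67]
centers, clusters = k_means_1d(ages, centers=[21, 27])
print(centers)   # [24.0, 64.0]
print(clusters)  # [[21, 23, 25, 27], [61, 63, 65, 67]]
```

No labels are supplied; the two segments emerge from the distances between records alone.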
-
Database Segmentation
Less precise than other operations, thus less sensitive to redundant and irrelevant features.
Sensitivity can be reduced by ignoring a subset of the attributes that describe each instance or by assigning a weighting factor to each variable.
Applications of database segmentation include customer profiling, direct marketing, and cross selling.
-
Example of Database Segmentation using a
Scatterplot
-
Database Segmentation
Associated with demographic or neural
clustering techniques, distinguished by:
Allowable data inputs
Methods used to calculate the distance between records
Presentation of the resulting segments for
analysis.
-
Link Analysis
Aims to establish links (associations) between records, or sets of records, in a database.
There are three specializations:
Associations discovery
Sequential pattern discovery
Similar time sequence discovery
Applications include product affinity analysis, direct marketing, and stock price movement.
-
Link Analysis - Associations Discovery
Finds items that imply the presence of other
items in the same event.
Affinities between items are represented by
association rules.
e.g. When customer rents property for more than 2
years and is more than 25 years old, in 40% of cases,
customer will buy a property. Association happens in
35% of all customers who rent properties.
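The two percentages attached to an association rule can be computed from raw records as sketched below. The ten customer records are invented, so the resulting figures differ from the 40% and 35% in the example above. The sketch also assumes the second percentage means the share of customers matching the rule's condition; definitions of support vary.

```python
# Hypothetical customer records: sets of facts observed for each customer.
customers = [
    {"rents>2yrs", "age>25", "buys"},
    {"rents>2yrs", "age>25", "buys"},
    {"rents>2yrs", "age>25"},
    {"rents>2yrs", "age>25"},
    {"rents>2yrs", "age>25"},
    {"rents>2yrs"},
    {"age>25"},
    {"age>25", "buys"},
    set(),
    set(),
]

def rule_stats(records, antecedent, consequent):
    """Return (support of antecedent, confidence of antecedent -> consequent)."""
    matching = [r for r in records if antecedent <= r]
    support = len(matching) / len(records)
    confidence = sum(consequent <= r for r in matching) / len(matching)
    return support, confidence

stats = rule_stats(customers, {"rents>2yrs", "age>25"}, {"buys"})
print(stats)  # (0.5, 0.4)
```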
-
Link Analysis - Sequential Pattern Discovery
Finds patterns between events such that the
presence of one set of items is followed by
another set of items in a database of events
over a period of time.
e.g. Used to understand long-term customer buying behavior.
-
Link Analysis - Similar Time Sequence Discovery
Finds links between two sets of data that are
time-dependent, and is based on the degree of
similarity between the patterns that both time
series demonstrate.
e.g. Within three months of buying property, new
home owners will purchase goods such as cookers,
freezers, and washing machines.
-
Deviation Detection
Relatively new operation in terms of
commercially available data mining tools.
Often a source of true discovery because it identifies outliers, which express deviation from some previously known expectation and norm.
-
Deviation Detection
Can be performed using statistics and
visualization techniques or as a by-product of
data mining.
Applications include fraud detection in the use
of credit cards and insurance claims, quality
control, and defects tracing.
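A simple statistical form of deviation detection flags values that lie far from the mean in units of standard deviation. The claim amounts and the cut-off of 2.0 are hypothetical choices for this sketch; commercial tools may use quite different tests.

```python
from statistics import mean, pstdev

def find_outliers(values, limit=2.0):
    """Return values lying more than `limit` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > limit]

# Hypothetical claim amounts with one unusually large claim.
claims = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
outliers = find_outliers(claims)
print(outliers)  # [500]
```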
-
Example of Database Segmentation using a
Visualization
-
Data Mining Tools
There is a growing number of commercial data mining tools on the market.
Important characteristics of data mining tools include:
Data preparation facilities
Selection of data mining operations
Product scalability and performance
Facilities for visualization of results.
-
Data Mining and Data Warehousing
Major challenge to exploit data mining is
identifying suitable data to mine.
Data mining requires a single, separate, clean, integrated, and self-consistent source of data.
-
Data Mining and Data Warehousing
A data warehouse is well equipped for
providing data for mining.
Data quality and consistency are prerequisites for mining, to ensure the accuracy of the predictive models. Data warehouses are populated with clean, consistent data.
-
Data Mining and Data Warehousing
Advantageous to mine data from multiple sources to discover as many interrelationships as possible. Data warehouses contain data from a number of sources.
Selecting relevant subsets of records and fields
for data mining requires query capabilities of
the data warehouse.
-
Data Mining and Data Warehousing
Results of a data mining study are useful if
there is some way to further investigate the
uncovered patterns. Data warehouses provide
capability to go back to the data source.