Chapter 3: Database Support in Data Mining
Types of database systems
How they relate to data mining
Contents
Describes data warehousing and related database systems
Discusses features of data found in data warehouses
Describes how data warehouses are typically implemented and operated
Defines metadata in the context of data warehouses
Shows how different data systems are typically used in data mining
Provides real examples of database systems used in data mining
Discusses the concept of data quality
Reviews the database software market
Data management
Retail organizations generate masses of data that require very advanced data storage systems.
Wal-Mart relied on modern data management to support its supply chain management (SCM).
The manipulation of data is a key element in the data mining process.
Data mining and other analysis can draw upon data collected in internal systems and external sources.
Data access
Data warehouses are not required for data mining, but they store massive amounts of data that can be used for data mining.
Data mining analyses also use smaller sets of data that can be organized in online analytic processing (OLAP) systems or in data marts.
OLAP provides access to report generators and graphical support.
Contemporary Database
Gain competitive advantage: customer information systems, data mining
Develop and market new products: micromarketing
Systems
Database: personal, small business level
On-Line Analytic Processing (OLAP): ability to use many dimensions, reports & graphics
Data Mart: usually temporary analysis
Data Warehouse: usually permanent repository
Data Warehousing
Price Waterhouse definition: A data warehouse is an orderly and accessible repository of known facts and related data that is used as a basis for making better management decisions. The data warehouse provides a unified repository of consistent data for decision making that is subject oriented, integrated, time variant, and nonvolatile.
Data Warehousing
Data warehouses are used to store massive quantities of data that can be updated and allow quick retrieval of specific types of data.
Not just a technology; an architecture and process designed to support decision making
Special-purpose database systems to improve query performance significantly
Three general data warehouse processes:
1. Warehouse generation is the process of designing the warehouse and loading the data.
2. Data management is the process of storing the data.
3. Information analysis is the process of using the data to support organizational decision making.
Benefits from Data Warehousing
Provide business users views of data appropriate to mission
Consolidate & reconcile (consistent) data
Give macro views of critical aspects
Timely & detailed access to information
Provide specific information to particular groups
Ability to identify trends
Data warehousing
Within data warehouses, data is classified and organized around subjects meaningful to the company.
The data is gathered from operational systems:
Barcode readers at cash registers
Information from e-commerce
Daily reports…
Industry volumes
Economic data…
Data from different sources (shipping, marketing, billing) are integrated into a common format.
Data Transformation
Consolidate data from multiple sources
Filter to eliminate unnecessary details
Clean data: eliminate incorrect entries, eliminate duplications
Convert & translate data into proper format
Aggregate data as designed
Data warehousing
A data warehouse is a central aggregation of data, intended as a permanent storage facility holding normalized, formatted data.
Normalized implies the use of small, stable data structures within the database. Normalized data groups data elements by category, making it possible to apply relational principles in data updating.
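To illustrate why small, stable structures help with updating, the sketch below stores each customer once and lets orders refer to the customer by key, so a change is made in a single place. The record layout and the names `customers`, `orders`, `update_city`, and `order_view` are hypothetical and are not taken from the chapter.

```python
# Hypothetical normalized layout: each customer is stored exactly once,
# and orders refer to customers through a key.
customers = {
    "C1": {"name": "Acme Supply", "city": "Omaha"},
    "C2": {"name": "Baker Tools", "city": "Lincoln"},
}
orders = [
    {"order_id": 1, "customer_id": "C1", "cases": 12},
    {"order_id": 2, "customer_id": "C1", "cases": 5},
    {"order_id": 3, "customer_id": "C2", "cases": 7},
]


def update_city(customer_id, new_city):
    """Change a customer's city in one place; every order sees the change."""
    customers[customer_id]["city"] = new_city


def order_view(order):
    """Join an order with its customer record through the key."""
    customer = customers[order["customer_id"]]
    return {**order, "name": customer["name"], "city": customer["city"]}


update_city("C1", "Denver")
views = [order_view(o) for o in orders]
```

Had the city been repeated on every order row, the same update would have needed to touch each row, which is where inconsistencies tend to creep in.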
Key Concepts
Scalability: ability to accurately cope with changing conditions (especially magnitude of computing)
Granularity: level of detail
Data warehouse – tends to be fine granularity
OLAP – tends to aggregate to coarse granularity
Data Warehousing
| OLAP | On-Line Transactional Processing |
|---|---|
| summary data | detailed operational data |
| few users | many concurrent users |
| data driven | transaction driven |
| effectiveness | efficiency |
| use spreadsheets to access | |
Data Marts
Intermediate-level database system
Originally, many data marts were marketed as preliminary data warehouses. Currently, many data marts are used in conjunction with data warehouses rather than as competitive products.
Data marts are usually used as repositories of data gathered to serve a particular set of users, providing data extracted from data warehouses and/or other sources.
Often used as temporary storage
Gather data for study from data warehouse, other sources (including external)
Clean & transform for data mining
OLAP
Multidimensional spreadsheet approach to shared data storage designed to allow users to extract data and generate reports on the dimensions important to them.
Data is segregated into different dimensions and organized in a hierarchical manner.
Hypercube – term to reflect ability to sort on many dimensional forms
Many forms:
MOLAP – multidimensional
ROLAP – relational (uses SQL)
DOLAP – desktop
WOLAP – web enabled
HOLAP – hybrid
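The hypercube idea can be sketched in a few lines: facts are stored with several dimensions, and a report is produced by keeping the dimensions of interest and summing over the rest. The sample facts and the helper names `roll_up` and `slice_cube` are assumptions made for illustration; they do not describe any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact records: (department, region, month, sales).
facts = [
    ("Tools", "East", "Jan", 100),
    ("Tools", "West", "Jan", 80),
    ("Tools", "East", "Feb", 120),
    ("Paint", "East", "Jan", 50),
    ("Paint", "West", "Feb", 70),
]
DIMENSIONS = ("department", "region", "month")


def roll_up(records, keep):
    """Sum sales over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[i] for i in positions)
        totals[key] += record[-1]
    return dict(totals)


def slice_cube(records, dimension, value):
    """Keep only the records where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [r for r in records if r[position] == value]


by_department = roll_up(facts, ["department"])
by_region_month = roll_up(facts, ["region", "month"])
east_by_department = roll_up(slice_cube(facts, "region", "East"), ["department"])
```

Each call answers a different reporting question from the same facts, which is the sense in which users generate reports on the dimensions important to them.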
OLAP
One function of OLAP is standard report generation, including financial performance analysis on selected dimensions (such as by department, geographical region, product, salesperson, time…).
Another is supporting planning and forecasting projects using spreadsheet analytic tools.
An OLAP product includes a data warehouse, an OLAP server, and a client server on a local area network (LAN).
OLAP functions – see p. 37
Relationships of database and DM
Data warehouses are not required for data mining, nor are OLAP systems.
However, the existence of either presents many opportunities for data mining.
Data Warehouse Implementation
Data warehouses create the opportunity to provide much better information than what was available in the past. DW can produce consistent views of events and reports.
DW provides:
Reliable, comprehensive source of clean data
Accurate, complete, in correct format
Processes:
System development
Data acquisition
Data extraction for use
Data Warehouse Implementation
Implementation processes involve a degree of continuity, since data warehousing is a dynamic environment.
A suite of software tools is needed to extract data from sources, move it to the data warehouse itself, and provide user access to this information.
Data acquisition is supported by data warehouse generation.
Data Warehouse Generation
Extract data from sources
Transform
Clean
Load into data warehouse
60-80% of effort in operating data warehouse
Data Extraction Routines
Extraction programs are executed periodically to obtain records, and copy the information to an intermediate file.
Data extraction routines:
Interpret data formats
Identify changed records
Copy information to intermediate file
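A minimal sketch of such a routine follows, assuming the source system exports CSV text with a `modified` timestamp column. That column, the sample rows, and the function name `extract_changed` are all hypothetical; real extraction programs may detect changes in other ways, for example from transaction logs.

```python
import csv
import io
from datetime import datetime

# Hypothetical source export; the column names are illustrative only.
SOURCE = """id,customer,amount,modified
1,Acme Supply,120.50,2024-01-03T09:00:00
2,Baker Tools,75.00,2024-01-05T14:30:00
3,Crane Ltd,210.00,2024-01-06T08:15:00
"""


def extract_changed(source_text, last_run, out_file):
    """Copy records modified after `last_run` to an intermediate file."""
    reader = csv.DictReader(io.StringIO(source_text))   # interpret the format
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    copied = 0
    for row in reader:
        modified = datetime.fromisoformat(row["modified"])
        if modified > last_run:                          # identify changes
            writer.writerow(row)                         # copy to the file
            copied += 1
    return copied


intermediate = io.StringIO()  # stands in for the intermediate file
count = extract_changed(SOURCE, datetime(2024, 1, 4), intermediate)
```

Running the routine periodically with the time of the previous run as `last_run` would copy only the records that changed in between.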
Data Transformation
Transformation programs accomplish final data preparation, including:
The consolidation of data from multiple sources
Filtering data to eliminate unnecessary details
Cleaning data to eliminate incorrect entries or duplications
Converting and translating data into the format established for the data warehouse
The aggregation of data
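The transformation steps listed above can be sketched as one small function. The two source extracts, their field names, and the rule that a non-positive amount counts as an incorrect entry are assumptions chosen for illustration; conversion is done before cleaning here only because it makes duplicates easier to spot.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems with different formats.
billing = [
    {"cust": "acme supply", "amount": "120.50", "note": "rush"},
    {"cust": "Baker Tools", "amount": "75.00", "note": ""},
    {"cust": "Baker Tools", "amount": "75.00", "note": ""},   # duplicate
]
shipping = [
    {"cust": "ACME SUPPLY", "amount": "30.00", "note": "dock 4"},
    {"cust": "Crane Ltd", "amount": "-1", "note": ""},         # incorrect
]


def transform(*sources):
    """Consolidate, filter, convert, clean, and aggregate source records."""
    # Consolidate data from multiple sources.
    consolidated = [row for source in sources for row in source]
    # Filter out details the warehouse does not need (the free-text note).
    filtered = [{"cust": r["cust"], "amount": r["amount"]} for r in consolidated]
    # Convert into one format: title-case names, numeric amounts.
    converted = [
        {"cust": r["cust"].strip().title(), "amount": float(r["amount"])}
        for r in filtered
    ]
    # Clean: drop incorrect entries and exact duplicates (a simplification).
    seen, cleaned = set(), []
    for r in converted:
        key = (r["cust"], r["amount"])
        if r["amount"] <= 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append(r)
    # Aggregate: total amount per customer.
    totals = defaultdict(float)
    for r in cleaned:
        totals[r["cust"]] += r["amount"]
    return dict(totals)


totals = transform(billing, shipping)
```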
Data Management
Data management involves:
Retrieving information from the data warehouse
Running extraction programs to generate repetitive reports and serve specific needs
Implementation problems:
Required data not available
Initial data warehouse scope too broad
Not enough time to do prototyping or needs analysis
Insufficient senior direction
Meta Data
Data warehouse management vs. data management:
Data management concerns the management of all of the enterprise’s data.
Data warehouse management refers to the design and operation of the data warehouse through all phases of its life cycle:
Manage meta data
Design data warehouse
Ensure data quality
Manage system during operations
Meta Data
Metadata is the set of reference data used to keep track of data; it describes the organization of the warehouse.
A data catalog provides users with the ability to see specifically what the data warehouse contains.
The content of the data warehouse is defined by metadata, which provides business views of data (information access tools) and technical views (warehouse generation tools).
Business Metadata
What data are available
Source of each data element
Frequency of data updates
Location of specific data
Predefined reports & queries
Methods of data access
Technical Meta Data
Data source (internal or external)
Data preparation features (transformation & aggregation rules)
Logical structure of data
Physical structure & content
Data ownership
Security aspects (access rights, restrictions)
System information (date of last update, retention policy, data usage)
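One way to picture a data catalog entry is as a record that carries both views. The sketch below is a hypothetical layout: the class names, field names, and the `weekly_sales` example are invented for illustration and cover only part of the business and technical metadata listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class BusinessMetadata:
    """What a business user needs to know about a data element."""
    description: str
    source: str
    update_frequency: str
    location: str
    access_methods: list = field(default_factory=list)


@dataclass
class TechnicalMetadata:
    """What warehouse generation tools need to know."""
    source_type: str                      # "internal" or "external"
    transformation_rule: str
    owner: str
    access_rights: list = field(default_factory=list)
    last_update: Optional[date] = None
    retention_days: int = 0


@dataclass
class CatalogEntry:
    name: str
    business: BusinessMetadata
    technical: TechnicalMetadata


catalog = {}


def register(entry):
    """Add an entry to the catalog under its name."""
    catalog[entry.name] = entry


register(CatalogEntry(
    name="weekly_sales",
    business=BusinessMetadata(
        description="Sales by item, store, and week",
        source="point-of-sale system",
        update_frequency="weekly",
        location="sales schema",
        access_methods=["predefined report", "ad hoc query"],
    ),
    technical=TechnicalMetadata(
        source_type="internal",
        transformation_rule="sum daily sales to weekly totals",
        owner="merchandising",
        access_rights=["buyers", "forecasters"],
        last_update=date(2024, 1, 7),
        retention_days=455,
    ),
))
```

A user browsing the catalog would read the `business` part, while loading and transformation tools would consult the `technical` part.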
Wal-Mart’s Data Warehouse
Heavy user of IT
Core competency – supply chain distribution
2900 outlets
Data warehouse of 101 terabytes ($4 billion)
65 million transactions per week
Subject-oriented, integrated, time-variant, nonvolatile data
65 weeks of data by item, store, day
Wal-Mart
Use data warehouse to:
Support decision making
Buyers, merchandisers, logistics, forecasters
3,500 vendor partners can query
Can handle 35 thousand queries per week
Benefit $12,000 per query
Some users about 1 thousand queries per day
Summers Rubber Company
Distribution firm
7 operating locations
10,000 items
3,000 customers
Old system:
OLAP
Databases transactional & summarized, distributed
Summers Data Storage System
Built in-house, PCs, Access database
Visual Basic & Excel
Distributed system
Data warehouse server controlled queries, managed resources
Security
Passwords gave some protection
To protect from leaving employees, used data marts with small versions of central database
Summers – Negative features
Too much disk space on user local drives
Often difficult to understand & use
Updating multiple data sites slow, limited access
Summary data often wrong
Couldn’t use data mining tools
Problem was that aggregated data was stored
Comparison
| Product | Use | Duration | Granularity |
|---|---|---|---|
| Warehouse | Repository | Permanent | Finest |
| Mart | Specific study | Temporary | Aggregate |
| OLAP | Report & analysis | Repetitive | Summary |
Examples of Data Uses
Customer information systems
Fingerhut
Customer Information Systems
Massive databases
Detailed information about individuals and households
Use automated analysis: identify focused market target
Micromarketing
Target small groups of highly responsive customers
Own niches like smaller competitors
Examples:
Great Atlantic & Pacific Tea Company (A&P): target customers, centralize buying
Fingerhut: sell on credit to households <$25,000 income
System demonstrations
The example is a dealer wholesaler.
A small portion of the data, for the first 10 shipments, is shown in Table 3.1.
Data warehouses are normalized into relational form: the data is organized into a series of tables connected by keys.
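As a rough sketch of tables connected by keys, the example below builds three small tables in an in-memory SQLite database and joins them. The table names, columns, and rows are hypothetical and are not the contents of Table 3.1.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id TEXT PRIMARY KEY, category TEXT);
CREATE TABLE shipment (
    shipment_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    product_id  TEXT REFERENCES product(product_id),
    cases       INTEGER
);
INSERT INTO customer VALUES (1, 'Acme Supply'), (2, 'Baker Tools');
INSERT INTO product  VALUES ('P100', 'Hardware'), ('P200', 'Paint');
INSERT INTO shipment VALUES (1, 1, 'P100', 12), (2, 2, 'P100', 5), (3, 1, 'P200', 7);
""")

# Follow the keys from each shipment to its customer and product.
rows = conn.execute("""
    SELECT c.name, p.category, s.cases
    FROM shipment AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    JOIN product  AS p ON p.product_id  = s.product_id
    ORDER BY s.shipment_id
""").fetchall()
```

Each shipment row holds only keys and measures; customer and product details live in their own tables and are brought in by the join.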
Data mart
Examining the characteristics of customers who buy the products. (Advertising by mail, internet, …)
Data marts could extract the data and aggregate it in a form useful for data mining.
Table 3.2 shows entries that might be found in a data mart (on product D428 in a two-year interval).
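The extract-and-aggregate step might look like the sketch below, which pulls one product over a range of years and totals cases per customer. Only the product code D428 comes from the slide; the rows, years, customer names, and the function `build_mart` are invented for illustration and do not reproduce Table 3.2.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (customer, product, year, cases).
warehouse = [
    ("Acme Supply", "D428", 2001, 10),
    ("Acme Supply", "D428", 2002, 14),
    ("Baker Tools", "D428", 2001, 3),
    ("Baker Tools", "X100", 2002, 9),
    ("Crane Ltd", "D428", 2000, 6),
]


def build_mart(rows, product, first_year, last_year):
    """Extract one product over a year range and total cases per customer."""
    mart = defaultdict(int)
    for customer, prod, year, cases in rows:
        if prod == product and first_year <= year <= last_year:
            mart[customer] += cases
    return dict(mart)


mart = build_mart(warehouse, "D428", 2001, 2002)
```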
OLAP
An OLAP application focuses more on analyzing trends or other aspects of organizational operations. It may obtain much of its information from the data warehouse, but extracts granular information.
This information could be accessed to make a report by product category (Table 3.3).
OLAP
Evaluating the value of each client to the firm.
Data can be aggregated within a data mart or on an OLAP system.
OLAP
Organizing volume according to the shipper.
Table 3.5 displays the cases shipped, organized by shipper.
Data Quality
Data warehouse projects can fail. One of the most common reasons is the refusal of users to accept the validity of data obtained from a data warehouse, because of:
The corruption of data or missing data from the original sources.
Failure of the software transferring data into or out of the data warehouse.
Failure of the data-cleansing process to resolve data inconsistencies.
The responsible staff must verify the integrity of data throughout the data loading and storing process.
Data integrity: do not allow any meaningless, corrupt, or redundant data into the data warehouse.
Controls can be implemented prior to loading data, in the data migration, cleansing, transforming, and loading processes.
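A control applied prior to loading can be as simple as a check that sorts incoming records into accepted and rejected. In the sketch below, the record fields and the three rules (blank customer as meaningless, non-numeric or negative amount as corrupt, repeated id as redundant) are assumptions chosen to mirror the integrity statement above, not rules given in the chapter.

```python
def check_record(record, seen_keys):
    """Return a list of integrity problems; an empty list means loadable."""
    problems = []
    if not str(record.get("customer", "")).strip():
        problems.append("meaningless: missing customer")
    try:
        if float(record.get("amount")) < 0:
            problems.append("corrupt: negative amount")
    except (TypeError, ValueError):
        problems.append("corrupt: amount is not a number")
    if record.get("id") in seen_keys:
        problems.append("redundant: duplicate id")
    return problems


def load(records):
    """Split records into those accepted for loading and those rejected."""
    accepted, rejected, seen = [], [], set()
    for record in records:
        problems = check_record(record, seen)
        if problems:
            rejected.append((record, problems))
        else:
            seen.add(record["id"])
            accepted.append(record)
    return accepted, rejected


# Hypothetical incoming batch.
incoming = [
    {"id": 1, "customer": "Acme Supply", "amount": "120.50"},
    {"id": 2, "customer": "", "amount": "10"},
    {"id": 3, "customer": "Crane Ltd", "amount": "n/a"},
    {"id": 1, "customer": "Acme Supply", "amount": "120.50"},
]
accepted, rejected = load(incoming)
```

Keeping the reasons alongside each rejected record gives the responsible staff something concrete to verify against the original sources.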
Data Quality
Table 3.6 illustrates an example of multiple variations.
What are the variations?
1. Variations of the same customer
2. Misspellings
3. Correct spelling but with a more complete definition
Data Quality
Matching involves associating variables.
Software used to introduce new data into the data warehouse needs to check that the appropriate spelling and entry values are used. Also, matching companies with addresses… and some maintenance.
Software tools to ensure data quality, including:
The analysis of data for type
The construction of standardization schemes
The identification of redundant data
The adjustment of matching criteria to achieve selected levels of discrimination
The transformation of data into designed format
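Standardization and adjustable matching criteria can be sketched with Python's standard `difflib` module. The standardization rule (lowercase, strip punctuation, collapse spaces), the sample names, and the threshold values are assumptions for illustration; commercial data quality tools use their own, usually richer, matching methods.

```python
from difflib import SequenceMatcher


def standardize(name):
    """Apply a simple standardization scheme before comparing names."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity of two names after standardization, from 0.0 to 1.0."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def find_matches(new_name, known_names, threshold=0.8):
    """Return known names whose similarity meets the threshold."""
    return [k for k in known_names if similarity(new_name, k) >= threshold]


# Hypothetical customer names already in the warehouse.
known = ["Acme Supply", "Zenith Motors"]
loose = find_matches("Acme Suply", known, threshold=0.9)    # tolerates a typo
strict = find_matches("Acme Suply", known, threshold=1.0)   # exact only
same = find_matches("ACME SUPPLY.", known, threshold=1.0)   # case, punctuation
```

Raising the threshold makes matching more discriminating, so fewer misspelled variants are merged; lowering it merges more variants at the risk of joining different customers.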
Software products