Data Mining
By: Tung, Sze Ming (Leo)
CS 157B
Definition
A class of database applications that analyze data in a database using tools which look for trends or anomalies.
Data mining was invented by IBM.
Purpose
To look for hidden patterns or previously unknown relationships among the data in a group of data that can be used to predict future behavior.
Example: Data mining software can help retail companies find customers with common interests.
Background Information
Many of the techniques used by today's data mining tools have been around for many years, having originated in the artificial intelligence research of the 1980s and early 1990s.
Data mining tools are only now being applied to large-scale database systems.
The Need for Data Mining
The amount of raw data stored in corporate data warehouses is growing rapidly.
There is too much data, and too much complexity, that might be relevant to a specific problem.
Data mining promises to bridge the analytical gap by giving knowledge workers the tools to navigate this complex analytical space.
The Need for Data Mining, cont'd
The need for information has resulted in the proliferation of data warehouses that integrate information from multiple sources to support decision making.
These warehouses often include data from external sources, such as customer demographics and household information.
Approaches to Data Mining
Association
Sequence-based analysis
Clustering
Classification
Association
Classic market-basket analysis, which treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction.
This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products.
Example: 80 percent of all transactions in which beer was purchased also included potato chips.
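A figure like the 80 percent in the beer and potato chips example is usually called the confidence of an association rule. The sketch below shows one way it could be computed; the function name rule_confidence and the sample baskets are made up for illustration and are not from the slides.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of antecedent transactions that also contain the consequent
    """
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"beer", "potato chips", "salsa"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "potato chips", "milk"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"bread", "potato chips"},
    {"milk", "eggs"},
]

support, confidence = rule_confidence(baskets, "beer", "potato chips")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this made-up data, beer appears in five baskets and four of them also contain potato chips, so the rule has a confidence of 80 percent.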
Sequence-based analysis
Traditional market-basket analysis deals with a collection of items as part of a point-in-time transaction.
Sequence-based analysis instead looks at purchases over time, to identify a typical set of purchases that might predict the subsequent purchase of a specific item.
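As a rough illustration of the sequence-based idea, the sketch below measures how often customers who have bought a given set of items go on to buy a specific item afterwards. The function followed_by_rate, the customer ids, and the purchase histories are all hypothetical; real tools use more elaborate sequential-pattern algorithms.

```python
def followed_by_rate(histories, first_items, later_item):
    """Share of customers who bought every item in first_items and then,
    in a strictly later basket, bought later_item.

    histories maps a customer id to a time-ordered list of baskets (sets).
    """
    first_items = set(first_items)
    qualified = 0
    followed = 0
    for baskets in histories.values():
        seen = set()
        for i, basket in enumerate(baskets):
            seen |= basket & first_items
            if seen == first_items:
                # The customer has now completed the "typical set".
                qualified += 1
                if any(later_item in b for b in baskets[i + 1:]):
                    followed += 1
                break
    return followed / qualified if qualified else 0.0


# Hypothetical purchase histories, oldest basket first.
histories = {
    "c1": [{"printer"}, {"paper"}, {"ink"}],
    "c2": [{"printer", "paper"}, {"ink", "pens"}],
    "c3": [{"printer"}, {"paper"}],
    "c4": [{"ink"}, {"printer", "paper"}],  # ink came first, so it does not count
    "c5": [{"paper"}, {"pens"}],            # never bought a printer
}

rate = followed_by_rate(histories, {"printer", "paper"}, "ink")
print(f"{rate:.0%} of qualifying customers later bought ink")
```

Unlike the point-in-time market-basket view, the order of the baskets matters here: customer c4 bought all three items but is not counted, because the ink purchase came before the printer and paper.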
Clustering
Clustering approaches address segmentation problems.
These approaches assign records with a large number of attributes into a relatively small set of groups or "segments."
Example: Buying habits of multiple population segments might be compared to determine which segments to target for a new sales campaign.
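One common way to form such segments is k-means clustering. The slides do not name an algorithm, so k-means is an assumption here, and the customer figures below are invented. This is a minimal sketch using only the standard library (math.dist needs Python 3.8 or later), with a deliberately naive initialization.

```python
import math


def k_means(points, k, iterations=20):
    """Very small k-means: returns (assignments, centroids).

    Naive initialization: the first k points serve as starting centroids.
    """
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids


# Hypothetical customers: (store visits per month, average spend per visit).
customers = [(2, 15), (3, 20), (2, 18), (12, 90), (11, 85), (13, 95)]

assignments, centroids = k_means(customers, k=2)
print(assignments)
print(centroids)
```

Each record ends up with a segment number, and the centroids summarize the typical buying habits of each segment. With real data the attributes would normally be scaled first, so that no single attribute dominates the distance.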
Classification
Most commonly applied data mining technique.
The algorithm uses preclassified examples to determine the set of parameters required for proper discrimination.
Example: A classifier derived from the classification approach that is capable of identifying risky loans could be used to aid in the decision of whether to grant a loan to an individual.
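To make "preclassified examples determine the parameters" concrete, here is a minimal sketch of about the simplest possible classifier: a single threshold on one attribute, chosen to minimize errors on labeled loans. The attribute (debt-to-income ratio), the function names, and the data are assumptions for illustration; practical classifiers use many attributes and richer models.

```python
def train_threshold(examples):
    """Learn one parameter, a cutoff, from preclassified examples.

    examples is a list of (debt_to_income, is_risky) pairs.
    A loan is predicted risky when its value is >= the returned threshold.
    Requires at least two distinct attribute values.
    """
    values = sorted({v for v, _ in examples})
    # Candidate cutoffs: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((v >= threshold) != risky for v, risky in examples)

    return min(candidates, key=errors)


def is_risky(debt_to_income, threshold):
    """Apply the learned cutoff to a new loan application."""
    return debt_to_income >= threshold


# Hypothetical preclassified loans: (debt-to-income ratio, turned out risky?).
loans = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.45, True), (0.50, True), (0.62, True), (0.70, True),
]

threshold = train_threshold(loans)
print(threshold, is_risky(0.55, threshold), is_risky(0.20, threshold))
```

The learned threshold falls between the highest safe example and the lowest risky one, so a new application at 0.55 is flagged as risky and one at 0.20 is not.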
Issues of Data Mining
Present-day tools are strong but require significant expertise to implement effectively.
Susceptibility to "dirty" or irrelevant data.
Inability to "explain" results in human terms.
Issues
Susceptibility to "dirty" or irrelevant data
Data mining tools of today simply take everything they are given as factual and draw the resulting conclusions.
Users must take the necessary precautions to ensure that the data being analyzed is "clean."
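One precaution is to filter records before they reach the mining tool. The sketch below drops duplicates and records with missing or implausible values. The field names (id, age, spend) and the validity rules are assumptions chosen for the example; what counts as "clean" depends on the data set.

```python
def clean_records(records):
    """Keep only records that pass basic sanity checks.

    Drops records with a missing or repeated id, a missing or
    implausible age, or a missing or negative spend.
    """
    seen_ids = set()
    clean = []
    for record in records:
        record_id = record.get("id")
        age = record.get("age")
        spend = record.get("spend")
        if record_id is None or record_id in seen_ids:
            continue
        if not isinstance(age, (int, float)) or not 0 < age < 120:
            continue
        if not isinstance(spend, (int, float)) or spend < 0:
            continue
        seen_ids.add(record_id)
        clean.append(record)
    return clean


# Hypothetical raw customer records with typical problems.
raw = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 1, "age": 34, "spend": 120.0},   # duplicate id
    {"id": 2, "age": 999, "spend": 80.0},   # implausible age
    {"id": 3, "age": 51, "spend": None},    # missing spend
    {"id": 4, "age": 27, "spend": 45.5},
]

cleaned = clean_records(raw)
print([r["id"] for r in cleaned])
```

Only the records with ids 1 and 4 survive in this example; the rest would otherwise be taken as factual by the mining tool.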
Issues, cont'd
Inability to "explain" results in human terms
Many of the tools employed in data mining analysis use complex mathematical algorithms that are not easily mapped into human terms.
What good does the information do if you don't understand it?
The End