Data Mining

Data Mining. By: Tung, Sze Ming (Leo). CS 157B


Transcript of Data Mining

Page 1: Data Mining

Data Mining

By: Tung, Sze Ming (Leo)

CS 157B

Page 2: Data Mining

Definition

A class of database applications that analyze data in a database using tools which look for trends or anomalies.

IBM was among the early developers of data mining tools.

Page 3: Data Mining

Purpose

To look for hidden patterns or previously unknown relationships in a data set that can be used to predict future behavior.

Example: Data mining software can help retail companies find customers with common interests.

Page 4: Data Mining

Background Information

Many of the techniques used by today's data mining tools have been around for many years, having originated in the artificial intelligence research of the 1980s and early 1990s.

Data mining tools are only now being applied to large-scale database systems.

Page 5: Data Mining

The Need for Data Mining

The amount of raw data stored in corporate data warehouses is growing rapidly.

There is too much data, and too much complexity, that might be relevant to a specific problem.

Data mining promises to bridge the analytical gap by giving knowledge workers the tools to navigate this complex analytical space.

Page 6: Data Mining

The Need for Data Mining, cont'd

The need for information has resulted in the proliferation of data warehouses that integrate information from multiple sources to support decision making.

These warehouses often include data from external sources, such as customer demographics and household information.

Page 7: Data Mining

Approaches to Data Mining

Association
Sequence-based analysis
Clustering
Classification

Page 8: Data Mining

Association

Classic market-basket analysis, which treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction.

This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products.

Example: 80 percent of all transactions in which beer was purchased also included potato chips.
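The "80 percent" figure in the example is what association-rule mining calls the confidence of the rule beer -> potato chips. A minimal sketch of how such a figure could be computed is shown here; the baskets are made-up illustration data, and the function names `support` and `confidence` are chosen for this sketch rather than taken from the slides.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= set(t))
    return both / len(with_antecedent)


# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "chips", "bread"},
    {"beer", "chips", "milk"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"chips", "soda"},
    {"bread", "eggs"},
]

beer_to_chips = confidence(baskets, {"beer"}, {"chips"})
beer_and_chips = support(baskets, {"beer", "chips"})
print(f"confidence(beer -> chips) = {beer_to_chips:.0%}")
print(f"support(beer, chips)      = {beer_and_chips:.0%}")
```

In this toy data five baskets contain beer and four of those also contain chips, so the rule's confidence matches the 80 percent in the example. Support is reported alongside it because a high-confidence rule that covers very few transactions is rarely worth acting on.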

Page 9: Data Mining

Sequence-based analysis

Traditional market-basket analysis deals with a collection of items as part of a point-in-time transaction.

Sequence-based analysis instead looks at purchases ordered over time, to identify a typical set of purchases that might predict the subsequent purchase of a specific item.

Page 10: Data Mining

Clustering

Clustering approaches address segmentation problems.

These approaches assign records with a large number of attributes into a relatively small set of groups or "segments."

Example: Buying habits of multiple population segments might be compared to determine which segments to target for a new sales campaign.
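The slides do not name a particular clustering algorithm. As one common choice, the sketch below uses a bare-bones k-means to assign customer records to two segments. The records (store visits per month, average basket value) are invented, and the centroids are simply seeded from the first k records so that the result is repeatable.

```python
def kmeans(points, k, iterations=20):
    """Assign each point to one of k segments with a minimal k-means.
    Returns (assignment, centroids)."""
    centroids = [points[i] for i in range(k)]  # deterministic start
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment, centroids


# Hypothetical customers: (store visits per month, average basket value)
customers = [(2, 15.0), (12, 80.0), (3, 20.0), (1, 10.0), (11, 95.0), (13, 90.0)]

segments, centers = kmeans(customers, k=2)
for customer, segment in zip(customers, segments):
    print(customer, "-> segment", segment)
```

With real data the attributes would normally be rescaled first, since otherwise the attribute with the largest numeric range dominates the distance calculation.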

Page 11: Data Mining

Classification

Most commonly applied data mining technique.

The algorithm uses preclassified examples to determine the set of parameters required for proper discrimination.

Example: A classifier derived from the classification approach that can identify risky loans could be used to aid in the decision of whether to grant a loan to an individual.
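To make "preclassified examples" and "parameters" concrete, the sketch below learns a single parameter, a cutoff on a debt-to-income ratio, from labeled loans. This one-threshold classifier is an assumption chosen for brevity, not the method the slides have in mind, and the loan data and function names are hypothetical.

```python
def train_threshold(examples):
    """Learn one parameter from preclassified examples: the cutoff on
    debt-to-income ratio that misclassifies the fewest training loans.
    `examples` is a list of (debt_to_income, is_risky) pairs."""
    values = sorted({x for x, _ in examples})
    # Candidate cutoffs: midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        return sum((x >= cutoff) != risky for x, risky in examples)

    return min(candidates, key=errors)


def is_risky(threshold, debt_to_income):
    """Classify a new applicant with the learned cutoff."""
    return debt_to_income >= threshold


# Hypothetical preclassified loans: (debt-to-income ratio, turned out risky?)
loans = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.40, True), (0.48, True), (0.55, True), (0.62, True),
]

threshold = train_threshold(loans)
print("flag applicant with ratio 0.50:", is_risky(threshold, 0.50))
print("flag applicant with ratio 0.20:", is_risky(threshold, 0.20))
```

The learned cutoff falls between the highest safe ratio and the lowest risky ratio in the training data. Practical classifiers use many attributes and more expressive models, but the pattern of fitting parameters to labeled examples and then applying them to new cases is the same.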

Page 12: Data Mining

Issues of Data Mining

Present-day tools are strong but require significant expertise to implement effectively.

Susceptibility to "dirty" or irrelevant data.
Inability to "explain" results in human terms.

Page 13: Data Mining

Issues

Susceptibility to "dirty" or irrelevant data

Data mining tools of today simply take everything they are given as factual and draw the resulting conclusions.

Users must take the necessary precautions to ensure that the data being analyzed is "clean."
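What counts as "clean" depends on the data set. As a rough illustration only, the sketch below screens records for missing fields, implausible values, and exact duplicates before they reach a mining tool. The field names, the validity ranges, and the function `split_clean` are assumptions made up for this example.

```python
def split_clean(records):
    """Separate usable records from "dirty" ones.
    Returns (clean, rejected), where rejected holds (record, reason) pairs."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        key = (rec.get("customer_id"), rec.get("age"), rec.get("amount"))
        if None in key:
            rejected.append((rec, "missing field"))
        elif not (0 < key[1] < 120) or key[2] < 0:
            rejected.append((rec, "out of range"))
        elif key in seen:
            rejected.append((rec, "duplicate"))
        else:
            seen.add(key)
            clean.append(rec)
    return clean, rejected


# Hypothetical raw rows as they might arrive from a warehouse extract.
rows = [
    {"customer_id": 1, "age": 34, "amount": 52.0},
    {"customer_id": 2, "age": None, "amount": 18.5},
    {"customer_id": 3, "age": 210, "amount": 40.0},
    {"customer_id": 1, "age": 34, "amount": 52.0},
    {"customer_id": 4, "age": 58, "amount": 7.25},
]

clean, rejected = split_clean(rows)
print("kept:", [r["customer_id"] for r in clean])
for rec, why in rejected:
    print("rejected:", rec["customer_id"], "because", why)
```

Keeping the rejected rows together with a reason, instead of silently dropping them, lets an analyst check whether the screening rules themselves are too strict.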

Page 14: Data Mining

Issues, cont'd

Inability to "explain" results in human terms

Many of the tools employed in data mining analysis use complex mathematical algorithms that are not easily mapped into human terms.

What good does the information do if you don't understand it?

Page 15: Data Mining

The End