Artificial Intelligence: Data Mining
Uploaded by the-integral-worm
Data Mining
• Motivation
• Synonym
• Process of DM
• Operation of DM
• DM techniques
• Business Application
• Application Selection
• Current Issues
Motivations for Data Mining
• Raw data rarely generates direct benefits
• Its real value is realized when we extract useful information and knowledge from it
• Some queries are difficult to express in SQL
  – Which records indicate fraud?
  – Which customers are likely to buy product A?
Motivations for DM
• Only 5%-10% of collected data has ever been analyzed to support the decision-making process
• The amount of data collected in an organization continues to grow, while our ability to analyze that data has not kept pace
Data Mining (DM)
• A technique that extracts knowledge from massive data
• It is also known as Knowledge Discovery in Databases (KDD)
  – KDD is defined as the overall process necessary to discover knowledge, while DM is one particular activity within it that applies a specific algorithm to extract knowledge
  – However, these two terms are often used interchangeably
Process of Data Mining
• Extracting knowledge from databases is a five-step process
• The five-step process is interactive and iterative: steps are revisited as the discovery evolves
Process of Data Mining
• Selecting Application Domain
• Selecting Target Data
• Preprocessing Data
• Extracting Information/Knowledge
• Interpretation and Evaluation
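As a rough illustration of how the five steps chain together, the sketch below runs them on a tiny made-up table of customer records. The records, field names, and the threshold rule are all hypothetical, chosen only to make the flow concrete. Step 1 (selecting the application domain) is a human decision and appears only as a comment, and the accuracy in step 5 is measured on the same records used to find the rule, which a real project would replace with held-out data.

```python
# Step 1: application domain (assumed here): which customers respond to a mailing?

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 25, "spend": 120.0, "responded": True,  "region": "N"},
    {"id": 2, "age": 41, "spend": None,  "responded": False, "region": "S"},
    {"id": 3, "age": 37, "spend": 30.0,  "responded": False, "region": "N"},
    {"id": 4, "age": 52, "spend": 200.0, "responded": True,  "region": "E"},
    {"id": 5, "age": 29, "spend": 45.0,  "responded": False, "region": "S"},
]

def select_target(records):
    """Step 2: keep only the fields relevant to the question."""
    return [{"spend": r["spend"], "responded": r["responded"]} for r in records]

def preprocess(records):
    """Step 3: drop records with missing values."""
    return [r for r in records if r["spend"] is not None]

def extract(records):
    """Step 4: find the spend threshold that best separates responders."""
    best = None
    for t in sorted({r["spend"] for r in records}):
        correct = sum((r["spend"] >= t) == r["responded"] for r in records)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best  # (threshold, number of records classified correctly)

def evaluate(rule, records):
    """Step 5: report accuracy so a domain expert can judge the rule."""
    _, correct = rule
    return correct / len(records)

data = preprocess(select_target(RAW))
rule = extract(data)
accuracy = evaluate(rule, data)
print(f"spend >= {rule[0]} predicts a response (accuracy {accuracy:.0%})")
```

If the evaluation in step 5 is unsatisfactory, the iterative nature of the process means going back to an earlier step, for example selecting different target fields or cleaning the data differently.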
Operations of DM
• Classification
• Regression
• Link Analysis
• Segmentation
• Detecting Deviations
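Of the operations listed, detecting deviations is the simplest to show in a few lines. The sketch below flags values that lie more than a chosen number of standard deviations from the mean. The transaction amounts and the cutoff of 2.0 are made-up assumptions, and the z-score rule is just one common statistical approach, not necessarily the one the slides have in mind.

```python
from statistics import mean, pstdev

def find_deviations(values, cutoff=2.0):
    """Return the values whose z-score magnitude exceeds the cutoff."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > cutoff]

# Hypothetical transaction amounts with one unusual entry.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
outliers = find_deviations(amounts)
print(outliers)
```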
DM Techniques
• Machine Learning
  – Induction
  – Conceptual Clustering
• Artificial Neural Networks (ANN)
• Statistical Techniques
• Example-based Methods
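To make one of these techniques concrete, here is a minimal sketch of an example-based method, assuming it is read as nearest-neighbor classification: a new case receives the label of the most similar stored example. The applicant data, the two features, and the use of plain Euclidean distance are illustrative assumptions only.

```python
from math import dist

def nearest_neighbor(query, examples):
    """Return the label of the stored example closest to the query."""
    _, label = min(examples, key=lambda ex: dist(ex[0], query))
    return label

# Hypothetical applicants: (income in $1000s, years employed) -> decision
EXAMPLES = [
    ((20, 1), "reject"),
    ((25, 2), "reject"),
    ((60, 8), "approve"),
    ((75, 10), "approve"),
]

print(nearest_neighbor((65, 7), EXAMPLES))
print(nearest_neighbor((22, 3), EXAMPLES))
```

In practice the features would usually be rescaled first, since otherwise the one with the largest numeric range dominates the distance.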
Business Application
• Marketing
  – Market Segmentation
  – Market Basket Analysis
  – Trend Analysis
  – Sales Prediction
• Finance
  – Bankruptcy prediction
  – Credit approval
Business Application
• Finance
  – Bond rate prediction
  – Mutual fund selection
• Insurance
  – Fraud Detection
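Market basket analysis, listed under Marketing, is commonly framed in terms of the support and confidence of rules such as "customers who buy bread also buy butter". The sketch below computes both measures on a handful of made-up baskets. The items and the rule are hypothetical, and a real analysis would search over many candidate rules rather than score a single one.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Share of baskets with the antecedent that also hold the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

s = support({"bread", "butter"}, BASKETS)
c = confidence({"bread"}, {"butter"}, BASKETS)
print(f"support={s:.2f} confidence={c:.2f}")
```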
Application Selection
• Non-Technical Criteria
  – Potential benefits and payoffs
  – Management support
  – Domain expert
  – End user interest and involvement
  – Potential for privacy/legal issues
Application Selection
• Technical Criteria
  – Sufficient amount of data
  – High quality data
  – Prior Knowledge
Current Issues
• Integration
  – DM with OLAP
• Limited power of commercial DM Tools
• Data quality problem
• Multimedia data: video, audio, images, etc.
• Scaling-up problem