Datamining Intro IEP 2
7/28/2019 Datamining Intro IEP 2
HITKARINI COLLEGE OF
ENGINEERING AND TECHNOLOGY
-
Data mining
Process of semi-automatically analyzing large databases to find patterns that are:
valid: hold on new data with some certainty
novel: non-obvious to the system
useful: should be possible to act on the item
understandable: humans should be able to interpret the pattern
Also known as Knowledge Discovery in Databases (KDD)
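The "valid" criterion can be made concrete by measuring a pattern on old data and then measuring it again on newer data. The Python sketch below makes three assumptions: a pattern is an if-then rule given as two predicates, "some certainty" means the rule's confidence changes by no more than a chosen tolerance, and the data are hypothetical. The loan records, the 60k income cut-off and the 0.1 tolerance are all invented for illustration.

```python
def rule_confidence(records, antecedent, consequent):
    """Fraction of records matching `antecedent` that also match `consequent`."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return 0.0
    return sum(1 for r in matching if consequent(r)) / len(matching)

def holds_on_new_data(old, new, antecedent, consequent, tolerance=0.1):
    """True if the rule's confidence on `new` is within `tolerance` of its confidence on `old`."""
    c_old = rule_confidence(old, antecedent, consequent)
    c_new = rule_confidence(new, antecedent, consequent)
    return abs(c_old - c_new) <= tolerance, c_old, c_new

# Hypothetical loan records: (income in thousands, repaid?)
old = [(80, True), (95, True), (70, True), (85, False), (30, False), (25, False)]
new = [(90, True), (75, True), (88, False), (65, True), (72, True), (40, False), (20, False)]

def high_income(r):
    # Hypothetical antecedent: income of at least 60k
    return r[0] >= 60

def repaid(r):
    return r[1]

ok, c_old, c_new = holds_on_new_data(old, new, high_income, repaid)
print(ok, c_old, c_new)  # True 0.75 0.8
```

Here the rule "high income implies repayment" has confidence 0.75 on the old records and 0.8 on the new ones. It therefore passes this hypothetical validity check.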
-
Applications
Banking (loan/credit card approval): predict good customers based on past customers
Customer relationship management: identify customers who are likely to leave for a competitor
Targeted marketing: identify likely responders to promotions
Manufacturing and production: automatically adjust knobs when process parameters change
-
Applications
Medicine: disease outcome, effectiveness of treatments; analyze patient disease histories to find relationships between diseases
Scientific data analysis: identify new galaxies by searching for subclusters
Web site/store design and promotion: find the affinity of visitors to pages and modify the layout
-
The KDD process
Data collection:
subset data: sampling might hurt if the data are highly skewed
feature selection: principal component analysis, heuristic search
Pre-processing: cleaning
name/address cleaning, reconciling different terms with the same meaning (annual, yearly), duplicate removal, supplying missing values
Transformation: map complex objects (e.g. time series data) to features (e.g. frequency)
Choosing the mining task and mining method
Result evaluation and visualization
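A minimal Python sketch of two of these steps, pre-processing and transformation, on hypothetical data. Several choices are assumptions made for illustration: the synonym table, mean imputation for missing values, and discrete Fourier coefficient magnitudes as the "frequency" features. Real cleaning pipelines are usually far more involved.

```python
import math
from statistics import mean

# Hypothetical raw records with the problems named on the slide:
# an exact duplicate, two words for the same meaning, and a missing value.
raw = [
    {"name": "A. Sharma", "billing": "annual", "amount": 1200.0},
    {"name": "A. Sharma", "billing": "annual", "amount": 1200.0},
    {"name": "R. Gupta", "billing": "yearly", "amount": None},
    {"name": "S. Khan", "billing": "monthly", "amount": 100.0},
]

# Assumed synonym table mapping alternative terms to one canonical value.
SYNONYMS = {"yearly": "annual"}

def clean(records):
    """Normalise synonyms, drop exact duplicates, then fill missing amounts with the mean."""
    unique, seen = [], set()
    for r in records:
        r = dict(r, billing=SYNONYMS.get(r["billing"], r["billing"]))
        key = tuple(sorted(r.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(r)
    fill = mean(r["amount"] for r in unique if r["amount"] is not None)
    return [dict(r, amount=fill if r["amount"] is None else r["amount"]) for r in unique]

def dft_magnitudes(series, k):
    """Transformation: magnitudes of the first k discrete Fourier coefficients of a time series."""
    n = len(series)
    mags = []
    for f in range(k):
        re = sum(x * math.cos(2 * math.pi * f * t / n) for t, x in enumerate(series))
        im = -sum(x * math.sin(2 * math.pi * f * t / n) for t, x in enumerate(series))
        mags.append(math.hypot(re, im))
    return mags

cleaned = clean(raw)
print(len(cleaned), cleaned[1]["amount"])  # 3 650.0
# A series that alternates every period puts its energy at frequency 0 (the mean) and n/2.
print([round(m, 6) for m in dft_magnitudes([1, 0, 1, 0, 1, 0, 1, 0], 5)])
```

The frequency features have a fixed length regardless of how the raw series is shaped in time. That is what lets standard mining methods compare time series directly.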
-
Some basic operations
Predictive:
Regression
Collaborative filtering
Descriptive:
Clustering / similarity matching
Association rules and variants
Deviation detection
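The sketch below illustrates one predictive and one descriptive operation. It fits a least-squares regression line and flags deviations using a z-score threshold. The data and the threshold of 2 standard deviations are hypothetical choices. Collaborative filtering, clustering and association rules are not shown here.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Predictive: ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def deviations(values, threshold=2.0):
    """Descriptive: indices of values whose z-score exceeds `threshold`."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - m) / s > threshold]

# Hypothetical data: advertising spend vs. sales, and daily transaction totals.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
print(deviations([100, 102, 98, 101, 99, 100, 250]))  # [6]
```

With only a handful of values, the largest attainable z-score is limited. For the population standard deviation it cannot exceed the square root of n - 1, so the threshold has to be chosen with the sample size in mind.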
-
Clustering
Unsupervised learning, used when old data with class labels are not available, e.g. when introducing a new product.
Group/cluster existing customers based on the time series of their payment history, such that similar customers fall in the same cluster.
Key requirement: a good measure of similarity between instances.
Identify micro-markets and develop policies for each.
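The similarity measure is the key design choice. The minimal sketch below makes two assumptions: each customer's payment history is a fixed-length list of monthly amounts, and plain Euclidean distance is an acceptable measure. The customer ids and amounts are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical customers: amount paid in each of the last six months.
history = {
    "c1": [500, 500, 520, 510, 500, 505],
    "c2": [495, 510, 515, 500, 505, 500],
    "c3": [100, 0, 0, 300, 0, 50],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length payment series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar_pair(series):
    """Return the pair of customer ids whose payment series are closest."""
    return min(combinations(series, 2), key=lambda p: euclidean(series[p[0]], series[p[1]]))

print(most_similar_pair(history))  # ('c1', 'c2')
```

Euclidean distance compares raw amounts. Two customers with the same payment pattern at different scales would therefore look dissimilar; normalizing each series first is one common alternative.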
-
Clustering methods
Hierarchical clustering
agglomerative vs. divisive
single link vs. complete link
Partitional clustering
distance-based: K-means
model-based: EM
density-based
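Of the methods listed, K-means is the simplest to sketch. The Python below is a plain version of Lloyd's algorithm: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The initial centroids are drawn at random from the data. The function name, the seed parameter and the sample points are assumptions for illustration. The hierarchical, model-based and density-based methods are not shown.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids taken from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid unchanged
        if new == centroids:
            break  # converged: no centroid moved
        centroids = new
    return centroids, labels

# Hypothetical customers described by two features each.
pts = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
cents, labels = kmeans(pts, 2)
print(labels)
```

K-means only finds a local optimum that depends on the initial centroids. For that reason, practical implementations usually run it several times from different starting points.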
-
Variants
High confidence may not imply high correlation.
Use correlations: find the expected support, and treat large departures from it as interesting.
See the statistical literature on contingency tables.
Still too many rules; need to prune.
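One common way to measure the departure from expected support is lift. Lift is the actual support of the rule's items divided by the support expected if the two sides were independent, which is the product of their individual supports. Lift is chosen here only for illustration; other correlation measures from the contingency-table literature could be used instead. In the hypothetical baskets below, the rule tea -> coffee has 75% confidence. Yet coffee appears in 90% of all baskets, so buying tea actually makes coffee slightly less likely (lift below 1).

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_stats(transactions, lhs, rhs):
    """Support, confidence, expected support under independence, and lift of lhs -> rhs."""
    s_lhs = support(transactions, lhs)
    s_rhs = support(transactions, rhs)
    s_both = support(transactions, lhs | rhs)
    expected = s_lhs * s_rhs  # support expected if lhs and rhs were independent
    return {
        "support": s_both,
        "confidence": s_both / s_lhs,
        "expected_support": expected,
        "lift": s_both / expected,  # < 1 means negative correlation
    }

# 20 hypothetical baskets: coffee is bought almost everywhere.
transactions = [{"tea", "coffee"}] * 3 + [{"tea"}] + [{"coffee"}] * 15 + [{"milk"}]
stats = rule_stats(transactions, {"tea"}, {"coffee"})
print({k: round(v, 3) for k, v in stats.items()})
# {'support': 0.15, 'confidence': 0.75, 'expected_support': 0.18, 'lift': 0.833}
```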
-
Data Mining in Practice
-
Application Areas
Industry                 Application
Finance                  Credit card analysis
Insurance                Claims, fraud analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion analysis
Data service providers   Value-added data
Utilities                Power usage analysis
-
Data Mining works with Warehouse Data
Data warehousing provides the enterprise with a memory.
Data mining provides the enterprise with intelligence.
-
Mining market
Around 20 to 30 mining tool vendors
Major tool players:
Clementine
IBM's Intelligent Miner
SGI's MineSet
SAS's Enterprise Miner
All offer pretty much the same set of tools
Many embedded products:
fraud detection
electronic commerce applications
health care
customer relationship management: Epiphany
-
OLAP-mining integration
OLAP (On-Line Analytical Processing)
Fast interactive exploration of multidimensional aggregates
Heavy reliance on manual operations for analysis:
tedious and error-prone on large multidimensional data
Ideal platform for vertical integration of mining, but needs to be interactive instead of batch
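Much of OLAP rests on precomputed aggregates over every combination of dimensions. Below is a minimal Python sketch of such a data cube, similar in spirit to what SQL's GROUP BY CUBE produces. The fact table, the dimension names and the "ALL" marker for a rolled-up dimension are hypothetical choices for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, product, quarter, sales)
facts = [
    ("North", "TV", "Q1", 100),
    ("North", "TV", "Q2", 150),
    ("North", "Radio", "Q1", 40),
    ("South", "TV", "Q1", 80),
    ("South", "Radio", "Q2", 60),
]
DIMS = ("region", "product", "quarter")

def cube(rows, dims=DIMS):
    """Sum the measure for every subset of dimensions; 'ALL' marks a rolled-up dimension."""
    n = len(dims)
    result = defaultdict(float)
    for r in range(n + 1):
        for keep in combinations(range(n), r):  # which dimensions stay un-aggregated
            for row in rows:
                key = tuple(row[i] if i in keep else "ALL" for i in range(n))
                result[key] += row[n]  # the measure sits after the dimensions
    return dict(result)

totals = cube(facts)
print(totals[("North", "ALL", "ALL")])  # 290.0
print(totals[("ALL", "TV", "ALL")])     # 330.0
print(totals[("ALL", "ALL", "ALL")])    # 430.0
```

Looking up a cell such as ("North", "ALL", "ALL") is then a dictionary access, which is what makes interactive exploration fast. The cost is that a cube over d dimensions has 2^d group-bys to compute and store.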
-
THANK YOU