Data mining

DATA MINING BY SARANYA

Introduction
- New buzzword, old idea.
- Inferring new information from already-collected data.
- Traditionally the job of data analysts, but computers have changed this.
- It is far more efficient to comb through data using a machine than by eyeballing statistical data.

Definition
Data mining is the entire process of applying computer-based methodology, including new techniques, for knowledge discovery from data.

Two Main Components
- Knowledge discovery: concrete information gleaned from known data. This is data you may not have known, but which is supported by recorded facts.
- Knowledge prediction: uses known data to forecast future trends, events, etc. (e.g., stock market predictions).

Uses of Data Mining
- AI/machine learning
  - Combinatorial/game data mining: good for analyzing winning strategies in games, and thus for developing intelligent AI opponents (e.g., chess).
- Business strategies
  - Market basket analysis: identify customer demographics, preferences, and purchasing patterns.
- Risk analysis
  - Product defect analysis: analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.

Uses of Data Mining (continued)
- User behavior validation
  - Fraud detection: with cell phones, comparing phone activity to calling records can help detect calls made on cloned phones. Similarly, with credit cards, comparing purchases with historical purchases can detect activity on stolen cards.

Sources of Data for Mining
- Databases (the most obvious source)
- Text documents
- Computer simulations
- Social networks

[Figure: Data Mining Development]

Database Processing vs. Data Mining
              Database processing     Data mining
Query         Well defined            Poorly defined
              SQL                     No precise query language
Data          Operational data        Not operational data
Output        Precise                 Fuzzy
              Subset of database      Not a subset of database

[Figure: Data Mining Models and Tasks]
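The credit-card case under user behavior validation (comparing a new purchase with the cardholder's historical purchases) can be sketched as a simple outlier check. This is a minimal sketch, not a method given in the slides: it assumes a purchase is suspicious when its amount lies more than a chosen number of standard deviations from the historical mean. The function name `is_suspicious`, the threshold of 3, and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev


def is_suspicious(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag a purchase whose amount is an outlier relative to past purchases.

    Hypothetical rule (an assumption, not from the slides): flag the purchase
    if its z-score against the historical amounts exceeds `threshold`.
    """
    if len(history) < 2:
        # Not enough history to estimate the spread; do not flag.
        return False
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # All past purchases were identical: flag any amount that differs.
        return amount != mu
    z = abs(amount - mu) / sigma
    return z > threshold


history = [42.0, 35.5, 50.0, 38.25, 45.0, 40.0]  # made-up past purchase amounts
print(is_suspicious(history, 47.0))   # False: close to the usual amounts
print(is_suspicious(history, 950.0))  # True: far outside the usual amounts
```

A real fraud detector would combine many such signals (location, merchant, time of day); a single z-score only shows the idea of checking new activity against recorded behavior.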
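Knowledge prediction, using known data to forecast future values, can be illustrated with about the simplest forecasting model: a straight-line trend fitted to past observations by ordinary least squares and then extrapolated. This is a minimal sketch, assuming one input variable (for example, a time index) and a linear relationship; the function name `fit_line` and the data are hypothetical, and forecasting something like stock prices needs far richer models.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


xs = [1, 2, 3, 4, 5]     # e.g. time periods (made-up data)
ys = [3, 5, 7, 9, 11]    # observations lying exactly on y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)              # 1.0 2.0
print(a + b * 6)         # forecast for period 6: 13.0
```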
Basic Data Mining Tasks
- Classification maps data into predefined groups or classes.
  - Supervised learning
  - Pattern recognition
  - Prediction
- Regression maps a data item to a real-valued prediction variable.
- Clustering groups similar data together into clusters.
  - Unsupervised learning
  - Segmentation
  - Partitioning

Basic Data Mining Tasks (continued)
- Summarization maps data into subsets with associated simple descriptions.
  - Characterization
  - Generalization
- Link analysis uncovers relationships among data.
  - Affinity analysis
  - Association rules
- Sequential analysis determines sequential patterns.

Data Mining Techniques
- Statistical
  - Point estimation
  - Models based on summarization
  - Bayes' theorem
  - Hypothesis testing
  - Regression and correlation
- Similarity measures
- Decision trees
- Neural networks
  - Activation functions
- Genetic algorithms

Challenges of Data Mining
- Scalability
- Dimensionality
- Complex and heterogeneous data
- Data quality
- Data ownership and distribution
- Privacy preservation
- Streaming data
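Classification maps each data item to one of a set of predefined classes learned from labeled examples (supervised learning). One simple instance, swapped in here for illustration because the slides do not name an algorithm, is k-nearest-neighbor classification, which also relies on a similarity measure (here Euclidean distance). The function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign `query` to the majority class among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest examples.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up labeled data: (height_cm, weight_kg) -> predefined class
train = [
    ((150, 50), "small"), ((155, 55), "small"), ((160, 58), "small"),
    ((180, 85), "large"), ((185, 90), "large"), ((175, 80), "large"),
]
print(knn_classify(train, (158, 54)))  # small
print(knn_classify(train, (182, 88)))  # large
```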
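Clustering groups similar data together without predefined classes (unsupervised learning), and partitioning is one way to do it. A minimal sketch using k-means (Lloyd's algorithm), a common partitioning method chosen here for illustration; the slides do not name a specific algorithm. Initializing centroids to the first k points is a simplification made for determinism, and the function name `kmeans` and the data are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Partition `points` into k clusters with Lloyd's k-means algorithm.

    Centroids start at the first k points (a simplification; practical code
    usually uses random or k-means++ initialization).
    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]  # made-up data
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```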
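Link analysis through association rules (the same idea behind market basket analysis) commonly scores a rule X → Y by its support (how often X and Y occur together) and its confidence (how often Y occurs when X does). A minimal sketch computing these two standard measures over a made-up transaction list; the function names are hypothetical, and a full rule miner such as Apriori would also enumerate candidate itemsets, which is not shown here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Made-up market baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets contain milk
```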
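Bayes' theorem, listed among the statistical techniques, gives the probability of a hypothesis h after observing data x, where the hypotheses h_j are mutually exclusive and exhaustive. The worked line uses hypothetical numbers for a fraud screen (1% of transactions fraudulent, 90% of frauds flagged, 5% of legitimate transactions flagged) to show why a flagged transaction is still usually legitimate.

```latex
P(h \mid x) = \frac{P(x \mid h)\, P(h)}{P(x)},
\qquad
P(x) = \sum_{j} P(x \mid h_j)\, P(h_j)

P(\text{fraud} \mid \text{flag})
  = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99}
  = \frac{0.009}{0.0585}
  \approx 0.154
```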