April 21, 2023 1Data Mining
27/Sep/2008
Evolution of Database Technology
YEAR PURPOSE
1960’s Network Model, Batch Reports
1970’s Relational data model, Executive Information Systems
1980’s Application-specific DBMS (spatial data, scientific data, image data, …)
1990’s Terabyte data warehouses, object-oriented, middleware and web technology
2000’s Business Process
2010’s Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems
Motivation: Necessity is the mother of invention
Why Data Mining?
Data explosion problem
◦ Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
◦ Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Data, Data, Data Everywhere …
I can’t find the data I need – data is scattered over the network
I can’t get the data I need
I can’t understand the data I need
I can’t use the data I found
An abundance of data
◦ Supermarket scanners, POS data
◦ Credit card transactions
◦ Call center records
◦ ATM machines
◦ Demographic data
◦ Sensor networks
◦ Cameras
◦ Web server logs
◦ Customer web site trails
◦ Geographic Information System
◦ National medical records
◦ Weather images
This data occupies
Terabytes - 10^12 bytes
Petabytes - 10^15 bytes
Exabytes - 10^18 bytes
Zettabytes - 10^21 bytes
Yottabytes - 10^24 bytes
Walmart - 24 Terabytes
Process of sorting through large amounts of data and picking out relevant information
Process of analyzing data from different perspectives and summarizing it into useful information
Discovering hidden value in a database
The non-trivial process of identifying valid, novel, useful and understandable patterns in data
Extracting or mining knowledge from large amounts of data
History Notes – Many Names of Data Mining
YEAR | NAMES | USED BY
1960 | Data Fishing, Data Dredging | Statisticians
1990 | Data Mining | DB community, business
1989 | Knowledge Discovery in Databases | AI, Machine Learning community
Other Names
Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
Why Data Mining? (Cont.)
A data warehouse is a single, complete and consistent store of data from a variety of different sources, made available to end users
For example, AT&T handles billions of calls per day. Europe's Very Long Baseline Interferometer (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
We need data mining for
◦ Transforming data into information that is useful to users
◦ Presenting data in a useful format
◦ Providing data access to business analysts and information technology professionals
Data Mining Process
Data Mining is the technique used to carry out KDD (Knowledge Discovery in Databases).
Data Mining turns data into information and then into knowledge
[Figure: Data → Information → Knowledge]
Steps in Data Mining
1. Data cleaning: To remove noise and inconsistent data
2. Data integration: To integrate (compile) multiple data sources
3. Data selection: Data relevant to the analysis is selected
4. Data transformation: Summary, normalization and aggregation operations are performed (converting the data into two-dimensional form) to consolidate the data
Steps in Data Mining (Cont.)
5. Data mining: Intelligent methods are applied to the data to discover knowledge or patterns
6. Pattern evaluation: Evaluation of the interesting patterns by thresholding
7. Knowledge discovery: Visualization and presentation methods are used to present the mined knowledge to the user.
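The seven steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the field names, the age bands and the 0.5 threshold are all hypothetical, and the "mining" step is a simple average per group rather than a real mining algorithm.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "age": 25, "spend": 120.0},
           {"customer": "c2", "age": None, "spend": 80.0}]
store_b = [{"customer": "c3", "age": 40, "spend": 300.0},
           {"customer": "c4", "age": 31, "spend": 200.0}]

def clean(records):
    """Step 1: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Step 2: combine several sources into one list of records."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Step 3: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def normalize(records, field):
    """Step 4: min-max normalize one numeric field into the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records):
    """Step 5: a toy pattern, the average normalized spend per age band."""
    bands = {}
    for r in records:
        band = "under 30" if r["age"] < 30 else "30 and over"
        bands.setdefault(band, []).append(r["spend"])
    return {band: mean(values) for band, values in bands.items()}

def evaluate(patterns, threshold):
    """Step 6: keep only the patterns whose score meets the threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}

def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{band}: average normalized spend {score:.2f}"
            for band, score in patterns.items()]

integrated = integrate(clean(store_a), clean(store_b))
relevant = select(integrated, ["age", "spend"])
transformed = normalize(relevant, "spend")
patterns = mine(transformed)
interesting = evaluate(patterns, threshold=0.5)
report = present(interesting)
```

Each function maps to one numbered step, so a step can be swapped for a more realistic implementation without touching the others.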
◦ Data mining: the core of the knowledge discovery process.
[Figure: the knowledge discovery process: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining Tasks
1. Classification
• Classification maps data into predefined groups or classes.
• It may be represented by methods such as decision trees, etc.
Decision tree
• Flow-chart-like tree structure
• Each node denotes a test of an attribute value
• Each branch represents an outcome of the test
• Leaves represent classes or class distributions.
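The decision tree structure described above can be sketched with nested dictionaries. The tree here is hand-built and hypothetical (the attributes `income` and `has_guarantor` and the classes `approve` and `reject` are made up for illustration); a real classifier would learn the tree from training data.

```python
# Hypothetical hand-built tree: inner nodes test an attribute,
# branches are keyed by the test outcome, leaves are class labels.
tree = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_guarantor",
            "branches": {True: "approve", False: "reject"},
        },
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while isinstance(node, dict):
        outcome = record[node["attribute"]]
        node = node["branches"][outcome]
    return node  # a leaf, i.e. the predicted class

applicant = {"income": "low", "has_guarantor": True}
label = classify(tree, applicant)
```

Because the classes are fixed in the leaves before any record is seen, this is classification into predefined groups.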
2. Regression
Used to map a data item to a real-valued prediction variable.
Example. A manager wants to reach a certain level of savings before his retirement. Periodically he predicts his retirement savings from the current value and several past values. He uses a simple linear regression formula to predict the value of his savings in the future.
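A minimal sketch of the savings example, assuming ordinary least squares on one variable. The yearly savings figures are hypothetical, and the slides do not say which regression formula the manager uses.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical savings (in thousands) observed over five past years.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.5, 13.5, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
predicted_year_8 = slope * 8 + intercept  # extrapolate to a future year
```

Refitting the line each time a new yearly value arrives corresponds to the periodic prediction in the example.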
3. Prediction
Many real-world applications can be seen as predicting future data states based on past and current data.
Example - Predicting flooding is a difficult problem
4. Clustering
Clustering is similar to classification except that the groups are not predefined.
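To show groups emerging from the data rather than being predefined, here is a sketch using k-means on one-dimensional values. K-means is one common clustering method chosen for illustration; the slides do not name an algorithm, and the ages and starting centers are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign points to the nearest center, then move centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages; no class labels are given.
ages = [22, 25, 27, 58, 61, 64]
centers, clusters = k_means_1d(ages, centers=[20.0, 30.0])
```

The two groups (younger and older customers) are discovered by the algorithm; nothing in the input says which record belongs where.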
5. Association Rule
Association refers to uncovering relationships among data. Used in the retail sales community to identify the items (products) that are frequently purchased together.
[Figure: cartoon (1998): "Bread and Jam sell together!"]
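The bread and jam rule can be checked with two standard measures, support and confidence. These measures are not named in the slides and are brought in here as an assumption; the transactions are hypothetical.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

both = support({"bread", "jam"}, transactions)            # 3 of 5 baskets
rule = confidence({"bread"}, {"jam"}, transactions)       # roughly 0.75
```

A rule such as "bread → jam" would typically be reported only when both values exceed thresholds chosen by the analyst.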
6. Summarization
Summarization of the general characteristics or features of a target class of data.
Data characterization is presented in various forms - pie charts, bar charts, curves.
Data discrimination is the comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
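A small sketch of the two ideas, assuming that characterization means summary statistics for one class and discrimination means comparing those statistics across classes. The customer segments and spend values are hypothetical.

```python
from statistics import mean

# Hypothetical customers labelled with a segment (the class).
customers = [
    {"segment": "loyal", "monthly_spend": 220.0},
    {"segment": "loyal", "monthly_spend": 180.0},
    {"segment": "occasional", "monthly_spend": 60.0},
    {"segment": "occasional", "monthly_spend": 40.0},
]

def characterize(records, segment, field):
    """Characterization: general features of one target class."""
    values = [r[field] for r in records if r["segment"] == segment]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}

def discriminate(records, target, contrast, field):
    """Discrimination: difference in mean between target and contrasting class."""
    return (characterize(records, target, field)["mean"]
            - characterize(records, contrast, field)["mean"])

loyal_summary = characterize(customers, "loyal", "monthly_spend")
gap = discriminate(customers, "loyal", "occasional", "monthly_spend")
```

The summary dictionary is the kind of data that would feed the pie charts, bar charts and curves mentioned above.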
7. Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are called outliers.
Most data mining methods discard outliers as noise or exceptions. In applications such as fraud detection, however, rare events may be more interesting than regularly occurring events.
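One simple way to flag outliers is a z-score test: measure how many standard deviations each value lies from the mean. This is a sketch under assumptions, not a method named in the slides; the transaction amounts and the threshold of 2.0 are hypothetical.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 51, 49, 50, 500]
suspicious = find_outliers(amounts)
```

In a fraud detection setting the flagged values are the interesting ones, so they would be kept for review rather than discarded as noise.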
Data Mining: Types of Data
Relational data and transactional data
Text
Images, video
Mixtures of data
Data Mining Products
DataMind -- neurOagent
Information Discovery -- IDIS
SAS Institute -- SAS/Neuronets
Data Mining Software
RapidMiner and Weka – Defining the data mining process
Top 8 data mining software in 2008
1. Angoss software
2. Infor CRM Epiphany
3. Portrait Software
4. SAS
5. SPSS
6. ThinkAnalytics
7. Unica
8. Viscovery
Application Areas
Industry | Application
Finance | Credit Card Analysis
Insurance | Fraud Analysis
Telecommunication | Call record analysis
Applications
Financial Industry, Banks, Businesses, E-commerce
◦ Stock and investment analysis
◦ Identify loyal customers and risky customers
◦ Predict customer spending
Database analysis and decision support
◦ Market analysis and management: target marketing, customer relationship management, market basket analysis
◦ Risk analysis and management: forecasting, quality control, competitive analysis
◦ Fraud detection and management
Data Mining in Usage
1. Intelligent Miner
It is an IBM data mining product
Its distinct features include the scalability of its mining algorithms and tight integration with IBM's DB2 relational database system.
2. DBMiner
Developed by DBMiner Technologies Inc.
A distinct feature of DBMiner is its data-cube-based Online Analytical Mining
The Telecomm Slice
[Figure: data cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe)]
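Taking a slice of a data cube means fixing one dimension to a single value and keeping the remaining dimensions. The sketch below uses the dimension values from the figure; the sales numbers are hypothetical placeholders generated from the positions of the values, and the dictionary layout is an assumption made for illustration.

```python
from itertools import product as cartesian

products = ["Household", "Telecomm", "Video", "Audio"]
channels = ["Retail", "Direct", "Special"]
regions = ["India", "Far East", "Europe"]

# cube[(product, channel, region)] = sales figure (hypothetical placeholder values).
cube = {
    (p, c, r): 100 * (i + 1) + 10 * j + k
    for (i, p), (j, c), (k, r) in cartesian(
        enumerate(products), enumerate(channels), enumerate(regions))
}

def slice_cube(cube, product_name):
    """Fix the Product dimension, leaving a Sales Channel x Regions table."""
    return {(c, r): v for (p, c, r), v in cube.items() if p == product_name}

telecomm = slice_cube(cube, "Telecomm")

# Roll the slice up over Sales Channel to get one total per region.
by_region = {}
for (channel, region), value in telecomm.items():
    by_region[region] = by_region.get(region, 0) + value
```

The slice has 3 x 3 = 9 cells, one per combination of sales channel and region for the Telecomm product.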
Conclusion
Data mining: discovering interesting patterns from large amounts of data
A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier analysis, etc.
Thank you !!!