Data mining

Submitted by II MCA, PSNACET.


A review paper on various data mining techniques

Survey on various types of credit fraud and security measures

Data mining in cloud computing

Survey paper on clustering techniques

A data mining framework for prevention and detection of financial statement fraud

A review paper: mining educational data to forecast failure of engineering students

Data mining model for insurance trade in CRM system

Data mining

• Data mining is the exploration and analysis of large data sets in order to discover meaningful patterns and rules.

• The objective of data mining is to design and work efficiently with large data sets.

• Data mining is a component of a wider process called knowledge discovery from databases (KDD).

• Data mining is the process of analysing data from different perspectives and summarizing the results as useful information.

• Data mining is a multi-step process: it requires accessing and preparing the data, applying a data mining algorithm, analysing the results, and taking appropriate action.

Why Data mining?

• Database analysis and decision support

Market analysis and management: target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation

Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis

Fraud detection and management

• Other applications

Text mining

Intelligent query answering

In data mining, the data is mined using two learning approaches: supervised learning and unsupervised learning.

Supervised learning

In supervised learning (often also called directed data mining), the variables under investigation can be split into two groups: explanatory variables and a dependent variable. The goal of the analysis is to specify a relationship between the dependent variable and the explanatory variables, as is done in regression analysis.
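
As a minimal sketch of the regression idea mentioned above, the following fits a straight line between one explanatory variable and one dependent variable using ordinary least squares. The data (hours studied and exam scores) and the function name fit_line are hypothetical and only illustrate the idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (explanatory) vs exam score (dependent).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 5.0  # estimated score for 5 hours of study
```

The fitted slope and intercept describe the relationship between the two groups of variables; with several explanatory variables the same idea extends to multiple regression.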

Unsupervised learning

In unsupervised learning, all the variables are treated in the same way; there is no distinction between dependent and explanatory variables.

Tasks of Data Mining

Data mining is used as a term for six specific classes of activities or tasks, as follows:

Classification

Estimation

Prediction

Affinity grouping or association rules

Clustering

Description and visualization

The first three tasks (classification, estimation, and prediction) are examples of directed data mining or supervised learning. The next three tasks are examples of undirected data mining or unsupervised learning.

Classification

Classification consists of examining the features of a newly presented object and assigning it to a predefined class.
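
One simple way to illustrate this is a nearest-neighbour rule: the new object receives the class of the most similar labelled example. This is only a sketch of one possible classifier, not a method named in the slides; the features (income and debt, in thousands) and the class names are hypothetical.

```python
import math


def classify(new_object, labeled_examples):
    """Return the class of the labelled example closest to new_object."""
    _, nearest_class = min(
        labeled_examples,
        key=lambda example: math.dist(example[0], new_object),
    )
    return nearest_class


# Hypothetical labelled examples: ((income, debt), predefined class).
examples = [
    ((50.0, 5.0), "low_risk"),
    ((55.0, 8.0), "low_risk"),
    ((20.0, 30.0), "high_risk"),
    ((25.0, 35.0), "high_risk"),
]

label_a = classify((22.0, 28.0), examples)
label_b = classify((52.0, 6.0), examples)
```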

Estimation

Estimation deals with continuously valued outcomes.

Prediction

Any prediction can be thought of as classification or estimation.

Predictive tasks feel different because the records are classified according to

some predicted future behavior or estimated future value.

Association Rules

An association rule is a rule which implies certain association

relationships among a set of objects in a database.
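
Association rules are commonly scored with two standard measures, support and confidence, which the slides do not spell out. The sketch below computes both for a rule such as "bread implies butter" over a hypothetical set of market baskets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst.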

Clustering

Clustering is the task of segmenting a diverse group into a number of more similar subgroups or clusters. In clustering, there are no predefined classes.

General Types of Cluster

Well separated cluster

Center-based cluster

Contiguous cluster

Density-based cluster

Shared property or conceptual cluster

Well separated cluster

A cluster is a set of points such that any point in the cluster is closer to every other point in the cluster than to any point that is not in the cluster.
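
The definition above can be checked directly. The following sketch, with a hypothetical function name and Euclidean distance assumed, tests whether a given grouping of points is well separated.

```python
import math


def is_well_separated(clusters):
    """True if every point is closer to all points in its own cluster
    than to any point outside that cluster."""
    for i, cluster in enumerate(clusters):
        outside = [p for j, other in enumerate(clusters) if j != i for p in other]
        for point in cluster:
            farthest_inside = max(math.dist(point, q) for q in cluster)
            nearest_outside = min(
                (math.dist(point, q) for q in outside), default=math.inf
            )
            if farthest_inside >= nearest_outside:
                return False
    return True


separated = is_well_separated([[(0, 0), (0, 1)], [(10, 10), (10, 11)]])
overlapping = is_well_separated([[(0, 0), (5, 0)], [(6, 0), (20, 0)]])
```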

Center-based cluster

A cluster is a set of objects such that an object in a cluster is closer to the “center” of its cluster than to the center of any other cluster. The center of a cluster is often a centroid.
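
A minimal sketch of the center-based idea: compute a centroid as the mean of a cluster's points, and assign each point to the nearest center. Algorithms such as k-means repeat these two steps; that name, the function names, and the sample points are assumptions for illustration and do not come from the slides.

```python
import math


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def assign_to_nearest_center(points, centers):
    """Index of the closest center for each point."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers = [centroid(points[:2]), centroid(points[2:])]
assignment = assign_to_nearest_center(points, centers)
```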

Contiguous cluster

A cluster is a set of points such that a point in a cluster is closer to one or more other points in the cluster than to any point that is not in the cluster.

Density-based cluster

A cluster is a dense region of points that is separated from other high-density regions by low-density regions.
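
To make "dense region" concrete, one common test counts how many neighbours a point has within a fixed radius. The sketch below applies only that density test (the kind used by algorithms such as DBSCAN); it is not a full clustering algorithm, and the radius, neighbour threshold, and points are hypothetical.

```python
import math


def dense_points(points, radius, min_neighbors):
    """Points that have at least `min_neighbors` other points within `radius`."""
    dense = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            dense.append(p)
    return dense


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
dense = dense_points(points, radius=1.5, min_neighbors=2)
```

Here the four points near the origin form a dense region, while the isolated point lies in a low-density region.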

Shared property or conceptual cluster

Clusters are sets of objects that share some common property or represent a particular concept.

Description and visualization

Data visualization is a powerful form of descriptive data mining. It is not always easy to come up with meaningful visualizations, but the right picture really can be worth a thousand association rules, since human beings are extremely practiced at extracting meaning from visual scenes.

Data mining: KDD process

Steps of a KDD process

• Learning the application domain

Relevant prior knowledge and goals of the application

• Creating a target data set: data selection

• Data cleaning and preprocessing (may take 60% of the effort!)

• Data reduction and transformation

Find useful features, dimensionality/variable reduction, invariant representation

• Choosing the functions of data mining

Summarization, classification, regression, association, clustering

• Choosing the mining algorithm(s)

• Data mining: search for patterns of interest

• Pattern evaluation and knowledge presentation

Visualization, transformation, removing redundant patterns, etc.

• Use of discovered knowledge
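
The data-handling steps above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the record fields, the min-max scaling used for transformation, and the threshold-based "mining" step are all hypothetical stand-ins for the real choices made in a KDD project.

```python
def select(records, fields):
    """Create the target data set: keep only the fields of interest."""
    return [{f: r.get(f) for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, field):
    """Data transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def mine(records, field, threshold):
    """Toy mining step: keep records whose scaled value exceeds a threshold."""
    return [r for r in records if r[field] > threshold]


# Hypothetical raw customer data.
raw = [
    {"customer": "a", "spend": 100.0, "notes": "x"},
    {"customer": "b", "spend": None, "notes": "y"},
    {"customer": "c", "spend": 300.0, "notes": "z"},
    {"customer": "d", "spend": 200.0, "notes": ""},
]

target = select(raw, ["customer", "spend"])
cleaned = clean(target)
scaled = transform(cleaned, "spend")
patterns = mine(scaled, "spend", 0.75)
```

Each function corresponds to one step, so a step can be replaced (for example, a real mining algorithm in place of the threshold) without touching the others.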

Major Issues in Data Mining

Mining methodology

• Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web

• Performance: efficiency, effectiveness, and scalability

• Pattern evaluation: the interestingness problem

• Incorporation of background knowledge

• Handling noise and incomplete data

• Parallel, distributed and incremental mining methods

• Integration of the discovered knowledge with existing knowledge: knowledge fusion

Data mining in various fields

Market Analysis and Management

• Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

• Target marketing

Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.

Determine customer purchasing patterns over time

• Cross-market analysis: find associations/correlations between product sales, and predict based on such associations

• Customer profiling: what types of customers buy what products (clustering or classification)

Market Analysis and Management (cont)

• Customer requirement analysis

Identify the best products for different customers

Predict what factors will attract new customers

• Provision of summary information

Multidimensional summary reports

Statistical summary information (data central tendency and variation)
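
As a small illustration of statistical summary information, the sketch below reports central tendency (mean, median) and variation (population standard deviation) using Python's standard statistics module. The purchase amounts and the function name summarize are hypothetical.

```python
import statistics


def summarize(values):
    """Central tendency and variation for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.pstdev(values),  # population standard deviation
    }


# Hypothetical purchase amounts for one customer segment.
purchases = [20.0, 30.0, 40.0, 50.0, 60.0]
summary = summarize(purchases)
```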

Corporate Analysis & Risk Management

•Finance planning and asset evaluation

cash flow analysis and prediction

contingent claim analysis to evaluate assets

cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)

•Resource planning

summarize and compare the resources and spending

•Competition

monitor competitors and market directions

group customers into classes and apply a class-based pricing procedure

set pricing strategy in a highly competitive market

Fraud Detection & Mining Unusual Patterns

• Approaches: clustering and model construction for frauds, outlier analysis

• Applications: health care, retail, credit card service, telecommunications

Auto insurance: ring of collisions

Money laundering: suspicious monetary transactions

Medical insurance

Professional patients, ring of doctors, and ring of references

Unnecessary or correlated screening tests

Telecommunications: phone-call fraud

Phone call model: destination of the call, duration, time of day or

week. Analyze patterns that deviate from an expected norm

Fraud Detection & Mining Unusual Patterns (contd)

Credit card fraud

Application fraud

Fake or doctored card

Lost and stolen card

Duplicate site

Intercept fraud (postal service)
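
The outlier-analysis approach, which looks for patterns that deviate from an expected norm, can be sketched with a simple z-score test: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the call durations below are hypothetical; real fraud models use many more features.

```python
import statistics


def flag_unusual(values, threshold=2.0):
    """Values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical call durations in minutes for one subscriber.
durations = [3.0, 4.0, 5.0, 4.0, 3.0, 5.0, 4.0, 60.0]
suspicious = flag_unusual(durations)
```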

Mining to forecast failure of engineering students

Mining educational data can identify the problems that affect engineering students and point to solutions for those particular problems.

Mining in Insurance trade in CRM system

The volume of data stored in CRM databases is increasing rapidly, and much useful information is hidden in it. Using data mining techniques, we can retrieve information about customer relationships in the insurance trade.

Mining in cloud computing

Advantages of the cloud:

Reduced cost

Increased storage

Highly automated and high mobility

There are three types of services in the cloud:

IaaS (virtual machines, servers)

PaaS (execution runtime, database, web server)

SaaS (email, games)

Conclusions

Data mining involves extracting useful rules or interesting patterns from huge volumes of historical data. Many data mining tasks are available, and each of them in turn has many techniques. Data mining is an interdisciplinary field that integrates artificial intelligence, databases, machine learning, statistics, etc. Data mining is the non-trivial process of finding, in large amounts of incomplete, noisy, fuzzy, and random application data, hidden regularities that are not known in advance but are potentially useful and ultimately understandable information and knowledge.

References

[1] V. Saurkar, Vaibhav, Bhujade (data mining techniques)

[2] Amandeep Kaur Mann, Navneet Kaur (clustering)

[3] Avinash Ingole, Dr. R. C. Thool (credit card fraud)

[4] Parikshit Prasad, Rattan Lal (cloud computing)

[5] Nasib Singh Gill, Rajan Gupta (financial statement fraud)

[6] Komal S. Sahedani, B. Supriya Reddy (failure of engineering students)

[7] C. Verhoef, Bas Donkers (insurance in CRM model)