Data mining

By : Birju Tank (141060753017) Introduction to Data Mining GTU PG School, BISAG, GANDHINAGAR




• Data mining is also called knowledge discovery in databases (KDD)

• Data mining is the extraction of useful patterns from data sources, e.g., databases, text, the web, and images.

• Other Definitions

– Non-trivial extraction of implicit, previously unknown and potentially useful information from data

– Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

What is Data Mining?


• 80% of customers who buy cheese and milk also buy bread, and 5% of customers buy all of them together

• Cheese, Milk → Bread [sup = 5%, confid = 80%]

Example
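The two numbers attached to the rule above can be computed directly from transaction data. The following is a minimal sketch, assuming a small hypothetical list of shopping baskets (the data and the function names `support` and `confidence` are illustrative, not from the slides). Support is the fraction of all transactions containing every item in the rule; confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side.

```python
# Hypothetical market-basket data: each set is one customer's transaction.
transactions = [
    {"cheese", "milk", "bread"},
    {"cheese", "milk"},
    {"milk", "bread"},
    {"bread"},
    {"cheese", "milk", "bread", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0  # rule never applies in this data
    both = support(set(antecedent) | set(consequent), transactions)
    return both / antecedent_support

# Rule: Cheese, Milk -> Bread
rule_support = support({"cheese", "milk", "bread"}, transactions)
rule_confidence = confidence({"cheese", "milk"}, {"bread"}, transactions)
print(f"sup = {rule_support:.0%}, confid = {rule_confidence:.0%}")
```

On this toy data, two of five baskets contain all three items and three baskets contain cheese and milk, so the numbers differ from the 5% and 80% on the slide, which describe a different (unspecified) data set.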


What is (not) Data Mining?

• What is not Data Mining?

• Look up phone number in phone directory

• Query a Web search engine for information about “Amazon”

• What is Data Mining?

• Certain names are more prevalent in certain locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)

• Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)


Area: DBMS vs. OLAP vs. Data Mining

• Task

– DBMS: Extraction of detailed and summary data

– OLAP: Summaries, trends and forecasts

– Data Mining: Knowledge discovery of hidden patterns and insights

• Type of result

– DBMS: Information

– OLAP: Analysis

– Data Mining: Insight and prediction

• Method

– DBMS: Deduction (ask the question, verify with data)

– OLAP: Multidimensional data modeling, aggregation, statistics

– Data Mining: Induction (build the model, apply it to new data, get the result)

• Example question

– DBMS: Who purchased mutual funds in the last 3 years?

– OLAP: What is the average income of mutual fund buyers by region by year?

– Data Mining: Who will buy a mutual fund in the next 6 months and …


• Classification:

– mining patterns that can classify future data into known classes.

• Clustering:

– identifying a set of similarity groups in the data.

• Prediction methods:

– using some variables to predict unknown or future values of other variables.

Data Mining Tasks
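As an illustration of classification, the sketch below learns one "pattern" per known class (here simply the mean point, or centroid, of that class's training examples) and assigns new data to the closest class. This nearest-centroid approach is only one simple technique chosen for illustration; the training data, feature meanings, and class labels are all hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled data: (age, yearly income in thousands) -> known class.
training = [
    ((25, 30), "no_fund"),
    ((30, 35), "no_fund"),
    ((45, 90), "fund"),
    ((50, 110), "fund"),
]

def fit_centroids(samples):
    """Learn one centroid (mean point) per known class."""
    grouped = defaultdict(list)
    for point, label in samples:
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }

def classify(point, centroids):
    """Assign a new point to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))

centroids = fit_centroids(training)
print(classify((48, 95), centroids))  # closest to the "fund" centroid
print(classify((27, 28), centroids))  # closest to the "no_fund" centroid
```

Clustering differs in that no class labels are given up front: the similarity groups themselves have to be discovered from the data.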


• Association rule mining:

– mining any rule of the form X → Y, where X and Y are sets of data items.

• Deviation detection:

– discovering the most significant changes in data.

• Data visualization:

– using graphical methods to show patterns in data.

Data Mining Tasks (Cont..)
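Deviation detection can be sketched with a simple statistical rule: flag values that lie unusually far from the mean. The example below is one minimal way to do this, assuming hypothetical transaction amounts and an arbitrary threshold of two sample standard deviations; real systems typically use more robust methods.

```python
from statistics import mean, stdev

# Hypothetical daily transaction amounts for one account.
amounts = [42.0, 39.5, 41.0, 40.5, 38.0, 43.5, 250.0, 41.5]

def find_deviations(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(find_deviations(amounts))
```

Here only the 250.0 entry is flagged, since the other amounts all sit close together.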


• Rapid computerization of businesses produces huge amounts of data

• Data mining is needed to make the best use of this data

• Knowledge discovered from data can be used for competitive advantage

• There is a big gap between stored data and knowledge, and the transition won’t occur automatically.

• Many interesting things you want to find cannot be found using database queries, for example:

– “Find people likely to buy my products”

– “Who is likely to respond to my promotion?”

Why is Data Mining Necessary?


• Marketing: customer profiling and retention, identifying potential customers, market segmentation.

• Fraud detection

– e.g., identifying credit card fraud, intrusion detection

• Scientific data analysis

• Text and web mining

• Any application that involves a large amount of data

Applications


• Your data is full of undiscovered gems; start digging!

Conclusion


