1 1 Slide Introduction to Data Mining and Business Intelligence.

Transcript of 1 1 Slide Introduction to Data Mining and Business Intelligence.

Page 1: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Introduction to Data Mining and Business Intelligence

Page 2: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Why Mine Data? Commercial Viewpoint

Lots of data is being collected and warehoused

• Web data, e-commerce

• Purchases at department/grocery stores

• Bank/credit card transactions

Computers have become cheaper and more powerful

Competitive pressure is strong

• Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Page 3: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Why Mine Data? Scientific Viewpoint

Data collected and stored at enormous speeds (GB/hour)

• remote sensors on a satellite

• telescopes scanning the skies

• microarrays generating gene expression data

• scientific simulations generating terabytes of data

Traditional techniques infeasible for raw data

Data mining may help scientists

• in classifying and segmenting data

• in hypothesis formation

Page 4: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Mining Large Data Sets - Motivation

There is often information “hidden” in the data that is not readily evident

Human analysts may take weeks to discover useful information

Much of the data is never analyzed at all

[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts, 1995 to 1999; vertical axis runs from 0 to 4,000,000.]

From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”

Page 5: 1 1 Slide Introduction to Data Mining and Business Intelligence.

What is business intelligence?

Page 6: 1 1 Slide Introduction to Data Mining and Business Intelligence.

BUSINESS INTELLIGENCE

Business intelligence (BI) – applications and technologies used to gather, provide access to, and analyze data and information to support decision-making efforts

Page 7: 1 1 Slide Introduction to Data Mining and Business Intelligence.

The Problem: Data Rich, Information Poor

Businesses face a data explosion as digital images, email in-boxes, and broadband connections double by 2010

The amount of data generated is doubling every year

Some believe it will soon double monthly

Page 8: 1 1 Slide Introduction to Data Mining and Business Intelligence.

The Solution: Business Intelligence

Improving the quality of business decisions has a direct impact on costs and revenue

BI systems and tools result in an agile, intelligent enterprise

Page 9: 1 1 Slide Introduction to Data Mining and Business Intelligence.

The Solution: Business Intelligence

BI enables business users to receive data for analysis that is:

• Reliable

• Consistent

• Understandable

• Easily manipulated

Page 10: 1 1 Slide Introduction to Data Mining and Business Intelligence.

The Solution: Business Intelligence

BI can answer tough customer questions

Page 11: 1 1 Slide Introduction to Data Mining and Business Intelligence.

What is data mining?

Page 12: 1 1 Slide Introduction to Data Mining and Business Intelligence.

DATA MINING

Data mining (knowledge discovery from data)

• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

Page 13: 1 1 Slide Introduction to Data Mining and Business Intelligence.

What is Data Mining?

Many definitions

• Non-trivial extraction of implicit, previously unknown and potentially useful information from data

• Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Page 14: 1 1 Slide Introduction to Data Mining and Business Intelligence.

What is (not) Data Mining?

What is Data Mining?

– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)

– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)

What is not Data Mining?

– Look up phone number in phone directory

– Query a Web search engine for information about “Amazon”

Page 15: 1 1 Slide Introduction to Data Mining and Business Intelligence.

DATA MINING

Data-mining tools – use a variety of techniques to find patterns and relationships in large volumes of information

• Clustering

• Classification

• Affinity grouping (Association Detection)

• Statistical Estimation and Prediction

Page 16: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Cluster Analysis

Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible

CRM systems depend on cluster analysis to segment customer information and identify behavioral traits
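
The definition above can be illustrated with a small sketch. It uses k-means, one common clustering technique chosen here for illustration; the slides do not name a specific algorithm. The customer data, the starting centroids, and the function names `assign` and `k_means` are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, centroids):
    """Put each point into the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids to group means."""
    for _ in range(iterations):
        groups = assign(points, centroids)
        centroids = [
            # An empty group keeps its previous centroid.
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else old
            for g, old in zip(groups, centroids)
        ]
    return centroids, assign(points, centroids)


# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 14), (9, 80), (10, 85), (8, 90)]
centroids, segments = k_means(customers, centroids=[(1, 10), (9, 80)])
for centre, members in zip(centroids, segments):
    print(centre, members)
```

Each customer lands in exactly one segment, which matches the "mutually exclusive groups" wording of the definition. In practice the starting centroids are usually chosen at random, and the result can depend on that choice.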

Page 17: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Cluster Analysis

Page 18: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Classification

Classification – finds a model to categorize input information into several pre-defined groups.

E.g. classification of credit card approval applications, classification of documents, etc.
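
As a rough illustration of the credit card example, the sketch below labels a new application by majority vote among its nearest labeled neighbours (k-nearest neighbours). This is only one of many classification techniques and is an assumption, not the method the slides prescribe. The training data, the two features, and the name `knn_classify` are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest labeled examples."""
    nearest = sorted(training, key=lambda example: dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical applications: (income in $1000s, existing debt in $1000s) -> decision
training = [
    ((80, 5), "approve"), ((95, 10), "approve"), ((70, 2), "approve"),
    ((30, 20), "reject"), ((25, 15), "reject"), ((40, 30), "reject"),
]

high_income = knn_classify(training, (75, 8))
low_income = knn_classify(training, (32, 18))
print(high_income, low_income)
```

The two labels, "approve" and "reject", are the pre-defined groups. Unlike clustering, the groups are known in advance and the labeled examples are what the classifier learns from.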

Page 19: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Association Detection

Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information

• Market basket analysis

• E.g. beer and diapers were often purchased together, so move them closer together
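
Market basket analysis is often summarized with two measures: support (the share of baskets that contain both items) and confidence (of the baskets containing the first item, the share that also contain the second). The sketch below computes both for every item pair. The baskets and the function name `pair_rules` are hypothetical, and real tools typically prune rare pairs rather than enumerate all of them.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return {(a, b): (support, confidence of a -> b)} for item pairs bought together."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        rules[(a, b)] = (together / n, together / item_counts[a])
        rules[(b, a)] = (together / n, together / item_counts[b])
    return rules


# Hypothetical purchase records, one list per basket
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
]
rules = pair_rules(baskets)
support, confidence = rules[("beer", "diapers")]
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here beer and diapers appear together in 2 of 5 baskets, and 2 of the 3 baskets with beer also contain diapers.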

Page 20: 1 1 Slide Introduction to Data Mining and Business Intelligence.

Statistical Analysis

Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis

• Forecast – predictions made on the basis of time-series information

• Time-series information – time-stamped information collected at a particular frequency
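
One very simple way to forecast from time-series information is a moving average: predict the next value as the mean of the most recent observations. The sketch below assumes this technique for illustration only; the slides do not specify a forecasting method. The monthly sales figures and the name `moving_average_forecast` are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical time-stamped information collected monthly (units sold)
monthly_sales = [120, 130, 125, 140, 150, 145]
forecast = moving_average_forecast(monthly_sales)
print(forecast)
```

A larger window smooths out short-term noise but reacts more slowly to a genuine trend.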