Introduction to Data Mining and Business Intelligence
Slide 1
Introduction to Data Mining and Business Intelligence
Slide 2
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/credit card transactions
Computers have become cheaper and more powerful
Competitive pressure is strong
• Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Slide 3
Why Mine Data? Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
• remote sensors on a satellite
• telescopes scanning the skies
• microarrays generating gene expression data
• scientific simulations generating terabytes of data
Traditional techniques infeasible for raw data
Data mining may help scientists
• in classifying and segmenting data
• in hypothesis formation
Slide 4
Mining Large Data Sets - Motivation
There is often information “hidden” in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all
[Figure: The Data Gap. Total new disk (TB) since 1995 versus number of analysts, 1995 to 1999; vertical axis from 0 to 4,000,000]
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”
Slide 5
What is business intelligence?
Slide 6
BUSINESS INTELLIGENCE
Business intelligence (BI) – applications and technologies used to gather, provide access to, and analyze data and information to support decision-making efforts
Slide 7
The Problem: Data Rich, Information Poor
Businesses face a data explosion as digital images, email in-boxes, and broadband connections double by 2010
The amount of data generated is doubling every year
Some believe it will soon double monthly
Slide 8
The Solution: Business Intelligence
Improving the quality of business decisions has a direct impact on costs and revenue
BI systems and tools result in an agile, intelligent enterprise
Slide 9
The Solution: Business Intelligence
BI enables business users to receive data for analysis that is:
• Reliable
• Consistent
• Understandable
• Easily manipulated
Slide 10
The Solution: Business Intelligence
BI can answer tough customer questions
Slide 11
What is data mining?
Slide 12
DATA MINING
Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
Slide 13
What is Data Mining?
Many Definitions
• Non-trivial extraction of implicit, previously unknown and potentially useful information from data
• Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Slide 14
What is (not) Data Mining?
What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
Slide 15
DATA MINING
Data-mining tools – use a variety of techniques to find patterns and relationships in large volumes of information
• Clustering
• Classification
• Affinity grouping (Association Detection)
• Statistical Estimation and Prediction
Slide 16
Cluster Analysis
Cluster analysis – a technique used to divide an information set into mutually exclusive groups such that the members of each group are as close together as possible to one another and the different groups are as far apart as possible
CRM systems depend on cluster analysis to segment customer information and identify behavioral traits
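The idea of dividing records into mutually exclusive, tightly packed groups can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm, so this choice is an assumption). The customer data, the starting centroids, and the function name `k_means` are hypothetical and serve only as illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        # Assignment step: every point joins exactly one group (mutually exclusive).
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return groups, centroids

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]
groups, centers = k_means(customers, centroids=[(1, 20), (10, 220)])
print(groups)   # two segments: occasional low spenders and frequent high spenders
print(centers)
```

In a CRM setting each resulting group would be treated as a customer segment. The number of groups and the starting centroids are inputs the analyst has to choose.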
Slide 17
Cluster Analysis
Slide 18
Classification
Classification – finds a model to categorize input information into several pre-defined groups.
E.g. classification of credit card approval applications, classification of documents, etc.
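As a rough illustration of sorting inputs into pre-defined groups, the sketch below uses a k-nearest-neighbours vote, one simple classification technique among many (the slides do not prescribe a method). The applicant records, the two features, and the labels "approve" and "reject" are invented for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest labeled examples."""
    nearest = sorted(training, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical applicants: (income in $1000s, existing debt in $1000s) -> decision
training = [
    ((80, 5), "approve"), ((95, 10), "approve"), ((70, 8), "approve"),
    ((30, 25), "reject"), ((25, 30), "reject"), ((40, 35), "reject"),
]

print(knn_classify(training, (85, 7)))   # prints "approve"
print(knn_classify(training, (28, 28)))  # prints "reject"
```

The pre-defined groups are fixed by the labels in the training data. Unlike clustering, the method never invents new groups.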
Slide 19
Association Detection
Association detection – reveals the degree to which variables are related and the nature and frequency of these relationships in the information
• Market basket analysis
• E.g. beer and diapers were often purchased together, so move them closer together
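The "degree and frequency" of a relationship in market basket analysis is commonly measured with support, confidence, and lift. The sketch below computes these for a single rule. The baskets are made up, the function name `rule_metrics` is hypothetical, and the code assumes both items appear in at least one basket.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    has_a = sum(1 for b in baskets if antecedent in b)
    has_c = sum(1 for b in baskets if consequent in b)
    has_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = has_both / n          # how often the pair occurs overall
    confidence = has_both / has_a   # how often the consequent follows the antecedent
    lift = confidence / (has_c / n) # above 1 means bought together more than chance
    return support, confidence, lift

# Hypothetical shopping baskets
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
]

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)
```

Here three of the five baskets contain both items, and three of the four diaper baskets also contain beer. The lift comes out above 1, which is the kind of signal that would justify placing the two products near each other.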
Slide 20
Statistical Analysis
Statistical analysis – performs such functions as information correlations, distributions, calculations, and variance analysis
• Forecast – predictions made on the basis of time-series information
• Time-series information – time-stamped information collected at a particular frequency
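A forecast from time-series information can be sketched with a moving average, which is one of the simplest forecasting rules and is used here purely as an assumption, since the slides do not name a method. The monthly sales figures and the function name `moving_average_forecast` are hypothetical.

```python
from statistics import mean, pvariance

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    values = [value for _, value in series]
    return mean(values[-window:])

# Hypothetical time-stamped information collected at a monthly frequency
monthly_sales = [
    ("2009-01", 100), ("2009-02", 110), ("2009-03", 120),
    ("2009-04", 130), ("2009-05", 125), ("2009-06", 135),
]

forecast = moving_average_forecast(monthly_sales)            # mean of 130, 125, 135
spread = pvariance([value for _, value in monthly_sales])    # variance of the series
print(forecast, spread)
```

A longer window smooths out month-to-month noise but reacts more slowly to a real change in trend.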