2010 Data Miner Survey
8/7/2019 2010 Data Miner Survey
1/13
Karl Rexer, PhD
President, Rexer Analytics
www.RexerAnalytics.com
2010 Data Miner Survey Highlights: The Views of 735 Data Miners
Predictive Analytics World
Washington, DC, October 2010
2010 Rexer Analytics 2
2010 Data Miner Survey: Overview
Fourth annual survey
47 questions
10,000+ invitations emailed, plus newsgroups, vendors, and snowball referrals
Respondents: 735 data miners from 60 countries
Respondents by type:
Corporate: 33%
Consultants: 31%
Academics: 12%
NGO / Govt: 5%
Vendors: 19%
Note: Data from tool vendors was excluded from many analyses
Respondents by region:
North America (45%): USA 40%, Canada 4%
Europe (36%): Germany 7%, UK 5%, France 4%, Poland 4%
Asia Pacific (12%): India 4%, Australia 3%, China 2%
Central & South America (4%): Colombia 2%, Brazil 1%
Middle East & Africa (3%): Israel 1%, Turkey 1%
Fields Applying Data Mining
Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)
CRM / Marketing: 41%
Financial: 29%
Academic: 25%
Insurance: 15%
Telecommunications: 15%
Retail: 14%
Pharmaceutical: 13%
Technology: 13%
Medical: 11%
Manufacturing: 10%
Internet-based: 10%
Government: 10%
CRM / Marketing, Financial, and Academic are the most commonly reported fields. This has been consistent since the 2007 survey.
Many data miners work in several fields.
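Because the question allows multiple selections, the field percentages add up to well over 100%. A minimal sketch of that arithmetic follows. The variable names are illustrative, and it assumes the chart lists every field offered; if less common fields were left off the chart, the result is only a lower bound.

```python
# Share of respondents selecting each field, as listed on the slide (percent).
field_shares = {
    "CRM / Marketing": 41,
    "Financial": 29,
    "Academic": 25,
    "Insurance": 15,
    "Telecommunications": 15,
    "Retail": 14,
    "Pharmaceutical": 13,
    "Technology": 13,
    "Medical": 11,
    "Manufacturing": 10,
    "Internet-based": 10,
    "Government": 10,
}

# With "select all that apply", each respondent can count toward several fields,
# so the shares sum to more than 100.
total_share = sum(field_shares.values())

# Dividing by 100 gives the average number of listed fields per respondent.
average_fields = total_share / 100

print(total_share, average_fields)  # 206 2.06
```

On these numbers, a typical respondent selected about two of the listed fields.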
Data Mining Algorithms
Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)
Decision Trees: 69%
Regression: 68%
Cluster Analysis: 60%
Time Series: 32%
Neural Nets: 31%
Factor Analysis: 27%
Text Mining: 26%
Association Rules: 25%
Ensemble Models: 22%
Support Vector: 21%
Bayesian: 21%
Anomaly Detection: 16%
Survival Analysis: 14%
Rule Induction: 13%
Social Network Analysis: 12%
Genetic Algorithms: 11%
Link Analysis: 9%
Uplift Modeling: 9%
MARS: 8%
Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. This is very consistent, year to year.
However, a wide variety of algorithms are being used.
Ensemble Models by segment: Corporate 21%, Consultants 27%, Academic 20%, NGO / Govt 18%
Uplift Modeling by segment: Corporate 10%, Consultants 12%, Academic 4%, NGO / Govt 5%
Text Mining
About a third of data miners currently incorporate text mining into their analyses, and another third plan to.
[Pie chart segments: Text Miners, Plan to Start Text Mining, No Plans to Conduct Text Mining; values as extracted: 30%, 34%, 36%]
Software Used
STATISTICA Text Miner: 19%
IBM SPSS Modeler: 17%
SAS Text Miner: 9%
IBM SPSS Text Analytics: 7%
Rapid Miner: 6%
Provalis Wordstat: 2%
GATE: 2%
KXEN: 2%
Oracle Text or ODM: 1%
Megaputer Text Analyst: 1%
Autonomy: 1%
Other: 35%
[Bar chart, axis 0% to 60%; values as extracted: 55%, 59%, 21%]
The focus of our text mining is to extract key themes (sentiment analysis)
We use text fields as inputs / predictors in a larger model
We use text mining as part of social network analyses
Computing Environments
Question: What are the computing environments/platforms on which data mining/analytics occurs at your company/organization? (Check all that apply)
Laptop PC (with data & processing locally): Overall 35%; Corporate 28%, Consultant 36%, Academic 46%, NGO/Govt 42%, Vendor 44%
Laptop PC (with data & processing on server, mainframe or cloud): Overall 24%; Corporate 29%, Consultant 24%, Academic 15%, NGO/Govt 32%, Vendor 37%
Desktop PC/Workstation (with data & processing locally): Overall 49%; Corporate 43%, Consultant 49%, Academic 58%, NGO/Govt 58%, Vendor 35%
Desktop PC/Workstation (with data & processing on server, mainframe or cloud): Overall 39%; Corporate 48%, Consultant 36%, Academic 25%, NGO/Govt 47%, Vendor 39%
Local Server: Overall 26%; Corporate 28%, Consultant 30%, Academic 19%, NGO/Govt 29%, Vendor 45%
Centralized Mainframe/Server: Overall 18%; Corporate 20%, Consultant 16%, Academic 14%, NGO/Govt 32%, Vendor 26%
Cloud Computing: Overall 7%; Corporate 5%, Consultant 10%, Academic 7%, NGO/Govt 3%, Vendor 14%
A lot of data mining happens on desktop and laptop computers. Frequently the data and processing are local (not on servers, mainframe, or cloud).
Only a small minority of data mining is on the cloud.
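As a rough consistency check, the overall figures can be compared with a weighted average of the segment figures. The sketch below rests on three assumptions that the slides do not state: that "Overall" excludes vendors (the overview notes vendors were excluded from many analyses), that the respondent mix is Corporate 33%, Consultant 31%, Academic 12%, NGO/Govt 5%, and that the chart values pair with the environment labels in the order shown. All names are illustrative.

```python
# Respondent mix in percent, vendors left out (assumption: "Overall" excludes vendors).
segment_share = {"Corporate": 33, "Consultant": 31, "Academic": 12, "NGO/Govt": 5}

# Per environment: (reported overall %, segment % in the order of segment_share).
environments = {
    "Cloud Computing": (7, [5, 10, 7, 3]),
    "Centralized Mainframe/Server": (18, [20, 16, 14, 32]),
    "Local Server": (26, [28, 30, 19, 29]),
    "Desktop (server/cloud data)": (39, [48, 36, 25, 47]),
    "Desktop (local data)": (49, [43, 49, 58, 58]),
    "Laptop (server/cloud data)": (24, [29, 24, 15, 32]),
    "Laptop (local data)": (35, [28, 36, 46, 42]),
}

def weighted_overall(segment_values, shares):
    """Average the segment percentages, weighting each by its respondent share."""
    weights = list(shares.values())
    return sum(v * w for v, w in zip(segment_values, weights)) / sum(weights)

# Difference between the computed and the reported overall figure, in points.
gaps = {}
for name, (reported, by_segment) in environments.items():
    computed = weighted_overall(by_segment, segment_share)
    gaps[name] = computed - reported
    print(f"{name}: reported {reported}%, computed {computed:.1f}%")
```

Under these assumptions the computed values land within about two percentage points of the reported ones, which is roughly what rounding of the published shares would allow.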
Analytic Capability & Data Quality
Analytic capability: There's room to improve if we're going to "Compete on Analytics."
Analytic Capability Question: How do you rate the analytic capabilities of your company/organization?
[Chart segment values as extracted: 13%, 35%, 30%, 20%]
Data Quality Question: How do you rate the quality of data available for analysis at your company/organization?
[Chart segment values as extracted: 8%, 40%, 35%, 13%]
Data quality: 48% rate it strong or very strong (same as last year); 16% rate it poor or very poor (13% last year)
Overcoming Challenges: Best Practices
Top challenges facing data miners:
Dirty data: #1 challenge every year, 2007-2010
Explaining data mining to others: always in the top 4 challenges, 2007-2010
Difficult access to data: always in the top 3 challenges, 2007-2010
This year, survey respondents provided Best Practices for overcoming these challenges.
E.g., Dirty Data: Use anomaly detection to flag records to put before subject matter experts.
E.g., Dirty Data: All projects begin with low-level data reports showing counts of records, verification of keys (uniqueness, widows/orphans), and distributions of field contents. These reports are echoed back to the data content experts.
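A low-level data report of the kind described in the second example might look like the following minimal sketch. The function names, the customers/orders sample data, and the reading of "widows" as parent records with no children and "orphans" as child records with no parent are assumptions made for illustration, not details from the survey.

```python
from collections import Counter

def profile(records, key, fields):
    """Summarize a table: record count, duplicated keys, and value counts per field."""
    key_counts = Counter(r[key] for r in records)
    return {
        "record_count": len(records),
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        "distributions": {f: Counter(r.get(f) for r in records) for f in fields},
    }

def orphans(children, parents, foreign_key, parent_key):
    """Child records whose foreign key matches no parent record."""
    parent_ids = {p[parent_key] for p in parents}
    return [c for c in children if c[foreign_key] not in parent_ids]

def widows(parents, children, parent_key, foreign_key):
    """Parent records that no child record refers to."""
    referenced = {c[foreign_key] for c in children}
    return [p for p in parents if p[parent_key] not in referenced]

# Hypothetical sample data: customer 2 is duplicated, order 11 points at a missing customer.
customers = [
    {"id": 1, "state": "MA"},
    {"id": 2, "state": "NY"},
    {"id": 2, "state": "NY"},
    {"id": 3, "state": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 4},
]

report = profile(customers, key="id", fields=["state"])
orphan_orders = orphans(orders, customers, "customer_id", "id")
widow_customers = widows(customers, orders, "id", "customer_id")
print(report["record_count"], report["duplicate_keys"], dict(report["distributions"]["state"]))
print(orphan_orders, widow_customers)
```

A report like this is small enough to hand back to the data content experts before any modeling starts.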
See the list of Best Practices at www.RexerAnalytics.com in early November.
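The first best practice above, flagging records with anomaly detection so that subject matter experts can review them, can be sketched with a simple robust z-score. This is one common choice, not a method named by the respondents. The field name, the sample records, and the 3.5 cutoff are illustrative assumptions.

```python
from statistics import median

def flag_anomalies(records, field, threshold=3.5):
    """Return records whose value lies far from the median, measured in MAD units."""
    values = [r[field] for r in records]
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged this way
    flagged = []
    for r in records:
        # 0.6745 rescales MAD so the score is comparable to a standard z-score.
        score = 0.6745 * (r[field] - med) / mad
        if abs(score) > threshold:
            flagged.append({**r, "robust_z": round(score, 2)})
    return flagged

# Hypothetical transaction amounts; record 6 looks like a data-entry error.
transactions = [
    {"id": 1, "amount": 52},
    {"id": 2, "amount": 48},
    {"id": 3, "amount": 50},
    {"id": 4, "amount": 51},
    {"id": 5, "amount": 49},
    {"id": 6, "amount": 5000},
]

for_review = flag_anomalies(transactions, "amount")
print([r["id"] for r in for_review])  # [6]
```

The flagged list goes to the experts for a decision; the sketch only nominates records and does not remove or correct them.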
Data Mining Software
Survey Questions:
What data mining/analytic tools did you use in 2009? (rate each as never, occasionally, or frequently)
What one data mining software package do you use most frequently?
[Charts by segment: Overall, Corporate, Consultants, Academics, NGO / Govt]
The average data miner reports using 4.6 software tools.
R is used by the most data miners (43%).
STATISTICA is the primary data mining tool chosen most often (18%).
Satisfaction with Data Mining Tools
Question: Please rate your overall satisfaction with your primary Data Mining software package.
[Chart legend: 2010, 2009; Sample size < 20]
STATISTICA received the highest satisfaction ratings. Consistent with the 2009 findings, R and SPSS Modeler users are also quite satisfied.
Continued Use question (not graphed): What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?
About 80% of STATISTICA and R users also report that they are extremely likely to stay with these primary tools over the next 3 years. This is reported by only 42-45% of SAS, SPSS Statistics, and SAS-EM users, and by only 18% of Weka users.
Data Mining and the Economy
Question: How will the number of data mining projects your organization conducts in 2010 compare to what has been typical in the past few years?
[Chart: Number of Data Mining Projects in 2010]
There is a strong market for data mining: 73% of data miners foresee increases in the number of data mining projects.
Offshoring of data mining is also increasing: it is reported by 14% of data miners this year (8% last year).
Offshoring Question (not graphed): Has your company moved any data mining or other analytics to another country to take advantage of lower wages in the destination country?
Future Trends in Data Mining
Question: What do you envision as the primary future trends in data mining? (open-ended survey question)
Number of respondents:
Growth in Data Mining Adoption: 50
Text Mining: 32
Social Network Analysis: 32
Automation: 26
Cloud Computing: 15
Data Visualization: 15
Tools Get Easier to Use: 12
Scaling to Bigger Data: 11
How to Get More Information
Questions? Talk with me at PAW. Call or email me if you don't see me in the hallways.
Copy of these slides: Available now
2010 Data Miner Survey Summary Report (Free): Available in early November
Available at PAW website or email me
Best Practices for overcoming data mining challenges: Available in early November at www.RexerAnalytics.com
Karl Rexer, PhD
617-233-8185