Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000...
Transcript of Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000...
![Page 1: Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000 2015 600 2016 225,000 2015 66,000 57% companies represented at summits 1,800 2015](https://reader036.fdocuments.us/reader036/viewer/2022071115/5fefb4d02db4fc0c42305897/html5/thumbnails/1.jpg)
Apache Spark is being used to create sophisticated products within the same organization
Features users consider importantRespondents were allowed to select more than one feature.
PERFORMANCE91% EASE OF
PROGRAMMING
76%EASE OFDEPLOYMENT
69%
ADVANCED ANALYTICS
82%
REAL-TIMESTREAMING
51%
Apache Spark’s fastest growing areas
57%STREAMING USERS
38%
ADVANCED ANALYTICSUSERS (MLLIB)
153%DATAFRAME USERS
67%SPARK SQL USERS
The Apache Spark Survey identifies how organizations use Apache Spark. The results from this survey reflect the answers and opinions of 1,615 respondents representing over 900 organizations. Survey respondents were predominantly Apache Spark users.
The survey results suggest that Spark’s growth continues across various industries, as more people in various functional roles use it to build increasingly sophisticated data solutions. Mixing and matching multiple Spark components, different users build various types of products in production. The results indicate that Spark has moved well beyond the early-adopter phase at high-tech companies and is now mainstream in large data-driven enterprises. And with the rise of public cloud computing, the survey findings reflect that users show an increased affinity toward using Spark in the public cloud.
By surveying our community, we’re able to gather essential insights about how our users are leveraging the features of Apache Spark. To learn more on how you can accelerate innovation with the most advanced Apache Spark-based platform, visit us at azure.com/databricks
Spark remains the most active open source project in Big Data. Today, there are over 1,000 Spark contributors, compared to 600 from 250+ organizations. With such large numbers of contributors and organizations investing in Spark’s future development, it is evident that Spark has engaged a community of developers globally.
Apache Spark’s growth continues
Apache Spark in the cloud is growing
Key industries using Apache Spark
1,615RESPONDENTS
900DISTINCT ORGANIZATIONS
ARCHITECTS
ACADEMICS
21%
TECHNICALMANAGEMENT
10%
5%DATA
ENGINEERS
41%
DATASCIENTISTS
23%
240%SPARK MEETUP MEMBERS
67%CODE CONTRIBUTORS
20161,000
2015600
2016225,000
201566,000
57%COMPANIES REPRESENTED AT SUMMITS
20161,800
20151,144
30%SPARK SUMMIT ATTENDEES
20165,100
20153,912
68%
52%
45%
40%
37%
36%
29%
BUSINESS / CUSTOMER INTELLIGENCE
DATA WAREHOUSING
REAL-TIME / STREAMING SOLUTIONS
RECOMMENDATION ENGINES
LOG PROCESSING
USER-FACING SERVICES
FRAUD DETECTION / SECURITY
Languages used in Apache Spark Spark components used in production
71%65%
2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016
SCALA
58% 62%
2015 2016
PYTHON
31% 29%
2015 2016
JAVA
18% 20%
2015 2016
R
36% 44%
2015 2016
SQL DATAFRAMES
15%
38%
2015 2016
SQL
24%
40%
2015 2016
STREAMING
14%
22%
2015 2016
ADVANCED ANALYTICS (MLlib)
13%18%
2016
Respondents were allowed to select more than one component. Respondents were allowed to select more than one language.
Apache Spark’s growth and adoption continues as users, industries, development environments, disciplines, and programming languages embrace its ease of use and programming, its unified computer engine, and its performance to solve complex data problems at scale.
The rise of cloud computing is rapid, inexorable, and causing a huge upheaval in the tech industry. We observe this trend reflected in the survey results, as many respondents elect to deploy Spark in the cloud, reaping its many benefits. Not only do cloud deployments have lower deployment costs and fewer management headaches, they have higher and proven performance benefits.
Percentage decrease in on-premises deployments
11%
40%
48%
7%
36%42%
STANDALONEYARNMESOS
2015 2016 2015 2016 2015 2016
18%
2016
Respondents were allowed to select more than one component. Respondents were allowed to select more than one component.
18%CONSULTING
(IT)
25%SOFTWARE
(SAAS, WEB, MOBILE)11% BANKING /
FINANCE
7% ADVERTISING /MARKETING /
PR
6%
ECOMMERCE / RETAIL
5%HEALTH / MEDICAL / PHARMACY / BIOTECH
CARRIERS / TELECOM
5%
4%
3%
EDUCATION
PUBLISHING / MEDIA
COMPUTERS / HARDWARE
3% 13%OTHER
Respondents were allowed to select more than one product.
Apache Spark Streaming is growing. Since its release, Spark Streaming has become one of the most widely used distributed streaming engines. Interest in developing real-time applications and advanced analytics is on the rise.
Apache Spark streaming and machine learning surge in usage
Importance of Apache Spark streaming to users
Apache Spark streaming and MLlib use
The numbers show that Spark Streaming is the preferred streaming engine for organizations building real-time, end-to-end streaming solutions, from evaluation to production.
Types of products developed using Apache Spark
40% of respondents developRECOMMENDATION ENGINE PRODUCTS
45% of respondents developREAL-TIME STREAMING PRODUCTS
29% of respondents developFRAUD DETECTION / SECURITY PRODUCTS
2015 2016
13%
2015
STREAMING ADVANCED ANALYTICS (MLLIB )
14%
22%
57%STREAMING
PRODUCTION CASES
38%ADVANCED ANALYTICS (MLLIB)
PRODUCTION CASES
35%SOMEWHATIMPORTANT
VERY IMPORTANT
51%NOT
IMPORTANT
14%
Azure DatabricksData science, engineering, and business come together like never before with Microsoft Azure Databricks, the most advanced Apache Spark platform. With a high-performance processing engine that’s optimized for Azure, you’re able to improve and scale your analytics on a global scale—saving valuable time and money, while driving new insights and innovation for your organization. Microsoft Azure Databricks offers an intelligent, end-to-end solution for all your data and analytics challenges.
Copyright © 2017 Microsoft, Inc. All rights reserved. This infographic is for informational purposes only. Microsoft makes no warranties, express or implied, with respect to the information presented here.
Apache® SparkTM Survey highlights
Better insights.Faster innovation.