Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000...

1
Apache Spark is being used to create sophisticated products within the same organization Features users consider important Respondents were allowed to select more than one feature. PERFORMANCE 91 % EASE OF PROGRAMMING 76 % EASE OF DEPLOYMENT 69 % ADVANCED ANALYTICS 82 % REAL-TIME STREAMING 51 % Apache Spark’s fastest growing areas 57 % STREAMING USERS 38 % ADVANCED ANALYTICS USERS (MLLIB) 153 % DATAFRAME USERS 67 % SPARK SQL USERS The Apache Spark Survey identifies how organizations use Apache Spark. The results from this survey reflect the answers and opinions of 1,615 respondents representing over 900 organizations. Survey respondents were predominantly Apache Spark users. The survey results suggest that Spark’s growth continues across various industries, as more people in various functional roles use it to build increasingly sophisticated data solutions. Mixing and matching multiple Spark components, different users build various types of products in production. The results indicate that Spark has moved well beyond the early-adopter phase at high-tech companies and is now mainstream in large data-driven enterprises. And with the rise of public cloud computing, the survey findings reflect that users show an increased affinity toward using Spark in the public cloud. By surveying our community, we’re able to gather essential insights about how our users are leveraging the features of Apache Spark. To learn more on how you can accelerate innovation with the most advanced Apache Spark-based platform, visit us at azure.com/databricks Spark remains the most active open source project in Big Data. Today, there are over 1,000 Spark contributors, compared to 600 from 250+ organizations. With such large numbers of contributors and organizations investing in Spark’s future development, it is evident that Spark has engaged a community of developers globally. Apache Spark’s growth continues Apache Spark in the cloud is growing Key industries using Apache Spark 1,615 RESPONDENTS 900 DISTINCT ORGANIZATIONS ARCHITECTS ACADEMICS 21 % TECHNICAL MANAGEMENT 10 % 5 % DATA ENGINEERS 41 % DATA SCIENTISTS 23 % 240 % SPARK MEETUP MEMBERS 67 % CODE CONTRIBUTORS 2016 1,000 2015 600 2016 225,000 2015 66,000 57 % COMPANIES REPRESENTED AT SUMMITS 2016 1,800 2015 1,144 30 % SPARK SUMMIT ATTENDEES 2016 5,100 2015 3,912 68 % 52 % 45 % 40 % 37 % 36 % 29 % BUSINESS / CUSTOMER INTELLIGENCE DATA WAREHOUSING REAL-TIME / STREAMING SOLUTIONS RECOMMENDATION ENGINES LOG PROCESSING USER-FACING SERVICES FRAUD DETECTION / SECURITY Languages used in Apache Spark Spark components used in production 71 % 65 % 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 SCALA 58 % 62 % PYTHON 31 % 29 % JAVA 18 % 20 % R 36 % 44 % SQL DATAFRAMES 15 % 38 % SQL 24 % 40 % STREAMING 14 % 22 % ADVANCED ANALYTICS (MLlib) 13 % 18 % Respondents were allowed to select more than one component. Respondents were allowed to select more than one language. Apache Spark’s growth and adoption continues as users, industries, development environments, disciplines, and programming languages embrace its ease of use and programming, its unified computer engine, and its performance to solve complex data problems at scale. The rise of cloud computing is rapid, inexorable, and causing a huge upheaval in the tech industry. We observe this trend reflected in the survey results, as many respondents elect to deploy Spark in the cloud, reaping its many benefits. Not only do cloud deployments have lower deployment costs and fewer management headaches, they have higher and proven performance benefits. Percentage decrease in on-premises deployments 11 % 40 % 48 % 7 % 36 % 42 % STANDALONE YARN MESOS 2015 2016 2015 2016 2015 2016 18 % Respondents were allowed to select more than one component. Respondents were allowed to select more than one component. 18 % CONSULTING (IT) 25 % SOFTWARE (SAAS, WEB, MOBILE) 11 % BANKING / FINANCE 7 % ADVERTISING / MARKETING / PR 6 % ECOMMERCE / RETAIL 5 % HEALTH / MEDICAL / PHARMACY / BIOTECH CARRIERS / TELECOM 5 % 4 % 3 % EDUCATION PUBLISHING / MEDIA COMPUTERS / HARDWARE 3 % 13 % OTHER Respondents were allowed to select more than one product. Apache Spark Streaming is growing. Since its release, Spark Streaming has become one of the most widely used distributed streaming engines. Interest in developing real-time applications and advanced analytics is on the rise. Apache Spark streaming and machine learning surge in usage Importance of Apache Spark streaming to users Apache Spark streaming and MLlib use The numbers show that Spark Streaming is the preferred streaming engine for organizations building real-time, end-to-end streaming solutions, from evaluation to production. Types of products developed using Apache Spark 40 % of respondents develop RECOMMENDATION ENGINE PRODUCTS 45 % of respondents develop REAL-TIME STREAMING PRODUCTS 29 % of respondents develop FRAUD DETECTION / SECURITY PRODUCTS 2015 2016 13 % 2015 STREAMING ADVANCED ANALYTICS (MLLIB ) 14 % 22 % 57 % STREAMING PRODUCTION CASES 38 % ADVANCED ANALYTICS (MLLIB) PRODUCTION CASES 35 % SOMEWHAT IMPORTANT VERY IMPORTANT 51 % NOT IMPORTANT 14 % Azu r e Databricks Data science, engineering, and business come together like never before with Microsoft Azure Databricks, the most advanced Apache Spark platform. With a high-performance processing engine that’s optimized for Azure, you’re able to improve and scale your analytics on a global scalesaving valuable time and money, while driving new insights and innovation for your organization. Microsoft Azure Databricks offers an intelligent, end-to-end solution for all your data and analytics challenges. Copyright © 2017 Microsoft, Inc. All rights reserved. This infographic is for informational purposes only. Microsoft makes no warranties, express or implied, with respect to the information presented here. Apache ® Spark TM Survey highlights Better insights. Faster innovation.

Transcript of Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000...

Page 1: Better insights. Faster innovation.€¦ · spark meetup members 67% code contributors 2016 1,000 2015 600 2016 225,000 2015 66,000 57% companies represented at summits 1,800 2015

Apache Spark is being used to create sophisticated products within the same organization

Features users consider importantRespondents were allowed to select more than one feature.

PERFORMANCE91% EASE OF

PROGRAMMING

76%EASE OFDEPLOYMENT

69%

ADVANCED ANALYTICS

82%

REAL-TIMESTREAMING

51%

Apache Spark’s fastest growing areas

57%STREAMING USERS

38%

ADVANCED ANALYTICSUSERS (MLLIB)

153%DATAFRAME USERS

67%SPARK SQL USERS

The Apache Spark Survey identifies how organizations use Apache Spark. The results from this survey reflect the answers and opinions of 1,615 respondents representing over 900 organizations. Survey respondents were predominantly Apache Spark users.

The survey results suggest that Spark’s growth continues across various industries, as more people in various functional roles use it to build increasingly sophisticated data solutions. Mixing and matching multiple Spark components, different users build various types of products in production. The results indicate that Spark has moved well beyond the early-adopter phase at high-tech companies and is now mainstream in large data-driven enterprises. And with the rise of public cloud computing, the survey findings reflect that users show an increased affinity toward using Spark in the public cloud.

By surveying our community, we’re able to gather essential insights about how our users are leveraging the features of Apache Spark. To learn more on how you can accelerate innovation with the most advanced Apache Spark-based platform, visit us at azure.com/databricks

Spark remains the most active open source project in Big Data. Today, there are over 1,000 Spark contributors, compared to 600 from 250+ organizations. With such large numbers of contributors and organizations investing in Spark’s future development, it is evident that Spark has engaged a community of developers globally.

Apache Spark’s growth continues

Apache Spark in the cloud is growing

Key industries using Apache Spark

1,615RESPONDENTS

900DISTINCT ORGANIZATIONS

ARCHITECTS

ACADEMICS

21%

TECHNICALMANAGEMENT

10%

5%DATA

ENGINEERS

41%

DATASCIENTISTS

23%

240%SPARK MEETUP MEMBERS

67%CODE CONTRIBUTORS

20161,000

2015600

2016225,000

201566,000

57%COMPANIES REPRESENTED AT SUMMITS

20161,800

20151,144

30%SPARK SUMMIT ATTENDEES

20165,100

20153,912

68%

52%

45%

40%

37%

36%

29%

BUSINESS / CUSTOMER INTELLIGENCE

DATA WAREHOUSING

REAL-TIME / STREAMING SOLUTIONS

RECOMMENDATION ENGINES

LOG PROCESSING

USER-FACING SERVICES

FRAUD DETECTION / SECURITY

Languages used in Apache Spark Spark components used in production

71%65%

2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016 2015 2016

SCALA

58% 62%

2015 2016

PYTHON

31% 29%

2015 2016

JAVA

18% 20%

2015 2016

R

36% 44%

2015 2016

SQL DATAFRAMES

15%

38%

2015 2016

SQL

24%

40%

2015 2016

STREAMING

14%

22%

2015 2016

ADVANCED ANALYTICS (MLlib)

13%18%

2016

Respondents were allowed to select more than one component. Respondents were allowed to select more than one language.

Apache Spark’s growth and adoption continues as users, industries, development environments, disciplines, and programming languages embrace its ease of use and programming, its unified computer engine, and its performance to solve complex data problems at scale.

The rise of cloud computing is rapid, inexorable, and causing a huge upheaval in the tech industry. We observe this trend reflected in the survey results, as many respondents elect to deploy Spark in the cloud, reaping its many benefits. Not only do cloud deployments have lower deployment costs and fewer management headaches, they have higher and proven performance benefits.

Percentage decrease in on-premises deployments

11%

40%

48%

7%

36%42%

STANDALONEYARNMESOS

2015 2016 2015 2016 2015 2016

18%

2016

Respondents were allowed to select more than one component. Respondents were allowed to select more than one component.

18%CONSULTING

(IT)

25%SOFTWARE

(SAAS, WEB, MOBILE)11% BANKING /

FINANCE

7% ADVERTISING /MARKETING /

PR

6%

ECOMMERCE / RETAIL

5%HEALTH / MEDICAL / PHARMACY / BIOTECH

CARRIERS / TELECOM

5%

4%

3%

EDUCATION

PUBLISHING / MEDIA

COMPUTERS / HARDWARE

3% 13%OTHER

Respondents were allowed to select more than one product.

Apache Spark Streaming is growing. Since its release, Spark Streaming has become one of the most widely used distributed streaming engines. Interest in developing real-time applications and advanced analytics is on the rise.

Apache Spark streaming and machine learning surge in usage

Importance of Apache Spark streaming to users

Apache Spark streaming and MLlib use

The numbers show that Spark Streaming is the preferred streaming engine for organizations building real-time, end-to-end streaming solutions, from evaluation to production.

Types of products developed using Apache Spark

40% of respondents developRECOMMENDATION ENGINE PRODUCTS

45% of respondents developREAL-TIME STREAMING PRODUCTS

29% of respondents developFRAUD DETECTION / SECURITY PRODUCTS

2015 2016

13%

2015

STREAMING ADVANCED ANALYTICS (MLLIB )

14%

22%

57%STREAMING

PRODUCTION CASES

38%ADVANCED ANALYTICS (MLLIB)

PRODUCTION CASES

35%SOMEWHATIMPORTANT

VERY IMPORTANT

51%NOT

IMPORTANT

14%

Azure DatabricksData science, engineering, and business come together like never before with Microsoft Azure Databricks, the most advanced Apache Spark platform. With a high-performance processing engine that’s optimized for Azure, you’re able to improve and scale your analytics on a global scale—saving valuable time and money, while driving new insights and innovation for your organization. Microsoft Azure Databricks offers an intelligent, end-to-end solution for all your data and analytics challenges.

Copyright © 2017 Microsoft, Inc. All rights reserved. This infographic is for informational purposes only. Microsoft makes no warranties, express or implied, with respect to the information presented here.

Apache® SparkTM Survey highlights

Better insights.Faster innovation.