10 things you need to know about Spark

Suited for real-time applications, such as the Internet of Things, where much or most of the data analysis will be performed on cached, live data rather than stored, historical data.


Includes runtime engines that are optimized for in-memory processing, streaming analytics, graph analysis and machine learning.

Leverages existing programming languages such as Python, Scala or SQL and provides seamless access to enterprise data with familiar tools.

http://ibmbigdatahub.com/

Boosts data scientist productivity through in-memory performance, easier APIs, support for multiple programming languages and a wider range of workflows.


Builds on existing user investments in advanced analytics, machine learning platforms and big data platforms such as Hadoop.


Parallelizes big data analytics models across distributed in-memory clusters, combining SQL, streaming and graph analytics within the same application.
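To make the idea of parallelizing work across in-memory partitions concrete, here is a minimal sketch in plain Python. It does not use Spark's API: the function names (`split_into_partitions`, `count_words`, `parallel_word_count`) are hypothetical, and a local thread pool stands in for a cluster of workers. It only illustrates the general pattern of splitting data, processing each partition independently and merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions in-memory lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Per-partition work: count the words in the lines this worker holds."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Count words by processing partitions in parallel, then merging."""
    partitions = split_into_partitions(lines, num_partitions)
    # A thread pool stands in for cluster workers (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge step: combine the partial results from every partition.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["spark runs in memory", "spark runs on clusters", "memory is fast"]
totals = parallel_word_count(lines, num_partitions=2)
```

In a real cluster framework the partitions would live on different machines and the merge would happen over the network, but the split, process and combine shape is the same.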


Initially developed at the University of California, Berkeley's AMPLab starting in 2009 and deepened through the efforts of an expanding open-source community and industry.


Open-sourced in 2010, donated to the Apache Software Foundation in 2013 and promoted to a top-level Apache project in 2014.


Continues to gain active members, with the Apache Spark community now boasting over 465 contributors.


Adopted by a growing range of organizations as the future of their big data analytics environment for new challenges requiring in-memory processing, machine learning, stream computing and graph analysis.


Hungry for more information on Spark?

Get started learning more about Spark today at BigDataUniversity.com


http://bigdatauniversity.com
