Announcing Spark Driver for Cassandra

19
Spark and Cassandra Martin Van Ryswyk, EVP Engineering

description

Spark Driver v 1.0 is now available with free Apache 2.0 license with DataStax. DataStax and Databricks have partnered together to make it easier than ever before to integrate both open source technologies and help the most progressive data-driven businesses succeed today. Cassandra and Spark together arm users with analysis to power online recommendations, personalized customer experiences, sensor-data detection, fraud detection and more.

Transcript of Announcing Spark Driver for Cassandra

Page 1: Announcing Spark Driver for Cassandra

Spark and CassandraMartin Van Ryswyk, EVP Engineering

Page 2: Announcing Spark Driver for Cassandra

DataStax 2014 2

Partnership

Jonathan EllisChairman, Apache Cassandra

+Matei Zaharia Creator, Apache Spark

Announced in May 2014Based on clear feedback from both communitiesCommitted to an integration between Spark and

Cassandra

Page 3: Announcing Spark Driver for Cassandra

3

What is Apache Cassandra?

Page 4: Announcing Spark Driver for Cassandra

 

Apache CassandraMassively scalable databaseAlways onFully distributedLinear scale performanceFlexible NoSQL data modelOperationally simpleSQL-Like language Free tools and drivers

San Francisco

New York

Dubai

©2014 DataStax

Page 5: Announcing Spark Driver for Cassandra

Apache Cassandra

Page 6: Announcing Spark Driver for Cassandra

6

Netflix Ensures Constant Uptime with DataStax EnterpriseWorld’s leading streaming media provider with digital revenue $1.5BN+95% of all Netflix data stored in DSEIntroduction of ‘Profiles’ drove throughput to over 10M transactions per secondDoes 1 trillion transactions/day with DSEReplaced Oracle in six data centers, worldwide, 100% in the cloud

Page 7: Announcing Spark Driver for Cassandra

7

Delivers 150+ Billion Content Recommendations Per MonthServes content for largest media brands in the world: Reuters, Wall St Journal, USA TodayNeeded a massively scalable data store High velocity of data with 58,000 links to content per secondAlways-on data architectureLost a data center during Hurricane Sandy but never went offline

Page 8: Announcing Spark Driver for Cassandra

8

The Weather Channel

While I have years of experience using Cassandra, my team was mostly new to it; CQL made their transition essentially painless. But where Cassandra really shines is in speed and operational simplicity, and I would say those two points were critical.”

ROBBIE STRICKLAND Software Dev Manager

Page 9: Announcing Spark Driver for Cassandra

Faster Feedback Loops

9

ETL AnalyticalTransactional C*

Page 10: Announcing Spark Driver for Cassandra

10

Announcing cassandra-driver-spark

Page 11: Announcing Spark Driver for Cassandra

cassandra-driver-spark v1.0

• Developed by DataStax with support and review by Databricks

• Free with Apache 2.0 license from DataStax https://github.com/datastax/cassandra-driver-

spark• Question to Apache Spark User List

[email protected]• Offering driver code to Apache Spark community

©2014 DataStax

Page 12: Announcing Spark Driver for Cassandra

• Exposes Cassandra tables as RDD• Map table rows to CassandraRow objects• Data type conversions between Cassandra and Scala• Save RDDs back to Cassandra by implicit saveToCassandra

call• Filter rows on server via CQL WHERE clause

(CassandraRDD#where method)• Select subset of columns (CassandraRDD#select method)• Optimizations for Cassandra vnodes

©2014 DataStax

cassandra-driver-spark v1.0

Page 13: Announcing Spark Driver for Cassandra

Example

©2014 DataStax

Page 14: Announcing Spark Driver for Cassandra

Performance

©2014 DataStax

Ex-treme

Simple0

102030405060

Hadoop M/RSpark

seco

nds

Extreme: data not in memory and fetched from multiple nodesSimple: data set fits in memory

2-30x faster than highly optimized “Hadoop on C*” implementation in DataStax Enterprise

Page 15: Announcing Spark Driver for Cassandra

15

What is DataStax Enterprise?

Page 16: Announcing Spark Driver for Cassandra

Security Analytics Search VisualMonitoring

ManagementServices In-Memory

Delivering Apache Cassandra to the Enterprise

Dev. IDE & Drivers

ProfessionalServices

Support & Training

Strong Data ProtectionIn-Memory OLTP/AnalyticsPoint-and-Click/Automated Mgmt

Certified Production CassandraMulti-Workload/Use Case

CapableIntegrated OLTP, Analytics,

Search

©2014 DataStax

Page 17: Announcing Spark Driver for Cassandra

17

Announcing

Certified Spark

Distribution

Page 18: Announcing Spark Driver for Cassandra

cassandra-driver-spark Roadmap

• Integrate into Apache Spark• Update for Spark 1.0• Support for Additional Spark components

• Spark SQL• Spark Streaming• GraphX

• Listen to both communities…

Page 19: Announcing Spark Driver for Cassandra

Next Steps

datastax.com/cassandrasummit14

Today: Attend Tupshin and Al’s talk at 1pm Visit DataStax booth – meet expertsTonight: Download the integration

github.com/datastax/cassandra-driver-spark

Learn about Cassandra planetcassandra.org Download DataStax Enterprise www.datastax.com/downloads

September: