Harnessing Spark and Cassandra with Groovy

Harnessing the Power of Spark + Cassandra with Groovy

Steve Pember, CTO, ThirdChannel

Gr8Conf US, 2017

@svpember

Relational Databases Are Fantastic

SQL makes you Strong

Agenda

• Spark

• Cassandra

• Spark + Cassandra

• Working with Spark + Cassandra

• Demo

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

Hadoop

• Map / Reduce

• Storage via HDFS

• Each calculation step written to disk

Spark

• More than Map / Reduce

• Not tied to any one storage mechanism

• Clustered calculations, each step in memory
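A toy, single-machine sketch can make the Hadoop/Spark contrast concrete. It is written in Java because the Spark classes named later in this deck (such as JavaSparkContext) come from Spark's Java API. It is not Hadoop or Spark code. The hypothetical class IntermediateResults runs the same two steps in two ways: first writing each intermediate result to a temp file and reading it back, then chaining the steps in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Toy contrast of the two execution styles; NOT Hadoop or Spark code.
// It only mimics where intermediate results live between steps.
class IntermediateResults {

    // Hadoop-style: each step writes its output to disk and the next step reads it back.
    static List<Integer> diskBacked(List<Integer> input) throws IOException {
        Path step1 = Files.createTempFile("step1-", ".txt");
        Path step2 = Files.createTempFile("step2-", ".txt");
        try {
            // Step 1: square every number, then persist the result.
            List<String> squared = input.stream()
                    .map(n -> String.valueOf(n * n))
                    .collect(Collectors.toList());
            Files.write(step1, squared);
            // Step 2: read step 1 back from disk, keep even values, persist again.
            List<String> evens = Files.readAllLines(step1).stream()
                    .filter(s -> Integer.parseInt(s) % 2 == 0)
                    .collect(Collectors.toList());
            Files.write(step2, evens);
            // The final result is read from disk as well.
            return Files.readAllLines(step2).stream()
                    .map(Integer::parseInt)
                    .collect(Collectors.toList());
        } finally {
            Files.deleteIfExists(step1);
            Files.deleteIfExists(step2);
        }
    }

    // Spark-style: the same two steps chained in memory; nothing is written to disk.
    static List<Integer> inMemory(List<Integer> input) {
        return input.stream()
                .map(n -> n * n)
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(diskBacked(data));
        System.out.println(inMemory(data));
    }
}
```

Both methods return [4, 16, 36]. The only difference is where the intermediate data lives between steps, and that disk round trip is the cost Spark avoids.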

• Creation was a Happy Accident

• Architecture

Your Groovy App

• Architecture

• Programmatic Structure

The SparkContext submits Jobs to the Cluster

Operations are performed against RDDs

Resilient Distributed Dataset (RDD)

• Immutable

• Partitioned

• Parallel operations

• Created by loading data or by performing operations on other RDDs

• Reusable & Composable
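A toy model can make the RDD properties above tangible. MiniRdd is a hypothetical class, not Spark's RDD:

- It holds its data as a list of partitions.
- It never mutates an existing instance.
- It processes partitions with a parallel stream.
- It builds every new MiniRdd from a parent.

It deliberately leaves out lineage tracking, which is what lets real RDDs recompute lost partitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model of the RDD properties above; NOT Spark's RDD class.
final class MiniRdd<T> {
    // Partitioned and immutable: a fixed, unmodifiable list of unmodifiable partitions.
    private final List<List<T>> partitions;

    private MiniRdd(List<List<T>> source) {
        List<List<T>> copy = new ArrayList<>();
        for (List<T> p : source) {
            copy.add(Collections.unmodifiableList(new ArrayList<>(p)));
        }
        this.partitions = Collections.unmodifiableList(copy);
    }

    // Splits a local list into contiguous slices, similar in spirit to SparkContext.parallelize.
    static <T> MiniRdd<T> parallelize(List<T> data, int numPartitions) {
        List<List<T>> slices = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            int from = (int) ((long) i * data.size() / numPartitions);
            int to = (int) ((long) (i + 1) * data.size() / numPartitions);
            slices.add(data.subList(from, to));
        }
        return new MiniRdd<>(slices);
    }

    // Parallel: runs f on every partition concurrently and wraps the results in a NEW MiniRdd.
    private <R> MiniRdd<R> mapPartitions(Function<List<T>, List<R>> f) {
        List<List<R>> out = partitions.parallelStream().map(f).collect(Collectors.toList());
        return new MiniRdd<>(out);
    }

    <R> MiniRdd<R> map(Function<T, R> f) {
        return mapPartitions(p -> p.stream().map(f).collect(Collectors.toList()));
    }

    MiniRdd<T> filter(Predicate<T> keep) {
        return mapPartitions(p -> p.stream().filter(keep).collect(Collectors.toList()));
    }

    int numPartitions() {
        return partitions.size();
    }

    // Gathers every partition back into one list, in partition order (like RDD.collect).
    List<T> collect() {
        return partitions.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MiniRdd<Integer> numbers = MiniRdd.parallelize(List.of(1, 2, 3, 4, 5, 6, 7, 8), 3);
        // Composed from the parent; 'numbers' itself is left unchanged.
        MiniRdd<Integer> doubledEvens = numbers.filter(n -> n % 2 == 0).map(n -> n * 2);
        System.out.println(numbers.collect());
        System.out.println(doubledEvens.collect());
    }
}
```

Because filter and map each return a new instance, numbers still holds 1 through 8 after doubledEvens is built. doubledEvens collects to [4, 8, 12, 16].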

• Architecture

• APIs

More Than Map/Reduce

RDD Operations

• map

• reduce

• filter

• flatMap

• zip

• groupBy

• … plus many more
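Most of the listed operations have direct single-JVM counterparts in java.util.stream. That makes them a handy way to reason about what each operation does per element.

The sketch below counts words using:

- flatMap
- filter
- map
- reduce
- a groupBy-style grouping
- a hand-written zip, since Java streams have no zip

The class and method names are hypothetical. In Spark the same operations run across the partitions of an RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Local analogues of the RDD operations above, built on java.util.stream.
class RddOperationAnalogues {

    // flatMap: one line -> many words; filter: drop the filler word "and".
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word -> !word.equals("and"))
                .collect(Collectors.toList());
    }

    // map: word -> length; reduce: sum of all lengths.
    static int totalChars(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // groupBy: bucket identical words together, then count each bucket.
    static Map<String, Integer> countByWord(List<String> words) {
        Map<String, List<String>> groups = words.stream()
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.toList()));
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, bucket) -> counts.put(word, bucket.size()));
        return counts;
    }

    // zip: pair the i-th word with the i-th length (both lists have the same size).
    static List<String> zipWithLengths(List<String> words) {
        List<Integer> lengths = words.stream().map(String::length).collect(Collectors.toList());
        return IntStream.range(0, words.size())
                .mapToObj(i -> words.get(i) + "=" + lengths.get(i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ws = words(List.of("spark and cassandra", "groovy and spark"));
        System.out.println(ws);                 // [spark, cassandra, groovy, spark]
        System.out.println(totalChars(ws));     // 25
        System.out.println(countByWord(ws));    // {cassandra=1, groovy=1, spark=2}
        System.out.println(zipWithLengths(ws)); // [spark=5, cassandra=9, groovy=6, spark=5]
    }
}
```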

• Architecture

• APIs

• Additional Modules

Spark SQL…!

Spark Streaming!

Agenda

• Spark

• Cassandra

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

Deterministic Distribution
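Deterministic distribution means any node can work out where a row lives from its partition key alone. The sketch below models a token ring with a TreeMap, in which each node owns the token range ending at its own token. As a labeled assumption, MD5 stands in for Cassandra's default Murmur3Partitioner, so the tokens will not match a real cluster's. The node names are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring. ASSUMPTION: MD5 (first 8 bytes as a signed long) stands in for
// Cassandra's Murmur3Partitioner; only the determinism matters here.
class TokenRing {
    // token -> node name; each node owns the range (previous token, its token]
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String node, long token) {
        ring.put(token, node);
    }

    // Same partition key in, same token out, on every machine.
    static long tokenFor(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            long token = 0;
            for (int i = 0; i < 8; i++) {
                token = (token << 8) | (digest[i] & 0xFF);
            }
            return token;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available in the JDK", e);
        }
    }

    // Walk clockwise: the first node token >= the key's token owns it, wrapping past the end.
    String ownerOfToken(long token) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    String ownerOf(String partitionKey) {
        return ownerOfToken(tokenFor(partitionKey));
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("node-a", Long.MIN_VALUE / 2);
        ring.addNode("node-b", 0L);
        ring.addNode("node-c", Long.MAX_VALUE / 2);
        System.out.println(ring.ownerOf("store-42"));
        System.out.println(ring.ownerOf("store-42")); // always the same node
    }
}
```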

• Distributed

• High Replication
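Replication can be sketched on a token ring modelled with a TreeMap. In the spirit of Cassandra's SimpleStrategy, the node owning a token stores the first copy. The next distinct nodes clockwise store the remaining copies, up to the replication factor.

A few simplifications apply:

- Tokens are passed in directly.
- The node names are placeholders.
- Rack and data-center awareness (NetworkTopologyStrategy) is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy replica placement in the spirit of SimpleStrategy: owner first, then the
// next distinct nodes clockwise. NetworkTopologyStrategy is not modelled.
class ReplicaPlacement {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // token -> node

    void addNode(String node, long token) {
        ring.put(token, node);
    }

    List<String> replicasFor(long token, int replicationFactor) {
        // Nodes in clockwise order starting at the owner, wrapping around the ring.
        List<String> clockwise = new ArrayList<>(ring.tailMap(token, true).values());
        clockwise.addAll(ring.headMap(token, false).values());
        List<String> replicas = new ArrayList<>();
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor) {
                break;
            }
            if (!replicas.contains(node)) {
                replicas.add(node); // skip repeats, e.g. a node holding several tokens
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        ReplicaPlacement p = new ReplicaPlacement();
        p.addNode("a", -100L);
        p.addNode("b", 0L);
        p.addNode("c", 100L);
        p.addNode("d", 200L);
        System.out.println(p.replicasFor(50L, 3));  // [c, d, a]
        System.out.println(p.replicasFor(250L, 2)); // [a, b]
    }
}
```

Losing any one node still leaves other copies of its ranges on its clockwise neighbours. That is where the durability on the following slides comes from.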

• Distributed

• High Durability

• Distributed

• High Durability

• Linear Scalability

Each new node adds storage capacity with no loss in performance

• Distributed

• High Durability

• Data Model (CQL)

Wide-Column Database

But it’s SQL-like!

Querying

C* Querying

• SELECT * FROM

• All queries must include the partition key(s) in the WHERE clause

• ORDER BY is limited to clustering keys

• Keys cannot be altered, so queries must always use the same keys
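The querying rules above follow from how a table is laid out. The sketch below models a hypothetical table with PRIMARY KEY ((sensor_id), reading_time) as a map from partition key to rows sorted by clustering key. Finding rows requires the partition key. Range slices and ordering are only available along the clustering key, ascending or descending. This is an in-memory illustration, not the Cassandra driver API, and it ignores escape hatches such as secondary indexes and ALLOW FILTERING.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of a hypothetical CQL table with PRIMARY KEY ((sensor_id), reading_time).
// NOT the Cassandra driver API.
class PartitionedTable {
    // partition key -> rows of that partition, sorted by clustering key
    private final Map<String, TreeMap<Long, Double>> partitions = new HashMap<>();

    void insert(String sensorId, long readingTime, double value) {
        partitions.computeIfAbsent(sensorId, k -> new TreeMap<>()).put(readingTime, value);
    }

    // Mirrors: SELECT value ... WHERE sensor_id = ? AND reading_time >= ? AND reading_time < ?
    //          [ORDER BY reading_time DESC]
    List<Double> select(String sensorId, long fromInclusive, long toExclusive, boolean descending) {
        if (sensorId == null) {
            // Without the partition key there is no way to locate the rows short of a full scan.
            throw new IllegalArgumentException("partition key is required");
        }
        TreeMap<Long, Double> rows = partitions.get(sensorId);
        if (rows == null) {
            return Collections.emptyList();
        }
        // Range slices and ordering are only possible along the clustering key.
        NavigableMap<Long, Double> slice = rows.subMap(fromInclusive, true, toExclusive, false);
        return new ArrayList<>(descending ? slice.descendingMap().values() : slice.values());
    }

    public static void main(String[] args) {
        PartitionedTable readings = new PartitionedTable();
        readings.insert("sensor-1", 3L, 30.0);
        readings.insert("sensor-1", 1L, 10.0);
        readings.insert("sensor-1", 2L, 20.0);
        readings.insert("sensor-2", 1L, 99.0);
        System.out.println(readings.select("sensor-1", 0L, 10L, false)); // [10.0, 20.0, 30.0]
        System.out.println(readings.select("sensor-1", 0L, 10L, true));  // [30.0, 20.0, 10.0]
    }
}
```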

• Distributed

• High Durability

• Data Model (CQL)

• Designing your Data Model

Agenda

• Spark

• Cassandra

Spark + Cassandra

• Each covers the other’s weaknesses

• Filter on the server side (with C*)

• Join tables, filter results (with Spark)
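This division of labour can be sketched in plain Java, with no Spark or Cassandra involved. The hypothetical "stores" and "ratings" maps stand in for two tables.

- Every read is a lookup by partition key, which is the part Cassandra serves well.
- The join and the filter on a computed average happen afterwards. That is the part handed to Spark, since CQL has no joins.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of "filter in C*, join in Spark"; table names and data are hypothetical.
class FilterThenJoin {
    // "stores" table, partition key = region: store id -> city
    static final Map<String, Map<String, String>> STORES_BY_REGION = Map.of(
            "east", Map.of("s1", "Boston", "s2", "New York"),
            "west", Map.of("s3", "Denver"));

    // "ratings" table, partition key = store id: customer ratings
    static final Map<String, List<Integer>> RATINGS_BY_STORE = Map.of(
            "s1", List.of(4, 5),
            "s2", List.of(2),
            "s3", List.of(5));

    // City -> average rating, for stores in one region averaging at least minAverage.
    static Map<String, Double> wellRatedCities(String region, double minAverage) {
        Map<String, Double> result = new TreeMap<>();
        // C*-style: a single-partition read of the region...
        Map<String, String> stores = STORES_BY_REGION.getOrDefault(region, Map.of());
        for (Map.Entry<String, String> store : stores.entrySet()) {
            // ...plus one single-partition read per store id.
            List<Integer> ratings = RATINGS_BY_STORE.getOrDefault(store.getKey(), List.of());
            if (ratings.isEmpty()) {
                continue;
            }
            // Spark-style: join store -> ratings, aggregate, then filter on the aggregate.
            double average = ratings.stream().mapToInt(Integer::intValue).average().orElse(0.0);
            if (average >= minAverage) {
                result.put(store.getValue(), average);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wellRatedCities("east", 3.0)); // {Boston=4.5}
    }
}
```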

Companies have been formed

Cluster Design

Data Locality!

Pipeline architecture

Agenda

• Spark

• Cassandra

Coding Spark + C*

Terminology

• SparkConf

• JavaSparkContext

• JavaFunctions

• Mappers

Spark Conf

• spark.master -> URL of the master node

• spark.app.name -> your application’s name; want to see your client show up in the Spark UI?

• spark.executor.memory -> amount of memory per executor process on the workers

• spark.executor.cores -> number of cores each executor uses (need to share with C*!)

• spark.submit.deployMode -> ‘client’ or ‘cluster’

• spark.jars.packages -> Maven / Gradle-style coordinates (group:artifact:version) of packages to include

• spark.jars.ivy -> path to the Ivy user directory used to cache those packages (custom repositories are given separately, e.g. with spark-submit’s --repositories option)

• more at: http://spark.apache.org/docs/latest/configuration.html#available-properties
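spark.executor.cores and spark.executor.memory have to leave room for a co-located C* node, so a little arithmetic helps when sizing executors. The sketch below is rough capacity planning, not a Spark API, and rests on these assumptions:

- The node sizes and the cores and memory reserved for Cassandra are hypothetical inputs.
- The parser only accepts sizes with a k/m/g/t suffix.
- Spark's extra per-executor memory overhead is ignored.

```java
// Rough sizing arithmetic for a node running both a Spark worker and a Cassandra node.
// ASSUMPTIONS: node sizes and the share reserved for C* are hypothetical inputs,
// and Spark's per-executor memory overhead (plus the OS's own needs) is ignored.
class ExecutorSizing {

    // Parses a JVM-style size with a k/m/g/t suffix (as in spark.executor.memory) into MiB.
    static long toMiB(String size) {
        String s = size.trim().toLowerCase();
        long amount = Long.parseLong(s.substring(0, s.length() - 1));
        switch (s.charAt(s.length() - 1)) {
            case 'k': return amount / 1024;
            case 'm': return amount;
            case 'g': return amount * 1024;
            case 't': return amount * 1024 * 1024;
            default: throw new IllegalArgumentException("expected a k/m/g/t suffix: " + size);
        }
    }

    // Executors of the given shape (spark.executor.cores / spark.executor.memory) that fit
    // on one node after setting aside cores and memory for Cassandra.
    static int executorsPerNode(int nodeCores, String nodeMemory,
                                int cassandraCores, String cassandraMemory,
                                int executorCores, String executorMemory) {
        int freeCores = nodeCores - cassandraCores;
        long freeMiB = toMiB(nodeMemory) - toMiB(cassandraMemory);
        if (freeCores <= 0 || freeMiB <= 0) {
            return 0;
        }
        long byCores = freeCores / executorCores;
        long byMemory = freeMiB / toMiB(executorMemory);
        return (int) Math.min(byCores, byMemory);
    }

    public static void main(String[] args) {
        // 16-core / 64g node, 4 cores + 16g kept for C*, executors of 2 cores / 8g
        System.out.println(executorsPerNode(16, "64g", 4, "16g", 2, "8g"));
    }
}
```

Consider a 16-core, 64g node that keeps 4 cores and 16g for Cassandra. Executors of 2 cores and 8g fit 6 to that node, because the free cores (12) and the free memory (48g) both run out at six.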

Master URL Overloading

• “local” -> run Spark locally (no cluster) with one worker thread

• “local[<K>]” -> run Spark locally with K worker threads

• “local[*]” -> run Spark locally with ALL YOUR THREADS! (one per logical core)

• “spark://<host string>:<port>” -> URL of a Spark cluster master node, using Spark’s own standalone cluster manager

• also options for Mesos and YARN
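The master URL rules above can be summarised as a small classifier. MasterUrl.describe is a hypothetical helper written against the forms listed on this slide. It is not Spark's own parser, which accepts more forms, for example the multi-master spark://host1:port1,host2:port2 form. Anything this sketch does not recognise is reported as unrecognised.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier for the master URL forms listed on the slide; NOT Spark's parser.
class MasterUrl {
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+|\\*)\\]");
    private static final Pattern STANDALONE = Pattern.compile("spark://([^:/,]+):(\\d+)");

    static String describe(String master) {
        if (master.equals("local")) {
            return "local, 1 worker thread";
        }
        Matcher local = LOCAL_N.matcher(master);
        if (local.matches()) {
            // "*" means one worker thread per logical core visible to the JVM.
            int threads = local.group(1).equals("*")
                    ? Runtime.getRuntime().availableProcessors()
                    : Integer.parseInt(local.group(1));
            return "local, " + threads + " worker thread" + (threads == 1 ? "" : "s");
        }
        Matcher standalone = STANDALONE.matcher(master);
        if (standalone.matches()) {
            return "standalone cluster master at " + standalone.group(1) + ":" + standalone.group(2);
        }
        if (master.equals("yarn") || master.startsWith("mesos://")) {
            return "external cluster manager";
        }
        return "unrecognised";
    }

    public static void main(String[] args) {
        for (String m : new String[]{"local", "local[4]", "local[*]", "spark://10.0.0.5:7077", "yarn"}) {
            System.out.println(m + " -> " + describe(m));
        }
    }
}
```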

However, a Warning

But where does my code live?

CLASS_PATH: org.apache.spark, com.fasterxml.jackson, com.yourco.yourapp.pojos.*

com.fasterxml.jackson

Agenda

• Spark

• Cassandra

• Demo

Thank You!

Links

• Cassandra on AWS official whitepaper: https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf

• Demo code: https://github.com/spember/ratpack-spark-cassandra-demo

Images

• Database Sharding: https://dzone.com/articles/ebay-secret-database-scaling

• Indiana Jones Warehouse: http://logisticalfictions.tumblr.com/page/9

• Strong (Spongebob): www.reactiongifs.com/strongbob/?utm_source=rss&utm_medium=rss&utm_campaign=strongbob

• Cheetah: www.livescience.com/21944-usain-bolt-vs-cheetah-animal-olympics.html

• Big Data Cartoon: http://www.kdnuggets.com/2016/08/cartoon-make-data-great-again.html

• Spark Streaming: http://velvia.github.io/presentations/2015-filodb-spark-streaming/#/

• Picard + Riker: http://www.douxreviews.com/2015/09/star-trek-next-generation-matter-of.html

• Software Engineers: http://pyxurz.blogspot.com/2011/10/office-space-page-2-of-6.html
