Apache Spark talk @ The Amsterdam Applied Machine Learning meetup group

GoDataDriven, proudly part of the Xebia Group

@fzk frisovanvollenhoven@godatadriven.com

Apache Spark for applied machine learning

Friso van Vollenhoven


This talk is about tools.


Resilient Distributed Dataset

• Immutable set of records (e.g. tuples)

• Distributed across a cluster of workers

• Stored in RAM, on disk, or partially in both

• Built through transformations

• Automatically rebuilt on failure

• Possibly replicated
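The bullets above say an RDD is immutable, built through transformations and rebuilt on failure. Spark makes this work by recording each RDD's lineage: the parent it came from and the function that derives it, so lost data can be recomputed instead of restored from a copy. The sketch below is a toy, single-process illustration of that idea in plain Python, not Spark's API; `ToyDataset`, `source`, `collect` and `drop_cache` are hypothetical names chosen for this example, and there are no partitions, workers or replication.

```python
# Toy, single-process model of RDD lineage (hypothetical names, NOT Spark's API).
class ToyDataset:
    """A dataset that is never modified in place and remembers how it was derived."""

    def __init__(self, parent, derive):
        self._parent = parent    # upstream ToyDataset, or None for a source
        self._derive = derive    # function: parent's records -> this dataset's records
        self._cached = None      # in-memory copy, which may be "lost"

    @classmethod
    def source(cls, records):
        data = tuple(records)
        return cls(None, lambda _: data)

    # Transformations return a NEW dataset and leave self untouched.
    def map(self, f):
        return ToyDataset(self, lambda recs: tuple(f(r) for r in recs))

    def filter(self, pred):
        return ToyDataset(self, lambda recs: tuple(r for r in recs if pred(r)))

    def collect(self):
        # Use the in-memory copy if present; otherwise rebuild it from the lineage.
        if self._cached is None:
            upstream = self._parent.collect() if self._parent is not None else ()
            self._cached = self._derive(upstream)
        return list(self._cached)

    def drop_cache(self):
        # Simulate a failure that loses the in-memory copy.
        self._cached = None


base = ToyDataset.source(range(10))
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())   # [0, 4, 16, 36, 64]
squares_of_evens.drop_cache()       # "lose" the result...
print(squares_of_evens.collect())   # ...and recompute it from lineage: [0, 4, 16, 36, 64]
```

Because a transformation never mutates its input, `base` still holds the original ten records after the chain is built, which is what makes recomputation from the parent safe.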


Operations

• Operate on RDDs

• Either create a new RDD

• Or materialise an RDD and return data

• Transformations: map, filter, groupBy, etc.

• Actions: count, collect, reduce, save, etc.
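The split between transformations and actions matters because Spark evaluates transformations lazily: `map` and `filter` only describe a new RDD, and nothing is computed until an action such as `count` or `collect` needs a result. A minimal sketch of that behaviour in plain Python, assuming a hypothetical `LazyDataset` class (again not Spark's API), records transformations as steps and only runs them when an action is called:

```python
from functools import reduce as _reduce


# Toy model of lazy evaluation (hypothetical class, NOT Spark's API).
class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)   # recorded transformations, not yet executed

    # Transformations: return a new LazyDataset and do no work.
    def map(self, f):
        return LazyDataset(self._source, self._steps + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Replay the recorded steps over the source (builtin map/filter are lazy too).
        out = iter(self._source)
        for kind, fn in self._steps:
            out = map(fn, out) if kind == "map" else filter(fn, out)
        return out

    # Actions: materialise the data and return a result to the caller.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, f):
        return _reduce(f, self._run())


calls = []

def double(x):
    calls.append(x)          # record each time the function actually runs
    return 2 * x

ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(calls)                             # [] : nothing has run yet
print(ds.collect())                      # [6, 8]
print(ds.count())                        # 2
print(ds.reduce(lambda a, b: a + b))     # 14
```

As in Spark without caching, each action here re-runs the whole chain from the source; in Spark you can keep an intermediate RDD around with `cache()` or `persist()` to avoid that.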


The good parts

• Language bindings for Java, Scala and Python

• Works interactively from a shell:

  • Scala and IPython (notebook)

• Plays nicely with Hadoop:

  • Deploys on top of the YARN cluster manager

  • Reads data from HDFS

  • Hadoop-like fault tolerance

The better part?

https://github.com/Bridgewater/scala-notebook

https://github.com/Sotera/spark-distributed-louvain-modularity


We’re hiring / Questions? / Thank you!

@fzk frisovanvollenhoven@godatadriven.com

Friso van Vollenhoven
