Spark @ Virdata - BigData.be meetup 09/July/
at
BigData.be Meetup, July 09, 2014
Gerard Maas - Data Processing Team Lead
[email protected] | @maasg
@bout me
@maasg
Virdata: A cloud platform for the Internet of Things
Big Data Developers - Virdata, Internet of Things #virdata
Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY
Cloud:
★ Elastic and scalable cutting-edge technologies
★ APIs for different types of information/data consumption
★ Cloud-agnostic through self-built monitoring tools
★ Running on both public & private cloud infrastructure
★ Bi-directional messaging
★ High-performance broker architecture

Library:
★ Lightweight and portable library
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted
Scala @ Virdata
Databricks Keynote - Spark Summit 2014
[Diagram: Batch and Streaming processing, with HDFS and Cassandra]
Spark: RDD Transformation
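[Diagram: input data (HDFS text or sequence file) is read, e.g. with `.textFile`, into an RDD; a chain of RDD transformations (map, flatMap, group, filter, ..., join) produces new RDDs; the result is saved as output (HDFS text or sequence file, Cassandra).]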
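The RDD API closely mirrors Scala's own collection operators, so the transformation chain in the diagram can be sketched on a local collection. This is a minimal illustration, not Virdata's pipeline: it assumes a hypothetical `deviceId,metric,value` text format, runs without Spark, and the `RddPipelineSketch` name is made up.

```scala
// Local sketch only: plain Scala collections, no Spark dependency.
// The operators mirror the RDD transformations on the slide (map, filter, group).
object RddPipelineSketch {
  // Hypothetical input: "deviceId,metric,value" lines, as a text file on HDFS might hold.
  val lines: Seq[String] = Seq(
    "dev1,temp,21.5",
    "dev2,temp,19.0",
    "dev1,temp,22.5",
    "not a valid record"
  )

  // Average value per device, built from the same operators an RDD job would chain.
  def averageByDevice(input: Seq[String]): Map[String, Double] =
    input
      .map(_.split(","))                 // map: parse each line into fields
      .filter(_.length == 3)             // filter: drop malformed records
      .map(f => (f(0), f(2).toDouble))   // map: keep (deviceId, value) pairs
      .groupBy(_._1)                     // group: collect the pairs per device
      .map { case (device, pairs) =>     // reduce each group to its mean
        device -> pairs.map(_._2).sum / pairs.size
      }

  def main(args: Array[String]): Unit =
    averageByDevice(lines).toSeq.sortBy(_._1).foreach(println)
}
```

On a cluster, the chain would start from `sc.textFile(path)` rather than a local `Seq`; for per-key aggregates, `reduceByKey` on the resulting pair RDD is usually preferred over grouping, because it combines values before the shuffle.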
DStream
[Diagram: an input stream from Kafka becomes a DStream, a sequence of RDDs; each RDD goes through the usual RDD transformations (map, flatMap, group, filter, ...) and the results are written to outputs such as Cassandra or Web Sockets.]
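Spark Streaming cuts the input stream into micro-batches and represents each batch interval as one RDD, so per-batch logic is ordinary RDD logic. The sketch below models that idea with nested Scala collections only; there is no Spark or Kafka dependency, and the `deviceId:value` message format, the sample messages and all names are hypothetical.

```scala
// Local sketch with no Spark or Kafka dependency: a DStream is modelled as an ordered
// sequence of micro-batches, each standing in for the RDD of one batch interval.
object DStreamSketch {
  type MicroBatches[A] = Seq[Seq[A]]

  // Apply the same per-batch (per-RDD) function to every interval.
  def transform[A, B](stream: MicroBatches[A])(f: Seq[A] => Seq[B]): MicroBatches[B] =
    stream.map(f)

  // Per-batch logic: parse "deviceId:value", drop malformed messages, count messages per device.
  def countPerDevice(batch: Seq[String]): Seq[(String, Int)] =
    batch
      .map(_.split(":"))
      .filter(_.length == 2)
      .map(fields => fields(0))
      .groupBy(identity)
      .map { case (device, msgs) => device -> msgs.size }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    // Hypothetical messages, as three consecutive batch intervals might receive them from Kafka.
    val input: MicroBatches[String] = Seq(
      Seq("dev1:21", "dev2:19", "dev1:22"),
      Seq("dev2:18"),
      Seq("garbage", "dev3:30")
    )
    transform(input)(countPerDevice).zipWithIndex.foreach { case (counts, i) =>
      println(s"batch $i: $counts")
    }
  }
}
```

With the real API, the same per-interval step would be written with DStream operators such as `map` and `countByValue`, or with `DStream.transform` to reuse an existing RDD function.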
[Diagram: HDFS with three Worker nodes]
Memory
CPUs (and don’t forget to throw some disks in the mix)
Network
Spark Deployment Options
Local: `spark.master=local[8]`
Standalone Cluster: `spark.master=spark://host:port`
Using a Cluster Manager: `spark.master=mesos://host:port`
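The three `spark.master` forms on the slide select the deployment mode. As a small illustration, the hypothetical helper below (not part of Spark) recognises exactly those three shapes; the sample ports 7077 and 5050 are the usual defaults of the standalone master and the Mesos master, respectively.

```scala
// Hypothetical helper (not part of Spark): classify the three spark.master forms on the slide.
object MasterUrlSketch {
  sealed trait Deployment
  final case class Local(threads: Int) extends Deployment
  final case class Standalone(host: String, port: Int) extends Deployment
  final case class MesosCluster(host: String, port: Int) extends Deployment

  // Scala regex extractors match the whole string, so partial matches are rejected.
  private val LocalN   = """local\[(\d+)\]""".r
  private val SparkUrl = """spark://([^:/]+):(\d+)""".r
  private val MesosUrl = """mesos://([^:/]+):(\d+)""".r

  def parse(master: String): Option[Deployment] = master match {
    case LocalN(n)            => Some(Local(n.toInt))
    case SparkUrl(host, port) => Some(Standalone(host, port.toInt))
    case MesosUrl(host, port) => Some(MesosCluster(host, port.toInt))
    case _                    => None // other forms (e.g. "local[*]") are not covered here
  }

  def main(args: Array[String]): Unit =
    Seq("local[8]", "spark://host:7077", "mesos://host:5050").foreach { m =>
      println(s"$m -> ${parse(m)}")
    }
}
```

Spark accepts more forms than these (for example `local`, `local[*]`, or a `mesos://zk://...` URL pointing at a ZooKeeper-managed Mesos master); the sketch deliberately rejects them.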
Apache Mesos
“Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.”
http://mesos.apache.org/
Why Mesos?
Think in terms of Resources, not Machines
How Mesos Works
Frameworks: Scheduler, Executor
Master (M), with ZooKeeper (ZK)
Slaves: run tasks
How Mesos Works - resource offers
Resource offer: H1, 4 CPU, 2 GB
Resource offer: H1, 4 CPU, 2 GB
Tasks: 2 CPU, 2 GB and 2 CPU, 4 GB
Resource offer: H1, 2 CPU, 2 GB
Tasks: 2 CPU, 2 GB and 2 CPU, 4 GB
Tasks: 2 CPU, 4 GB
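The offer sequence above can be sketched as a toy model of Mesos two-level scheduling: the master offers one host's free resources, and the framework decides which pending tasks to launch in it. This is not the Mesos API; the types, task names and the greedy first-fit choice are assumptions, and only the numbers (an offer of 4 CPU / 2 GB on H1, tasks of 2 CPU / 2 GB and 2 CPU / 4 GB) come from the slides.

```scala
// Simplified, hypothetical model of Mesos-style two-level scheduling (not the Mesos API):
// the master offers a host's free resources and the framework picks which tasks to launch.
object ResourceOfferSketch {
  final case class Resources(cpus: Double, memGb: Double) {
    def fits(need: Resources): Boolean = need.cpus <= cpus && need.memGb <= memGb
    def minus(used: Resources): Resources = Resources(cpus - used.cpus, memGb - used.memGb)
  }
  final case class Offer(host: String, free: Resources)
  final case class Task(name: String, needs: Resources)

  // Framework-side decision: greedily launch pending tasks that still fit in the offer.
  // Returns (launched, stillPending); unused resources go back to the master.
  def acceptOffer(offer: Offer, pending: List[Task]): (List[Task], List[Task]) = {
    val (launched, queued, _) =
      pending.foldLeft((List.empty[Task], List.empty[Task], offer.free)) {
        case ((started, waiting, left), task) =>
          if (left.fits(task.needs)) (task :: started, waiting, left.minus(task.needs))
          else (started, task :: waiting, left)
      }
    (launched.reverse, queued.reverse)
  }

  def main(args: Array[String]): Unit = {
    // Numbers from the slides: offer H1 with 4 CPU / 2 GB; tasks of 2C/2G and 2C/4G.
    val offer = Offer("H1", Resources(cpus = 4, memGb = 2))
    val tasks = List(Task("t1", Resources(2, 2)), Task("t2", Resources(2, 4)))
    val (launched, waiting) = acceptOffer(offer, tasks)
    println(s"launched on ${offer.host}: ${launched.map(_.name)}; waiting: ${waiting.map(_.name)}")
  }
}
```

With these numbers the 2 CPU / 2 GB task is launched and the 2 CPU / 4 GB task keeps waiting for a larger offer, because it needs more memory than H1 has free.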
The Mesos Paper… Where Spark Started
https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman.pdf
Marathon: “Keep your services running”
Marathon
https://github.com/mesosphere/marathon
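Conceptually, keeping services running comes down to comparing the desired number of instances of each application with what is actually running, and correcting the difference. The sketch below shows only that comparison step; it is a hypothetical illustration, not Marathon's implementation, and the app names are made up.

```scala
// Hypothetical sketch of the idea behind "keep your services running" (not Marathon's code):
// compare the desired instance count of each app with what is actually running.
object SupervisorSketch {
  final case class AppSpec(id: String, instances: Int)

  // Positive delta = instances to launch, negative = instances to kill; apps at target are omitted.
  def reconcile(desired: Seq[AppSpec], running: Map[String, Int]): Map[String, Int] =
    desired
      .map(app => app.id -> (app.instances - running.getOrElse(app.id, 0)))
      .filter { case (_, delta) => delta != 0 }
      .toMap

  def main(args: Array[String]): Unit = {
    val desired = Seq(AppSpec("ingest", 3), AppSpec("job-server", 1))
    // Suppose two "ingest" instances crashed and nothing else changed.
    val running = Map("ingest" -> 1, "job-server" -> 1)
    println(reconcile(desired, running)) // a single entry: launch 2 more "ingest" instances
  }
}
```

A real supervisor would act on the result through the cluster manager (launching or killing tasks on Mesos) and repeat the check whenever tasks fail or the specification changes.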
Spark Job Server: “Spark as a Service”
Job Server
```scala
val sc = new spark.SparkContext(conf)
```
https://github.com/ooyala/spark-jobserver
Job Impl

```scala
object Job extends SparkJob {
  def runJob(...): Any
  def validate(...): SparkJobValidation
}
```
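The point of the split above is that the Job Server owns a long-lived context, while each job only implements `validate` and `runJob`. The sketch mirrors that contract with plain Scala stand-ins and, as an assumption of this sketch, only runs a job after its `validate` step succeeds; `SharedContext`, `Validation`, `WordCountJob` and the `input.string` key are hypothetical and are not the spark-jobserver classes, where a job receives a real `SparkContext`.

```scala
// Hypothetical stand-ins that only mirror the shape of the job contract on the slide;
// these are NOT spark-jobserver's classes. A real job receives a SparkContext instead.
object JobServerSketch {
  sealed trait Validation
  case object Valid extends Validation
  final case class Invalid(reason: String) extends Validation

  // Stand-in for the long-lived context the server creates once and shares across jobs.
  final class SharedContext(val appName: String)

  trait Job {
    def validate(ctx: SharedContext, config: Map[String, String]): Validation
    def runJob(ctx: SharedContext, config: Map[String, String]): Any
  }

  // Example job: word count over a string passed in the job config.
  object WordCountJob extends Job {
    def validate(ctx: SharedContext, config: Map[String, String]): Validation =
      if (config.contains("input.string")) Valid else Invalid("missing input.string")

    def runJob(ctx: SharedContext, config: Map[String, String]): Any =
      config("input.string").split("\\s+").toSeq
        .groupBy(identity)
        .map { case (word, occurrences) => word -> occurrences.size }
  }

  // The server owns the context and only runs a job after it validates.
  final class Server(ctx: SharedContext) {
    def submit(job: Job, config: Map[String, String]): Either[String, Any] =
      job.validate(ctx, config) match {
        case Valid           => Right(job.runJob(ctx, config))
        case Invalid(reason) => Left(reason)
      }
  }

  def main(args: Array[String]): Unit = {
    val server = new Server(new SharedContext("demo"))
    println(server.submit(WordCountJob, Map("input.string" -> "a b c a b")))
    println(server.submit(WordCountJob, Map.empty))
  }
}
```

Because the context outlives individual jobs, several submissions can reuse it instead of paying the start-up cost of a new `SparkContext` each time.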
HTTP: /jars, /context, /jobs
“The Datacenter as the Computer” - Luiz Barroso
HDFS: file system
Mesos: kernel
Marathon: init.d
What about System Monitor?
Ganglia
How we put it all together at
(Live Demo)
Questions?
@virdata_iot | @maasg