Apache Spark
● What is it ?
● How does it work ?
● Benefits
● Tuning
● Examples
www.xoomtrainings.com [email protected]
Spark – What is it ?
● Open Source
● Alternative to Map Reduce for certain applications
● A low latency cluster computing system
● For very large data sets
● May be 100 times faster than Map Reduce for– Iterative algorithms
– Interactive data mining
● Used with Hadoop / HDFS
● Released under BSD License
www.xoomtrainings.com [email protected]
Spark – How does it work ?
● Uses in memory cluster computing
● Memory access faster than disk access
● Has API's written in– Scala
– Java
– Python
● Can be accessed from Scala and Python shells
● Currently an Apache incubator project
www.xoomtrainings.com [email protected]
Spark – Benefits
● Scales to very large clusters
● Uses in memory processing for increased speed
● High Level API's– Java, Scala, Python
● Low latency shell access
www.xoomtrainings.com [email protected]
Spark – Tuning
● Bottlenecks can occur in the cluster via– CPU, memory or network bandwidth
● Tune data serialization method i.e.– Java ObjectOutputStream vs Kryo
● Memory Tuning– Use primitive types
– Set JVM Flags
– Store objects in serialized form i.e.
● RDD Persistence
● MEMORY_ONLY_SER
www.xoomtrainings.com [email protected]
Spark – Examples
• Example from spark-project.org, Spark job in Scala.
• Showing a simple text count from a system log.
• /*** SimpleJob.scala ***/
• import spark.SparkContext
• import SparkContext._
• object SimpleJob {
• def main(args: Array[String]) {
• val logFile = "/var/log/syslog" // Should be some file on your system
• val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
• List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
• val logData = sc.textFile(logFile, 2).cache()
• val numAs = logData.filter(line => line.contains("a")).count()
• val numBs = logData.filter(line => line.contains("b")).count()
• println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
• }
• }
www.xoomtrainings.com [email protected]
Contact Us
● Feel free to contact us at
– www.xoomtrainings.com
-- USA : +1-610-686-8077 or India : +91-404-018-3355
● We offer IT project consultancy
● We are happy to hear about your problems
● You can just pay for those hours that you need
● To solve your problems
Top Related