Why your Spark Job is Failing
Kostas Sakellis
Me
• Software Engineer at Cloudera
• Contributor to Apache Spark
• Before that, worked on Cloudera Manager
com.esotericsoftware.kryo.KryoException: Unable to find class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$4$$anonfun$apply$3
We go about our day ignoring manholes until…
Courtesy of: http://www.independent.co.uk/incoming/article9127706.ece/binary/original/maholev23.jpg
… something goes wrong.
Courtesy of: http://greenpointers.com/wp-content/uploads/2015/03/Manhole-Explosion1.jpg
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
  at java.lang.Double.parseDouble(Double.java:540)
  at scala.collection.immutable.StringLike[...]
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
Job? What now?
Courtesy of: http://calvert.lib.md.us/jobs_pic.jpg
Example
sc.textFile("hdfs://...", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
Then what the heck is a stage?
Courtesy of: https://writinginadeadworld.files.wordpress.com/2014/03/rock1.jpeg
Partitions
sc.textFile("hdfs://...", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)
  .sum()
[Diagram: the HDFS file is read as four partitions (Partition 1-4)]
RDDs
[Diagram, built up across four slides: textFile reads the four HDFS partitions into RDD1, one partition each; map turns each partition of RDD1 into a partition of RDD2; filter turns each partition of RDD2 into a partition of RDD3; the sum action then computes the result from RDD3's four partitions]
RDD Lineage
[Diagram: the same pipeline with the dependency chain HDFS -> RDD1 -> RDD2 -> RDD3 -> Sum highlighted; this chain is the RDD's lineage]
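To see the lineage Spark has recorded for yourself, RDD.toDebugString prints the dependency chain. A minimal sketch, assuming any text file of integers (the hdfs:// path is a placeholder):
// Build the same pipeline, then ask Spark to describe it.
val rdd = sc.textFile("hdfs://...", 4)
  .map((x) => x.toInt)
  .filter(_ > 10)

// Prints each RDD in the chain (MapPartitionsRDD, HadoopRDD, ...)
// along with its partition count.
println(rdd.toDebugString)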
RDD Dependencies
• Narrow and wide dependencies
[Diagram: each partition of RDD2 and RDD3 depends on exactly one parent partition, i.e. narrow dependencies]
Wide Dependencies
• Sometimes records need to be grouped together
• Examples:
  • join
  • groupByKey
• Stages are created at wide dependency boundaries, as in the sketch below
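A minimal sketch of where the stage boundary lands (the key function is illustrative): everything before the groupByKey is narrow and stays in one stage; the grouping forces a shuffle that begins the next.
// Narrow: each output partition depends on exactly one input partition.
val pairs = sc.textFile("hdfs://...")
  .map(line => (line.take(1), line)) // illustrative key: first character
  .filter { case (_, v) => v.nonEmpty }

// Wide: all values for a key must be brought together, so Spark
// shuffles here and a new stage begins.
val grouped = pairs.groupByKey()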
A More Interesting Spark Job
val rdd1 = sc.textFile("hdfs://...")
  .map(someFunc)
  .filter(filterFunc)

val rdd2 = sc.hadoopFile("hdfs://...")
  .groupByKey()
  .map(someOtherFunc)

val rdd3 = rdd1.join(rdd2)
  .map(someFunc)

rdd3.collect()
[Diagram: rdd1 is the fragment textFile -> map -> filter]
[Diagram: rdd2 is the fragment hadoopFile -> groupByKey -> map]
[Diagram: rdd3 is the fragment join -> map]
[Diagram: rdd3.collect() runs the full DAG; the wide dependencies at groupByKey and join split it into four stages (1-4)]
Get to the point before I stop caring!
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
  at java.lang.Double.parseDouble(Double.java:540)
  at scala.collection.immutable.StringLike[...]
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
What was the failure?
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0" [...]
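Note that the "input string" in the exception is an entire CSV line, which suggests the job called toDouble on the raw line instead of splitting it first. A guess at the bug and its fix (the path and record layout are assumed from the error message):
// Likely culprit: treating the whole CSV line as a single number.
val broken = sc.textFile("hdfs://...")
  .map(_.toDouble) // java.lang.NumberFormatException on every line

// Fix: split into fields, then parse each field.
val fixed = sc.textFile("hdfs://...")
  .map(_.split(",").map(_.toDouble))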
[Diagram, built up across three slides: a stage made up of four tasks; a failed task is retried]
spark.task.maxFailures=4
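A task is retried up to spark.task.maxFailures times (4 by default) before Spark gives up on the stage and aborts the job. A sketch of tuning it, with an illustrative value:
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: tolerate more per-task failures before aborting the job.
val conf = new SparkConf()
  .setAppName("retry-demo")
  .set("spark.task.maxFailures", "8") // default is 4
val sc = new SparkContext(conf)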
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, kostas-4.vpc.cloudera.com): java.lang.NumberFormatException: For input string: "3.9166,10.2491,-4.0926,-4.4659,0"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
  at java.lang.Double.parseDouble(Double.java:540)
  at scala.collection.immutable.StringLike[...]
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
[...]
ERROR executor.Executor: Exception in task ID 2866
java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
  at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:648)
  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:706)
  at java.io.DataInputStream.read(DataInputStream.java:100)
  at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:209)
  at org.apache.hadoop.util.LineReader.readLine(LineReader.java:173)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
  at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
  at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:164)
[...]
Spark Architecture
YARN Architecture
[Diagram: a Client submits to the Resource Manager; Node Managers host containers, one running the Application Master and the rest running application processes]
Spark on YARN Architecture
[Diagram, built up across two slides: the Client submits to the Resource Manager; the Application Master runs in one YARN container, and the Spark executors run as processes in the remaining containers on the Node Managers]
spark-submit --executor-memory 2g \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2
Container [pid=63375,containerID=container_1388158490598_0001_01_000003] is running beyond physical memory limits. Current usage: 2.2 GB of 2.1 GB physical memory used; 2.8 GB of 4.2 GB virtual memory used. Killing container. [...]
spark-submit --executor-memory 2g \
  --master yarn-client \
  --num-executors 2 \
  --executor-cores 2
Memory allocation
[Diagram: how the pieces nest inside a YARN container]
• yarn.nodemanager.resource.memory-mb: total memory a Node Manager can hand out to containers
• Executor container = spark.executor.memory + spark.yarn.executor.memoryOverhead (7%; 10% in 1.4)
• Inside spark.executor.memory: spark.shuffle.memoryFraction (0.4) and spark.storage.memoryFraction (0.6)
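When YARN kills containers for exceeding physical memory, the usual remedy is to grow the overhead allowance rather than the heap, since the overage typically comes from off-heap usage. A sketch with illustrative values (the overhead is given in megabytes in Spark 1.x):
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: give each executor container more non-heap headroom.
val conf = new SparkConf()
  .setAppName("overhead-demo")
  .set("spark.executor.memory", "2g")
  .set("spark.yarn.executor.memoryOverhead", "768") // MB
val sc = new SparkContext(conf)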
Sometimes jobs run slow or even…
Courtesy of: http://blog.sdrock.com/pastors/files/2013/06/time-clock.jpg
java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[...]
GC Stalls
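Long GC pauses on executors often show up only as mysteriously slow tasks. A sketch of making them visible, using standard HotSpot GC-logging flags passed through spark.executor.extraJavaOptions:
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: write GC activity into the executor logs.
val conf = new SparkConf()
  .setAppName("gc-visibility")
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
val sc = new SparkContext(conf)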
Too much spilling!
Courtesy of: http://tgnp.me/wp-content/uploads/2014/05/spilled-starbucks.jpg
Shuffle Boundaries
[Diagram: the same DAG (textFile -> map -> filter and hadoopFile -> groupByKey -> map feeding join -> map) with shuffles marked at the wide dependency boundaries]
Most performance issues are in shuffles!
Inside a Task: Fetch & Aggregate
[Diagram: fetched shuffle blocks are deserialized into an ExternalAppendOnlyMap of key -> values; when the map fills up, its contents are sorted and spilled to disk]
Inside a Task: Specify partitions
rdd.reduceByKey(reduceFunc, numPartitions = 1000)
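Besides the numPartitions argument that most shuffle operators accept, an RDD's partitioning can also be changed directly. A sketch, assuming rdd and reduceFunc from the slide:
// Pass the partition count straight into the shuffle...
val reduced = rdd.reduceByKey(reduceFunc, 1000)

// ...or reshape an existing RDD explicitly.
val wider = reduced.repartition(2000) // full shuffle into 2000 partitions
val narrower = reduced.coalesce(100)  // merges partitions, avoiding a full shuffle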
Why not set partitions to ∞?
Excessive parallelism
• Overwhelming scheduler overhead
• More fetches -> more disk seeks
• Driver needs to track state per task
So how to choose?
• Easy answer: keep multiplying the partition count by 1.5 and see what works, as sketched below
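A sketch of that heuristic, assuming rdd and reduceFunc from the earlier slide; time each run (the Spark UI shows per-stage durations) and stop once more partitions no longer help:
// Illustrative: try geometrically growing partition counts (x1.5 each step).
Seq(100, 150, 225, 338).foreach { n =>
  rdd.reduceByKey(reduceFunc, n).count() // compare stage times in the Spark UI
}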
Is Spark bad?
Courtesy of: https://theferkel.files.wordpress.com/2015/04/250474-breaking-bad-quotes.jpg
Thank you