Spark / Mesos Cluster Optimization
Uploaded by ebiznext. Category: Data & Analytics.
Transcript of Spark / Mesos Cluster Optimization
HAYSSAM SALEH
Spark/Mesos Cluster Optimization
Paris Spark Meetup, April 28th 2016 @Criteo
PROJECT CONTEXT
[Architecture diagram: click, navigation, review, event, support and account data flow into a DataLake; a Spark job computes visitor/session/search/banner contributions into an Elasticsearch DataMart (KPI) used for interactive discovery.]
OBJECTIVES
- 100 GB of data/day => 50 million requests
- 40 TB/year
- How we went from a 4-hour job on:
  - 6 nodes, 8 cores & 32 GB per node
- to a 20-minute job on:
  - 4 nodes, 8 cores & 8 GB per node
SUMMARY
- Spark concepts
- Spark UI offline
- Application optimization
  - Shuffling
  - Partitioning
  - Closures
- Parameters optimization
  - Spark shuffling
  - Mesos application distribution
- Elasticsearch optimization
  - Google "elasticsearch performance tuning" -> blog.ebiznext.com
SPARK: THE CONCEPTS
- Application: the main application
- Job: a round trip Driver -> Cluster
- Stage: delimited by a shuffle boundary
- Task: a thread working on a single RDD partition
- Partition: RDDs are split into partitions; a partition is the unit of work for each task
- Executor: a system process
[Diagram: 1 Application (driver code) -> n Jobs (one per Spark action) -> n Stages (split at shuffle boundaries) -> n Tasks (one task per RDD partition).]
SPARK UI OFFLINE
spark-env.sh
On the driver
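One way to get the Spark UI offline (i.e. after the application has finished) is event logging plus the history server; a sketch, assuming an HDFS events directory (the path is an example):

```
# spark-defaults.conf on the driver
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs:///spark-events

# Point the history server at the same directory (in spark-env.sh),
# then browse port 18080:
# SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs:///spark-events"
# $SPARK_HOME/sbin/start-history-server.sh
```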
[Spark UI screenshots: APPLICATION, JOBS, STAGES, TASKS]
APPLICATION OPTIMIZATION: SHUFFLING
WHY OPTIMIZE SHUFFLING
Distance between data & CPU -> duration (scaled):
L1 cache: 1 second
RAM: 3 minutes
Node-to-node communication: 3 days
TRANSFORMATIONS LEADING TO SHUFFLING
- repartition
- cogroup
- ...join
- ...ByKey
- distinct
SHUFFLING OPTIMIZATION 1/2
[Diagram, three partitions of (key, value, extra) records:
(k1,v1,w1)(k2,v2,w2)(k3,v3,w3) | (k1,v4,w4)(k1,v5,w5)(k3,v6,w6) | (k2,v7,w7)(k2,v8,w8)(k3,v9,w9)
map -> (k1,v1)(k2,v2)(k3,v3) | (k1,v4)(k1,v5)(k3,v6) | (k2,v7)(k2,v8)(k3,v9)
groupByKey (shuffles every record) -> (k1,[v1,v4,v5]) (k3,[v3,v6,v9]) (k2,[v2,v7,v8])
sum per key -> (k1,v1+v4+v5) (k3,v3+v6+v9) (k2,v2+v7+v8)]
Bonus: OOM
SHUFFLING OPTIMIZATION 2/2
[Diagram, same three partitions:
map -> (k1,v1)(k2,v2)(k3,v3) | (k1,v4)(k1,v5)(k3,v6) | (k2,v7)(k2,v8)(k3,v9)
reduceByKey (1/2), map-side combine -> (k1,v1)(k2,v2)(k3,v3) | (k1,v4+v5)(k3,v6) | (k2,v7+v8)(k3,v9)
reduceByKey (2/2), after shuffle -> (k1,v1+v4+v5) (k3,v3+v6+v9) (k2,v2+v7+v8)]
Less shuffling
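The two slides can be simulated in plain Python (no Spark involved; the data mirrors the diagrams): groupByKey ships every record across the shuffle, while reduceByKey first combines values inside each partition, so fewer records cross the network.

```python
partitions = [
    [("k1", 1), ("k2", 2), ("k3", 3)],
    [("k1", 4), ("k1", 5), ("k3", 6)],
    [("k2", 7), ("k2", 8), ("k3", 9)],
]

def local_combine(part):
    """Map-side combine: one record per distinct key per partition."""
    acc = {}
    for k, v in part:
        acc[k] = acc.get(k, 0) + v
    return list(acc.items())

# groupByKey: every record crosses the shuffle boundary
shuffled_group = sum(len(p) for p in partitions)    # 9 records shuffled

# reduceByKey: only the pre-combined records are shuffled
combined = [local_combine(p) for p in partitions]
shuffled_reduce = sum(len(p) for p in combined)     # 7 records shuffled

# The final result is identical either way
final = {}
for part in combined:
    for k, v in part:
        final[k] = final.get(k, 0) + v
print(shuffled_group, shuffled_reduce, final)
```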
OPTIMIZATIONS BROUGHT BY SPARK SQL
- (-) Weak typing
- (-) Limited Java support
- (-) Schema inference not supported
  - requires scala.Product
DATASETS – EXPERIMENTAL
[Scala and Java code samples]
- (+) API similar to RDD, strongly typed
REDUCE OBJECT SIZE
CLOSURES
- Indirect references: use local variables instead
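A sketch of the pitfall in plain Python (FakeRDD and Job are hypothetical stand-ins, not Spark API): a lambda that reads self.factor drags the whole enclosing object into the serialized closure; copying the field to a local variable first captures only the value.

```python
class FakeRDD:
    """Stand-in for an RDD: map() just applies f to a local list."""
    def __init__(self, data): self.data = data
    def map(self, f): return FakeRDD([f(x) for x in self.data])

class Job:
    def __init__(self):
        self.factor = 10
        self.huge_state = list(range(100_000))  # would be shipped too

    def scale_bad(self, rdd):
        # The closure references self, so the whole Job object
        # (huge_state included) would be serialized to every task.
        return rdd.map(lambda x: x * self.factor)

    def scale_good(self, rdd):
        factor = self.factor  # local copy: only the int is captured
        return rdd.map(lambda x: x * factor)

job = Job()
rdd = FakeRDD([1, 2, 3])
print(job.scale_good(rdd).data)  # [10, 20, 30]
```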
RIGHT PARTITIONING
PARTITIONING
- Symptoms
  - Large number of short-lived tasks
- Solutions
  - Review your implementation
  - Define a custom partitioner
  - Control the number of partitions with coalesce and repartition
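The idea behind key partitioning, sketched in pure Python (hypothetical helper, not Spark's implementation): Spark's HashPartitioner sends a record to partition hash(key) % numPartitions, so all records for a key land together, and skewed keys show up as one overloaded partition.

```python
def hash_partition(records, num_partitions):
    """Assign each (key, value) record to a partition by key hash."""
    parts = [[] for _ in range(num_partitions)]
    for k, v in records:
        parts[hash(k) % num_partitions].append((k, v))
    return parts

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = hash_partition(records, 4)
# All records for a given key end up in the same partition.
```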
CHOOSE THE RIGHT SERIALIZATION ALGORITHM
REDUCE OBJECT SIZE
- Kryo serialization
  - Reduces space: up to 10X
  - Performance gain: 2X to 3X
1. Require Spark to use Kryo serialization
2. Optimize class naming => reduce object size
3. Force all serialized classes to be registered
Not applicable to the Dataset API, which uses Encoders instead.
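The three steps map onto standard Spark settings; a sketch (the class name is a hypothetical example):

```
# 1. Require Spark to use Kryo
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# 2. Registered classes are written as small numeric ids instead of
#    full class names, shrinking serialized objects
spark.kryo.classesToRegister     com.example.MyEvent
# 3. Fail fast on any serialized class that was not registered
spark.kryo.registrationRequired  true
```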
DATASETS PROMISE
MEMORY TUNING
SPARK MEMORY MANAGEMENT – SPARK 1.6.X
[Diagram: on a 16 GB heap, 300 MB is reserved to Spark; spark.memory.fraction = 0.75 assigns ~11.78 GB to Spark data, split by spark.memory.storageFraction = 0.5 into a storage and a shuffle fraction; the remaining ~3.92 GB (1 - spark.memory.fraction) is assigned to user data.]
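The figures on the slide follow from the 1.6.x formulas; a quick check in Python (treating the 300 MB reserve as 300/1024 GB):

```python
heap_gb = 16.0
reserved_gb = 300 / 1024       # fixed reserve for Spark internals
memory_fraction = 0.75         # spark.memory.fraction (1.6.x default)
storage_fraction = 0.5         # spark.memory.storageFraction

usable_gb = heap_gb - reserved_gb
spark_gb = usable_gb * memory_fraction     # ~11.78 GB for Spark data
user_gb = usable_gb - spark_gb             # ~3.93 GB user data (slide rounds to 3.92)
storage_gb = spark_gb * storage_fraction   # ~5.89 GB immune to eviction

print(round(spark_gb, 2), round(user_gb, 2), round(storage_gb, 2))
```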
SPARK MEMORY MANAGEMENT – SPARK 1.6.X
- Storage fraction
  - Hosts cached RDDs
  - Hosts broadcast variables
  - Data in this fraction is subject to eviction
- Shuffling fraction
  - Holds intermediate data
  - Spills to disk when full
  - Data in this fraction cannot be evicted by other threads
SPARK MEMORY MANAGEMENT – SPARK 1.6.X
- The shuffle & storage fractions may borrow memory from each other under certain conditions:
  - The shuffling fraction cannot extend beyond its defined size if the storage fraction uses all of its memory
  - The storage fraction cannot evict data from the shuffling fraction, even if the latter has expanded beyond its defined size
THE SHUFFLE MANAGER: spark.shuffle.manager
HASH SHUFFLE MANAGER
[Diagram: an executor runs tasks over partitions; number of tasks = spark.executor.cores / spark.task.cpus (1, YARN & standalone only); each task writes one file per reducer to the local filesystem (spark.local.dir).]
SORT SHUFFLE MANAGER
[Diagram: an executor runs tasks over partitions; number of tasks = spark.executor.cores / spark.task.cpus (1, YARN & standalone only); each task appends records to an AppendOnly map, then sorts & spills (spark.shuffle.spill = true) to a single file per task on the local filesystem (spark.local.dir), sorted by reducer id.]
SHUFFLE MANAGER
- spark.shuffle.manager=hash
  - Performs better when the number of mappers/reducers is small (generates M*R files)
- spark.shuffle.manager=sort
  - Performs better for a large number of mappers/reducers
- Best of both worlds
  - spark.shuffle.sort.bypassMergeThreshold
- spark.shuffle.manager=tungsten-sort???
  - Similar to sort, with the following benefits:
    - Off-heap allocation
    - Works directly on serialized binary objects (much smaller than JVM objects), aka memcopy
    - Uses 8 bytes/record for the sort => takes advantage of the L* caches
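The M*R argument in numbers; plain arithmetic, not a Spark API (the sort-manager count assumes one data file plus one index file per map task):

```python
def hash_shuffle_files(mappers, reducers):
    return mappers * reducers   # one file per (mapper, reducer) pair

def sort_shuffle_files(mappers):
    return mappers * 2          # one sorted data file + one index file

# Small job: hash is fine
small = (hash_shuffle_files(4, 4), sort_shuffle_files(4))
# Big job: the hash manager's file count explodes
big = (hash_shuffle_files(1000, 1000), sort_shuffle_files(1000))
print(small, big)
```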
SPARK ON MESOS
COARSE GRAINED MODE
- (+) Static resource allocation
[Diagram: the Spark driver's CoarseMesosSchedulerBackend talks to the Mesos master; each Mesos worker runs a CoarseGrainedExecutorBackend hosting one executor that runs many tasks.]
COARSE GRAINED MODE
- (-) Only one executor per node
- (-) No control over the number of executors
[Diagram: the same 32-core request may land on four 8-core/16 GB nodes (32 cores / 64 GB) or on two 16-core/16 GB nodes (32 cores / 32 GB).]
FINE GRAINED MODE
- (-) Only one executor per node
- (-) No guarantee of resource availability
- (-) No control over the number of executors
[Diagram: the Spark driver's MesosSchedulerBackend talks to the Mesos master; each Mesos worker runs a MesosExecutorBackend whose executor runs the tasks; spark.mesos.mesosExecutor.cores (1).]
SOLUTION 1: USE MESOS ROLES
- Limit the number of cores per node on each Mesos worker
- Assign a Mesos role to your Spark job
- (-) Must be configured for each Spark application
[Diagram: four nodes at 8 cores / 16 GB each]
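A sketch of the role setup (role name and sizes are examples): reserve resources for a "spark" role on each worker, then have the job claim them.

```
# On each Mesos worker: reserve 8 cores / 8 GB for the "spark" role
mesos-slave --resources='cpus(spark):8;mem(spark):8192' ...

# In the Spark application's configuration
spark.mesos.role  spark
spark.cores.max   32
```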
SOLUTION 2: DYNAMIC ALLOCATION (COARSE GRAINED MODE ONLY)
- Principles
  - Dynamically add/remove Spark executors
- Requires a dedicated process to serve shuffled data (spawned on each Mesos worker through Marathon)
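A minimal configuration sketch (executor counts are examples); Spark ships a Mesos external shuffle service that must run on every worker, e.g. deployed as a Marathon application:

```
spark.dynamicAllocation.enabled       true
spark.shuffle.service.enabled         true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  10

# On each Mesos worker (e.g. via Marathon):
# $SPARK_HOME/sbin/start-mesos-shuffle-service.sh
```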
QUESTIONS?