Building a REST Job Server for interactive Spark as a service by Romain Rigaux and Erick Tryzelaar
BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE
Romain Rigaux - Cloudera
Erick Tryzelaar - Cloudera
WHY SPARK AS A SERVICE?
• Notebooks
• Easy access from anywhere
• Share Spark contexts and RDDs
• Build apps
• Spark Magic
• …
HISTORY V1: OOZIE
THE GOOD
• It works
• Code snippet
THE BAD
• Submit through Oozie
• Shell action
• Very slow
• Batch
(workflow.xml, snippet.py → stdout)
HISTORY V2: SPARK IGNITER
THE GOOD
• It works better
THE BAD
• Compile Jar
• Batch only, no shell
• No Python, R
• Security
• Single point of failure
(Implement → Compile → Upload jar → JSON output; batch, Scala; cf. Ooyala)
HISTORY V3: NOTEBOOK
THE GOOD
• Like spark-submit / Spark shells
• Scala / Python / R shells
• Jar / Python batch jobs
• Notebook UI
• YARN
THE BAD
• Beta?
(Livy runs the code snippet or batch)
LIVY SPARK SERVER
• REST web server in Scala for Spark submissions
• Interactive shell sessions or batch jobs
• Backends: Scala, Java, Python, R
• No dependency on Hue
• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/
ARCHITECTURE
• Standard web service: wrapper around spark-submit / Spark shells
• YARN mode: Spark drivers run inside the cluster (supports crashes)
• No need to inherit any interface or compile code
• Extended to work with additional backends
LOCAL MODE
Livy Server: Scalatra → Session Manager → Session
Session ↔ Spark Client → Spark Interpreter → Spark Context
(the slides animate steps 1–5 of a request flowing through these components)
YARN-CLUSTER MODE
Livy Server: Scalatra → Session Manager → Session → Spark Client
Spark Client → YARN Master → YARN Node (Spark Interpreter + Spark Context) → YARN Nodes (Spark Workers)
(the slides animate steps 1–7 of a request flowing from the Livy Server through YARN into the cluster)
SESSION CREATION AND EXECUTION

% curl -X POST localhost:8998/sessions \
  -d '{"kind": "spark"}'
{"id":0,"kind":"spark","log":[...],"state":"idle"}

% curl -X POST localhost:8998/sessions/0/statements \
  -d '{"code": "1+1"}'
{"id":0,"output":{"data":{"text/plain":"res0: Int = 2"},"execution_count":0,"status":"ok"},"state":"available"}
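The two curl calls can also be driven from a script. A minimal sketch in Python, assuming a Livy server on localhost:8998 as in the examples; the helper names are ours, not Livy's, and polling for session readiness is skipped for brevity:

```python
# Sketch of a tiny Livy REST client using only the standard library.
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_result(statement):
    """Pull the plain-text result out of a statement response dict."""
    return statement["output"]["data"]["text/plain"]


def demo(base="http://localhost:8998"):
    """Mirrors the two curl calls above (needs a running Livy server)."""
    session = post_json(base + "/sessions", {"kind": "spark"})
    stmt = post_json("%s/sessions/%d/statements" % (base, session["id"]),
                     {"code": "1+1"})
    return extract_result(stmt)  # e.g. "res0: Int = 2"
```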
SHELL OR BATCH?
Same YARN-cluster architecture; only the process on the driver YARN Node changes:
• Shell: the Spark Interpreter (e.g. pyspark) drives the Spark Context
• Batch: spark-submit drives the Spark Context
REMEMBER?
Livy Server → Spark Client → YARN Master → YARN Node (Spark Interpreter + Spark Context) → Spark Workers
INTERPRETERS
• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• "Swappable" with other kernels (python, spark, …)
INTERPRETER FLOW
1. The Livy Server sends {"code": "1+1"} to the Interpreter
2. The Interpreter runs it in the shell: > 1+1 → 2
3. A Magic step formats the result
4. The Interpreter returns {"data": {"application/json": "2"}} to the Livy Server
(e.g. > println(1+1) prints 2)
INTERPRETER FLOW CHART
Receive lines → split into chunks, then loop:
• Chunks left? No → send output to server
• Magic chunk? Yes → Magic!; No → execute chunk
• Success → next chunk; error → send error to server
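The loop above can be sketched as follows; `execute` stands in for the real shell execution and the two magics are toy versions, so every name here is illustrative rather than Livy's actual code:

```python
# Toy stand-ins for Livy's magic handlers, keyed by their keyword.
MAGICS = {
    "%json": lambda arg: {"application/json": arg},
    "%table": lambda arg: {"application/vnd.livy.table.v1+json": arg},
}


def interpret(lines, execute):
    """Split received lines into chunks and run them one by one."""
    chunks = [line for line in lines.splitlines() if line.strip()]
    output = None
    for chunk in chunks:
        head = chunk.split(None, 1)
        if head[0] in MAGICS:                      # magic chunk?
            output = MAGICS[head[0]](head[1] if len(head) > 1 else "")
        else:
            try:
                output = execute(chunk)            # execute chunk
            except Exception as err:
                return {"error": str(err)}         # send error to server
    return {"data": output}                        # send output to server
```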
JSON MAGIC
Example of parsing:

val lines = sc.textFile("shakespeare.txt")
val counts = lines.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .sortBy(-_._2)
  .map { case (w, c) => Map("word" -> w, "count" -> c) }

%json counts

The raw result [('', 506610), ('the', 23407), ('I', 19540), ...] is serialized by the Interpreter via sparkIMain.valueOfTerm("counts").toJson() into:

{"id":0,"output":{"application/json":[{"count":506610,"word":""},{"count":23407,"word":"the"},{"count":19540,"word":"I"},...]...}
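For comparison, the same pipeline in plain Python (no Spark involved; just to show the list-of-maps shape that %json serializes):

```python
# Plain-Python version of the Scala word-count snippet above:
# split on spaces, count, sort descending, emit word/count maps.
from collections import Counter


def word_counts(text):
    counts = Counter(text.split(" "))
    return [
        {"word": w, "count": c}
        for w, c in sorted(counts.items(), key=lambda kv: -kv[1])
    ]
```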
TABLE MAGIC
Same counts value, but with:

%table counts

The Interpreter calls sparkIMain.valueOfTerm("counts").guessHeaders().toList() and returns:

"application/vnd.livy.table.v1+json": {
  "headers": [{"name": "count", "type": "BIGINT_TYPE"}, {"name": "name", "type": "STRING_TYPE"}],
  "data": [[23407, "the"], [19540, "I"], [18358, "and"], ...]
}
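What guessHeaders amounts to can be sketched in Python, assuming rows arrive as uniform dicts like the word-count maps; the type labels mirror the slide, but the logic is our guess, not Livy's actual implementation:

```python
# Infer table headers (name + type) from the first row's values,
# then flatten the rows into the headers/data table payload.
def guess_headers(rows):
    type_names = {int: "BIGINT_TYPE", str: "STRING_TYPE"}
    return [
        {"name": key, "type": type_names.get(type(value), "STRING_TYPE")}
        for key, value in rows[0].items()
    ]


def to_table(rows):
    """Build the application/vnd.livy.table.v1+json payload."""
    headers = guess_headers(rows)
    data = [[row[h["name"]] for h in headers] for row in rows]
    return {"headers": headers, "data": data}
```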
PLOT MAGIC
R code:

barplot(sorted_data$count, names.arg = sorted_data$value,
  main = "Resource hits", las = 2,
  col = colfunc(nrow(sorted_data)), ylim = c(0, 300))

The Interpreter wraps and runs it:

> png('/tmp/plot.png')
> barplot(...)
> dev.off()

via sparkIMain.interpret("png('/tmp/plot.png'); barplot(...); dev.off()"), then reads the file back:

File('/tmp/plot.png').read().toBase64()

and returns:

{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe…"...}...}
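The final read-and-encode step can be sketched in Python (the path and helper name are illustrative):

```python
# Read the plot file the R code wrote and base64-encode it for the
# image/png field of the response payload.
import base64


def png_payload(path):
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"image/png": encoded}
```

Base64 of any PNG starts with "iVBOR", which is why the slide's payload begins that way: the PNG file signature bytes always encode to that prefix.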
PLUGGABLE INTERPRETERS
• Pluggable backends
• Livy's Spark backends: Scala, pyspark, R
• IPython/Jupyter support coming soon
REMEMBER AGAIN?
Livy Server → Spark Client → YARN Master → YARN Node (Spark Interpreter + Spark Context) → Spark Workers
MULTI USERS
One Livy Server (Scalatra → Session Manager → Sessions) runs multiple Spark Clients, each driving its own YARN Node with a Spark Context and Spark Interpreter.
SHARED CONTEXTS?
Several Spark Clients can instead share a single Spark Context and Spark Interpreter on one YARN Node.
SHARED RDDs?
With a shared Spark Context, one or more RDDs in that context become accessible to every client.
SECURE IT?
The open question is then securing access to the shared context and its RDDs.
In a PySpark shell, wrap an RDD (e.g. holding {'ak': 'Alaska'}, {'ca': 'California'}) so it can be shared:

r = sc.parallelize([])
srdd = ShareableRdd(r)

Other clients can then query it through Livy's REST API:

curl -X POST /sessions/0/statements -d "{'code': srdd.get('ak')}"

or from another Python shell:

states = SharedRdd('host/sessions/0', 'srdd')
states.get('ak')
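A sketch of what the demo's SharedRdd client does on the wire; it only builds the statement payload that the curl call sends. The class and method names follow the slide, while the rest is an assumption about the tutorial code linked below:

```python
class SharedRdd:
    """Query a ShareableRdd living in another Livy session."""

    def __init__(self, session_url, name):
        self.session_url = session_url  # e.g. "host/sessions/0"
        self.name = name                # variable name inside that session

    def get_payload(self, key):
        """The statement payload that fetches one key from the RDD."""
        return {"code": "%s.get('%s')" % (self.name, key)}


# Usage: POSTing this payload to host/sessions/0/statements runs
# srdd.get('ak') inside the remote shell that owns the RDD.
states = SharedRdd("host/sessions/0", "srdd")
payload = states.get_payload("ak")
```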
DEMO TIME
https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
SPARK MAGIC
• From Microsoft
• Python magics for working with remote Spark clusters
• Open source: https://github.com/jupyter-incubator/sparkmagic
FUTURE
• Move to external repo?
• Security
• IPython/Jupyter backends and file format
• Shared named RDDs / contexts?
• Share data: Spark-specific, language-generic, or both?
• Leverage Hue 4
https://issues.cloudera.org/browse/HUE-2990
LIVY'S CHEAT SHEET
• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/
• Scala, Java, Python, R
• Type introspection for visualization
• YARN-cluster or local modes
• Code snippets / compiled jars
• REST API
• Pluggable backends
• Magic keywords
• Failure resilient
• Security
THANK YOU!
@gethue
USER GROUP: hue-user@
WEBSITE: http://gethue.com
LEARN: http://learn.gethue.com