What is Livy?
A service that manages long-running Spark contexts in your cluster.
• A service that provides interaction with an Apache Spark cluster through a REST interface.
• Open source, Apache licensed.
• Suited to multi-tenant environments, since it manages multiple Spark contexts efficiently.
• Removes the need for a local Spark environment, so jobs can be submitted from mobile or web applications.
• Fine-grained job submission.
• Retrieve job results over REST asynchronously or synchronously.
• Client APIs in Java and Scala, and soon in Python.
Features of Livy
• Interactive Scala, Python, and R shells
• Batch submissions in Scala, Java, Python
• Can handle multiple Spark jobs at the same time.
• Reliable for multi-tenant execution.
• Jobs can be submitted from anywhere over REST.
• Supports Spark 1 and Spark 2, and Scala 2.10 and 2.11, within one build.
• 100% open source, Apache licensed.
• Supports impersonation, so multiple users can share the same server.
• Existing code needs no changes to run on Livy, except that instead of creating its own SparkContext it uses the one Livy predefines.
• Share cached RDDs or DataFrames between multiple jobs or clients.
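As a rough illustration of batch submission over REST, the sketch below builds a POST request for Livy's /batches endpoint. The jar path, class name and arguments are hypothetical placeholders, and the host and port are assumed to match the localhost:8998 address used in the curl examples. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: Livy listens on its usual port on the local machine.
LIVY_URL = "http://localhost:8998"


def build_batch_request(file, class_name=None, args=None, base_url=LIVY_URL):
    """Build (but do not send) a POST /batches request for a packaged job."""
    payload = {"file": file}
    if class_name is not None:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        url=f"{base_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical jar, main class and argument, for illustration only.
    request = build_batch_request("/path/to/app.jar", "com.example.App", ["10"])
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To submit for real: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches a cluster.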
Jupyter-Spark Integration via Livy
Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program. Thousands of Spark clusters in production provide feedback that helps further improve the experience.
Architectural Advantages of Jupyter integration via Livy
• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server
• Multi-language support; the Python, Scala and R kernels are equally feature-rich
• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters
• Easy integration with any Python library for data science or visualization, like Pandas or Plotly
Manage multiple independent Spark Contexts
User Impersonation
Zeppelin Livy Interaction
Interactive Session – Create Session
Request
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions
Response
{"state":"starting","proxyUser":"null","id":1,"kind":"spark","log":[]}
Diagram: steps 1 to 4 across Livy Client, Livy Server, Spark Interactive Session and Spark Context.
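The same create-session call can be sketched in Python with only the standard library. The helper names below are illustrative, not part of any Livy client, and the base URL is assumed to be the localhost:8998 address from the curl example. The sample response in the demo is a trimmed version of the one shown on the slide.

```python
import json
import urllib.request


def build_create_session_request(kind="spark", base_url="http://localhost:8998"):
    """Build (but do not send) the POST /sessions request from the curl example."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_session(response_text):
    """Extract the session id and state from Livy's JSON reply."""
    session = json.loads(response_text)
    return session["id"], session["state"]


if __name__ == "__main__":
    request = build_create_session_request()
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
    # Parse a reply shaped like the one on the slide.
    sample = '{"state":"starting","id":1,"kind":"spark","log":[]}'
    print(parse_session(sample))
    # To create a session for real: urllib.request.urlopen(request)
```

The returned id is what later requests use in the /sessions/{id}/statements path.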
Interactive Session – Execute Code
Request
curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'
Response
{"id":0,"state":"running","output":null}
Diagram: steps 1 to 4 across Livy Client, Livy Server, Spark Interactive Session and SparkContext.
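Because the statement comes back in the "running" state with no output, a client typically polls until the result is ready. The sketch below shows one way to do that in Python. The helper names are illustrative, and the set of states treated as final ("available", "error", "cancelled") is an assumption to check against the Livy version in use.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: same address as the curl example


def build_statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) a POST /sessions/{id}/statements request."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def wait_for_statement(fetch_statement, poll_seconds=1.0, max_polls=60):
    """Call fetch_statement() until the statement reaches an assumed final state."""
    for _ in range(max_polls):
        statement = fetch_statement()
        if statement["state"] in ("available", "error", "cancelled"):
            return statement
        time.sleep(poll_seconds)
    raise TimeoutError("statement did not finish in time")


def fetch_over_http(session_id, statement_id, base_url=LIVY_URL):
    """GET the current statement record; requires a running Livy server."""
    url = f"{base_url}/sessions/{session_id}/statements/{statement_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Against a live server, polling would look like `wait_for_statement(lambda: fetch_over_http(0, 0))`. Passing the fetch function in as a parameter keeps the polling loop usable without a cluster.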
SparkContext Sharing
Diagram: Client 1, Client 2 and Client 3 connect through the Livy Server to Session-1 and Session-2, which are backed by SparkSession-1 and SparkSession-2, each with its own SparkContext.
Livy Security
Diagram: the Client connects to the Livy Server (Impersonation) over SPNEGO; the Livy Server connects to the SparkSession with a shared secret.
• Only authorized users can launch a Spark session or submit code.
• Each user can access only their own session.
• Only the Livy server can securely submit jobs to a Spark session.
SPNEGO
• Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go".
• It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.
Diagram: the Client (Kerberos TGT) sends HTTP GET http://site/a.html; the Livy Server (SPNEGO enabled) answers Error 401 Unauthorized; the client repeats the HTTP GET request with an Authorization: Negotiate header.
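The HTTP side of this exchange can be sketched as two small helpers: one that recognises the server's 401 challenge, and one that formats the retry header. In HTTP Negotiate authentication the challenge arrives as a WWW-Authenticate: Negotiate header and the client answers with Authorization: Negotiate followed by a base64 token. Producing a real token requires a GSSAPI or Kerberos library, which is outside this sketch, so the token bytes in the demo are a hypothetical placeholder.

```python
import base64


def needs_negotiate(status, headers):
    """Return True when a 401 response asks the client to use Negotiate auth."""
    if status != 401:
        return False
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    challenge = lowered.get("www-authenticate", "")
    return challenge.strip().split(" ", 1)[0].lower() == "negotiate"


def negotiate_header(token_bytes):
    """Format the Authorization header for the retried request."""
    encoded = base64.b64encode(token_bytes).decode("ascii")
    return {"Authorization": "Negotiate " + encoded}


if __name__ == "__main__":
    if needs_negotiate(401, {"WWW-Authenticate": "Negotiate"}):
        # Hypothetical token; a real one would come from a GSSAPI library.
        print(negotiate_header(b"placeholder-token"))
```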
Impersonation
Diagram: Alice (Kerberos TGT) and Bob (Kerberos TGT) each authenticate to the Livy Server (super user: livy) over SPNEGO; the Livy Server connects to a Spark Session for each user with a shared secret.
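At the REST level, impersonation shows up as the proxyUser field of the session-creation body, the same field that appears in the create-session response. The sketch below builds that body for two users. The user names follow the diagram, and the helper name is illustrative. Whether the server honours the field depends on how impersonation is configured on the Livy and Hadoop side, which is not shown here.

```python
import json


def build_impersonated_session_body(proxy_user, kind="spark"):
    """Return the JSON body for POST /sessions that asks Livy to run as proxy_user."""
    return json.dumps({"kind": kind, "proxyUser": proxy_user})


if __name__ == "__main__":
    # Alice and Bob share one Livy server but get sessions under their own identity.
    for user in ("alice", "bob"):
        print(build_impersonated_session_body(user))
```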
Shared Secret
• The Livy server generates a secret key.
• The Livy server passes the secret key to the Spark session when launching it.
• Both sides use the secret key to communicate with each other.
Diagram: the Livy Server and the Spark Session connected by a shared secret.
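The general idea of authenticating messages with a shared secret can be illustrated with an HMAC. This is not Livy's actual wire protocol, only a generic sketch of the three steps above: one side generates a random key, hands it to the other at launch, and both use it to prove that messages come from a holder of the key. All function names are illustrative.

```python
import hashlib
import hmac
import secrets


def generate_secret():
    """Step 1: the server creates a random secret key."""
    return secrets.token_bytes(32)


def sign(secret, message):
    """Compute an authentication tag that only a holder of the secret can produce."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, message, tag):
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, message), tag)


if __name__ == "__main__":
    secret = generate_secret()          # generated by the server
    session_secret = secret             # step 2: passed to the session at launch
    tag = sign(secret, b"run job 1")    # step 3: server signs a message
    print(verify(session_secret, b"run job 1", tag))
```

A party without the key cannot produce a tag that verifies, which is what lets only the server submit work to the session.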