Zeppelin meetup 2016 madrid
Transcript of Zeppelin meetup 2016 madrid
Advanced features of Apache Zeppelin
http://zeppelin.apache.org
Jongyoul Lee
PMC member of Apache Zeppelin since Sep. 2015.
Software Development Engineer at NFLabs
Advanced?
• Helium
  • A new extension for visualization
• Multi-users features
  • Users & Permissions
  • Per user/Per note & Shared/Scoped/Isolated
• Futures
  • Impersonation & Personalized mode
  • Scalability & Reliability
Helium
Zeppelin
Visualizations: 6 built-in visualizations come with pivot
Table, Bar, Pie, Area, Line, Scatter
Free to draw any customized visualization inside a notebook
…
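As a rough illustration of what "come with pivot" can mean for a chart, the sketch below groups table rows by a key column and a group column and sums a value column, giving one series per group. The function name `pivot`, the column names, and the sample rows are hypothetical, and this is not Zeppelin's implementation.

```python
from collections import defaultdict

def pivot(rows, key, group, value):
    """Sum `value` for every (key, group) pair.

    Returns {key: {group: total}}, i.e. one chart series per group.
    """
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[key]][row[group]] += row[value]
    # Convert the nested defaultdicts to plain dicts for printing/comparison.
    return {k: dict(groups) for k, groups in table.items()}

# Hypothetical sample data, shaped like a small query result.
rows = [
    {"age": 30, "marital": "single", "balance": 100.0},
    {"age": 30, "marital": "married", "balance": 50.0},
    {"age": 30, "marital": "single", "balance": 25.0},
    {"age": 31, "marital": "married", "balance": 10.0},
]
result = pivot(rows, key="age", group="marital", value="balance")
```

A bar chart would then draw `age` on the x-axis and one bar per `marital` group.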
Helium
[Architecture diagram]
• Interpreter: Spark, Flink, Geode, JDBC, …
• Notebook Storage: File System, Amazon S3, Git, …
• Application: Visualizations (Map, WordCloud, …), Analytics (…, …)
• Resource Pool: SparkContext, Flink Environment, JDBC connection, …
• User object
Extend pluggable visualization to pluggable analytics applications
Work in progress to make visualizations pluggable
Users and Permissions
• Companies complain
  • Why security works …
  • Why authentication works …
  • Why does Zeppelin store my password as plain …
  • Why do two users use the same Spark …
  • Why do I wait while others run something
& Enterprise
Authentication: integrated with Apache Shiro
Contributions: PAM, ActiveDirectory, JDBC, JNDI, LDAP, Properties
Notebook Authorization : Owners, Writers, Readers per Note
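The per-note roles can be sketched as three sets of user names. This is a simplified model, not Zeppelin's actual classes: the name `NotePermissions` is hypothetical, and the rule that owners may write and writers may read is an assumption made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class NotePermissions:
    """Hypothetical per-note permission sets (not Zeppelin's real API)."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)

    def is_owner(self, user):
        return user in self.owners

    def can_write(self, user):
        # Assumption: owners may also write.
        return self.is_owner(user) or user in self.writers

    def can_read(self, user):
        # Assumption: anyone who may write may also read.
        return self.can_write(user) or user in self.readers

perms = NotePermissions(owners={"user1"}, writers={"user2"}, readers={"user3"})
```

With these sets, `user3` can read the note but not write to it, and a user in none of the sets is denied both.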
Multi-tenancy
Per user/Per note & Shared/Scoped/Isolated
Multi-tenancy
MODE      SHARED   ISOLATED   SCOPED
PROCESS   1        N          1
THREADS   1        1          N
Shared
[Diagram: ZeppelinServer; SparkInterpreter]
• User1: Run P1 on NoteA; Run SparkInterpreter for P1
• User2: Run P2 on NoteB; Run SparkInterpreter for P2
• Originally implemented
• Pros
  • Simple structure
  • Predictable behavior
• Cons
  • All resources shared
  • Interference among users
Isolated
[Diagram: ZeppelinServer; SparkInterpreter; SparkInterpreter]
• User1: Run P1 on NoteA; Run SparkInterpreter for P1
• User2: Run P2 on NoteB; Run SparkInterpreter for P2
• Pros
  • No pending
  • No resources shared
• Cons
  • Lots of memory
  • Inefficiency of using memory
  • Limited by resources
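The "interference among users" of shared mode and the "no pending" of isolated mode can be illustrated with a toy queue model. It assumes every job is submitted at time 0, that a shared interpreter runs jobs one at a time in submission order, and that isolated interpreters never compete for resources (which ignores the "limited by resources" drawback). The function name `finish_times` is hypothetical.

```python
def finish_times(jobs, shared):
    """Return the finish time of each job, in submission order.

    jobs: list of (user, duration) pairs, all submitted at time 0.
    shared=True: one FIFO queue for everyone (shared mode).
    shared=False: one FIFO queue per user (isolated mode).
    """
    clock = {}  # queue name -> time at which that queue becomes free
    result = []
    for user, duration in jobs:
        queue = "all" if shared else user
        clock[queue] = clock.get(queue, 0) + duration
        result.append(clock[queue])
    return result

jobs = [("User1", 10), ("User2", 1)]
print(finish_times(jobs, shared=True))   # [10, 11]
print(finish_times(jobs, shared=False))  # [10, 1]
```

In the shared case User2's one-unit job finishes at time 11 because it waits behind User1's job; with separate queues it finishes at time 1.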
Scoped
[Diagram: ZeppelinServer; JDBCInterpreter; JDBCInstance User1; JDBCInstance User2]
• User1: Run P2 on NoteA; Run SparkInterpreter for P2
• User2: Run P3 on NoteB; Run SparkInterpreter for P3
• Pros
  • Less memory
  • Some resources isolated
• Cons
  • Some resources shared
  • Big single process
Multi-tenancy
MODE      SHARED   ISOLATED   SCOPED
PROCESS   1        N          1
THREADS   1        1          N
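The process/thread table can be reproduced with a small placement model. It is a sketch under assumptions: "threads" is read as interpreter instances inside one process, the keys may be users or notes (per user / per note), and the names `plan` and `count` are hypothetical rather than part of Zeppelin.

```python
def plan(mode, keys):
    """Map each key (a user or a note) to a (process, instance) pair."""
    if mode == "shared":
        # Everyone shares one instance in one process.
        return {k: ("process-0", "instance-0") for k in keys}
    if mode == "scoped":
        # One process, but a separate instance per key.
        return {k: ("process-0", f"instance-{k}") for k in keys}
    if mode == "isolated":
        # A separate process per key, each with a single instance.
        return {k: (f"process-{k}", "instance-0") for k in keys}
    raise ValueError(f"unknown mode: {mode}")

def count(mode, keys):
    """Return (number of processes, max instances per process)."""
    per_process = {}
    for process, instance in plan(mode, keys).values():
        per_process.setdefault(process, set()).add(instance)
    return len(per_process), max(len(s) for s in per_process.values())

users = ["User1", "User2", "User3"]
for mode in ("shared", "isolated", "scoped"):
    print(mode, count(mode, users))
```

For three users this yields 1 process with 1 instance (shared), 3 processes with 1 instance each (isolated), and 1 process with 3 instances (scoped), matching the table with N = 3.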
• ~ 0.7.0
  • Impersonation of JDBC/Spark Interpreter
  • Personalized mode
• 0.7.0 ~
  • Scalability & Reliability
  • …
& Futures