Privacy issues in integrating R environment in scientific workflows
Dr. Zhiming Zhao
University of Amsterdam
Virtual Laboratory for e-Science
Privacy issues in integrating Legacy Experiment Environment to Scientific Workflows
Zhiming Zhao, Dmitry A. Vasunin, Adianto Wibisono, Adam Belloum, Cees de Laat, Pieter Adriaans, Bob Hertzberger
Outline
• Scientific experiments and R
• Problem description
• Optional solutions
• Experimental results
• Summarizing discussion
• Future work
Scientific experiments and support systems
Figure: experiment lifecycle. Labels: Define goal; Data analysis; Prototype the algorithm; Computing (test with small data); Vis./Int. (validation); Finding & Dissemination; Apply to full size data; Refine (twice).
Experiment: on full data scale.
Prototype: on small data scale.
In such scenarios:
• Existing experiment environments, such as R, are widely used by domain scientists
• Human-in-the-loop computing is important for testing and validating prototypes
• Scientific workflows are used to manage different processes and the experiment lifecycle
R and workflow support in VL-e
• R provides rich functionality for statistics and visualisation, and has been used as an important experimental environment in the bio-sciences.
– R needs scientific workflow support
  • Accessing different e-Science resources
  • Being coordinated with the other components in a large-scale experiment
– E-Science workflows in certain domains also need R
  • Reuse the advanced results from legacy systems
  • Support experiments developed on legacy systems
• Workflow support in VL-e
– Four systems are recommended
  • Taverna, Kepler and VLAM have support for R
– A generic solution is under construction
R in scientific workflows: current solutions
Three types of solutions
• Local: local installation of R, through the command line interface of R
– Simple configuration
– Performance bottleneck
• Web Service: SOAP to pass R script and objects
– Standard interface, distributed computing
– High latency
• TCP Socket: socket interface (RServe)
– Distributed computing
– Maintain states
– Poor security
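The "Local" option above can be sketched as a workflow step that shells out to R's command line interface. This is a minimal illustration, not the mechanism used by any of the workflow systems named in these slides: the helper names `build_r_command` and `run_r` are hypothetical, and it assumes the standard `Rscript` front end is on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_r_command(expression: str) -> List[str]:
    """Build the argument list for a one-off, non-interactive R call."""
    # --vanilla skips saved workspaces and profile files, so every call starts clean.
    return ["Rscript", "--vanilla", "-e", expression]


def run_r(expression: str, timeout_s: float = 60.0) -> Optional[str]:
    """Run an R expression locally; return its stdout, or None if R is not installed."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        build_r_command(expression),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # raise if R exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Each call starts a fresh R process, so no state survives between steps.
    print(run_r("cat(mean(c(1, 2, 3)))"))
```

Starting a new R process per step keeps the configuration simple, but all computation stays on the user's desktop and no session state is kept, which is where the socket-based option differs.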
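Figure: workflow system (Wf system) on the user desktop, connected to a local R environment and to a remote R environment on a remote node; links labelled L (local), W (Web Service) and S (socket).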
Typical scenario of RServe and requirements on privacy
Different levels of privacy issues
• Data level
– Intermediate results not to be seen by the other users
• Communication level: graphical display
– Remote X display and interaction between multiple users
Figure: WF1, WF2, R, Display.
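For the data level, one way to keep intermediate results away from other users is to give every workflow run its own working directory that only the owning account can access. The sketch below is an assumption for illustration, not something described in the slides: the helper names are hypothetical, and the permission check is only meaningful on POSIX systems, where `tempfile.mkdtemp` creates the directory readable, writable and searchable by the creating user only.

```python
import os
import stat
import tempfile


def make_private_workdir(workflow_id: str) -> str:
    """Create a fresh working directory for one workflow run."""
    # mkdtemp picks a unique name, so two workflows never share a directory.
    return tempfile.mkdtemp(prefix=f"r-{workflow_id}-")


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any access to the path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    workdir = make_private_workdir("wf1")
    print(workdir, "private:", is_private(workdir))
    os.rmdir(workdir)
```

This only separates users when their R engines run under different accounts; two workflows served by one shared engine under the same account can still read each other's files.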
Problem description and desired solution
• Problem description
– Most legacy experiment environments do not have strong security management
– Workflow systems provide integration without considering security issues
– The deployment of remote environments is required to be secure
• Desire
– Using existing technologies
– Provide solutions to privacy issues at workflow level, preferably in a transparent way
Experiments
• Review optional solutions
• Investigate the overhead of security enhancement on the workflow execution
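Investigating the overhead comes down to timing the same data transfer over a plain channel and over a secured one. The harness below is a minimal sketch under stated assumptions: it measures only a plain loopback TCP connection to a small receiver it starts itself, the function names are hypothetical, and payload sizes are taken to be in bytes. To time a secured channel, `time_transfer` would be pointed at the local end of a tunnel instead.

```python
import socket
import threading
import time
from typing import Dict, Iterable


def _sink_server(server: socket.socket, n_connections: int) -> None:
    """Accept n connections; read each to EOF and reply with the byte count."""
    for _ in range(n_connections):
        conn, _addr = server.accept()
        with conn:
            received = 0
            while True:
                chunk = conn.recv(65536)
                if not chunk:
                    break
                received += len(chunk)
            conn.sendall(str(received).encode("ascii"))


def time_transfer(host: str, port: int, size: int) -> float:
    """Send `size` bytes and return the elapsed time in milliseconds."""
    payload = b"x" * size
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # tell the receiver the payload is complete
        ack = b""
        while True:
            part = sock.recv(64)
            if not part:
                break
            ack += part
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if int(ack) != size:
        raise RuntimeError("receiver saw a different number of bytes")
    return elapsed_ms


def measure(sizes: Iterable[int]) -> Dict[int, float]:
    """Time one transfer per payload size against a local receiver."""
    sizes = list(sizes)
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen()
    port = server.getsockname()[1]
    worker = threading.Thread(
        target=_sink_server, args=(server, len(sizes)), daemon=True
    )
    worker.start()
    try:
        return {size: time_transfer("127.0.0.1", port, size) for size in sizes}
    finally:
        worker.join(timeout=5)
        server.close()


if __name__ == "__main__":
    for size, ms in measure([1000, 10000, 100000, 1000000]).items():
        print(f"{size:>8} bytes: {ms:.2f} ms")
```

The time includes connection set-up, so running the same sizes against both channels separates the one-off cost of establishing a secure connection from the per-byte cost at runtime.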
Different configurations and their level of security
Data management
• Static (R engine), shared engine
– Secure: No
– Easy to set up
• Dynamic (R engine), different user account
– Secure: Yes
– The endpoint is unknown at workflow design stage
Display management
• Static (X server), local X
– Secure: Yes
– Individual X server, bound to user's desktop
• Dynamic (X server), remote X
– Secure: No
– X is not protected
• {Job+VNC}, remote X + VNC
– Secure: Yes
– Management overhead of VNC
An experiment: Taverna, RServe and security tunnel
Experiment
• Adding security enhancement in Taverna
• Protect the data channels between Taverna and RServe
• Overhead
– Setting up security tunnels
– Runtime data transfer
Figure: Data transfer between workflow and R. X axis: size of data between workflow and R (ticks 1000 to 1000000); Y axis: time in milliseconds (ticks 1 to 100000); series: Non-Secure and Secure.
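One existing technology that can protect the channel between the workflow and RServe is SSH local port forwarding. The slides do not say which tunnelling mechanism was used, so treat the following as an assumption: the sketch only builds the `ssh` command line, `build_tunnel_command` is a hypothetical helper, and the host and user names are placeholders. Port 6311 is Rserve's default.

```python
from typing import List

RSERVE_DEFAULT_PORT = 6311  # default TCP port of Rserve


def build_tunnel_command(
    user: str,
    remote_host: str,
    local_port: int = 6311,
    remote_port: int = RSERVE_DEFAULT_PORT,
) -> List[str]:
    """Build an ssh command that forwards a local port to Rserve on the remote node."""
    for port in (local_port, remote_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
    # -N: run no remote command, only forward; -L: local port forwarding.
    # "localhost" is resolved on the remote side, so Rserve can listen on loopback only.
    forward = f"{local_port}:localhost:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{remote_host}"]


if __name__ == "__main__":
    # Placeholder account and host, for illustration only.
    print(" ".join(build_tunnel_command("alice", "r-node.example.org", 16311)))
```

With the tunnel running, the workflow connects to `localhost` on the chosen local port as if RServe were local, which keeps the change transparent to the workflow definition. Starting the tunnel is a one-off cost per session, while the encryption cost is paid on every transfer.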
Summarizing discussion
• Integrating existing experiment environments with workflow systems is important for rapid prototyping
• Privacy is demanded by both users and the e-Science infrastructure, and can be viewed as a generic issue when integrating a user-interaction-enabled legacy component into a workflow
• Privacy protection can be achieved at a certain level by customizing the workflow execution
• Enhancing workflow execution does not necessarily incur a high penalty on execution
Future work
• In the VL-e project, we are developing a bus-style generic solution for different workflow systems
• Taking data privacy into account when realizing the interoperability between different workflow systems
Activities
• Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS: 2006, Reading University; 2007, Beijing, China.
– Proceedings are in LNCS, Springer Verlag.
– A special issue will be published in Scientific Programming Journal.
– http://staff.science.uva.nl/~zhiming/iccs-wses
• Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid computing conference in Amsterdam, December 2006.
– Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
– BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory)
– Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
– Taverna, Prof. Peter Rice (European Bioinformatics Institute)
– WS and Semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies)
– Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
– http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm