![Page 1: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/1.jpg)
GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis
Arihant Patawari
USC Stevens Neuroimaging and Informatics Institute
July 9th 2015
![Page 2: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/2.jpg)
Organization
1) GAAIN Virtual Appliances
- Expanding the GAAIN application with Docker as well as Virtual Machines
- Objectives: Support production data analysis in GAAIN
2) Medical Datasets Element Name Matching
- Integration into larger GEM system
- Scalability issues
- Mine features from data
- Neural Network classifiers
9/8/14
![Page 3: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/3.jpg)
The Virtual Machine
• A (computing) machine purely “made of software”
• A machine within a machine
• Why? Sharable and transportable over a network
![Page 4: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/4.jpg)
GAAIN Virtual Machines
Investigator
Data Partner
![Page 5: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/5.jpg)
Virtual Appliances
![Page 6: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/6.jpg)
Objectives
• How do we provide a scientific investigator with a dedicated analysis development resource?
• How do we ensure that an analysis resource is sharable?
• How do we run applications that require a graphical display (such as a UI)?
• How can we connect client and server applications?
• How do we ensure automated cloud backups?
• How do we send an analysis machine over to data partners?
• How do we access data partner data?
• How do we get analysis results back into the GAAIN network?
• …
![Page 7: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/7.jpg)
Recap: Docker Framework
• Designed to provide a framework for (specific) application encapsulation
• Provides minimal support for the application
• Not intended as a general-purpose computing machine
• Other aspects
- Dockerfile management
- Docker Hub
- Security
• Relatively new and evolving framework
![Page 8: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/8.jpg)
Recap: Virtual Machines
• Intended as a full computing machine
• Command-line control, not scripts
• Interoperability through the Open Virtualization Format (OVF)
• Also supported by VM environments such as VMware vSphere, XenServer, and others
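As a rough illustration of command-line control and OVF interoperability, the sketch below assembles VirtualBox `VBoxManage export` and `VBoxManage import` calls. The VM name and file name are hypothetical, and the commands are only built and printed, not executed.

```python
import shlex
from typing import List


def export_command(vm_name: str, ova_path: str) -> List[str]:
    """Build the VBoxManage call that exports a VM to an OVF/OVA file."""
    return ["VBoxManage", "export", vm_name, "--output", ova_path]


def import_command(ova_path: str) -> List[str]:
    """Build the VBoxManage call that imports an OVF/OVA file as a new VM."""
    return ["VBoxManage", "import", ova_path]


# Hypothetical names, used only for illustration.
for cmd in (
    export_command("gaain-analysis", "gaain-analysis.ova"),
    import_command("gaain-analysis.ova"),
):
    print(shlex.join(cmd))
```

The resulting `.ova` file is what would be handed to any hypervisor that reads the Open Virtualization Format.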
![Page 9: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/9.jpg)
Many Possibilities!
- Client on PC, (pipeline) server in VBox VM
- Client and (pipeline) server in VBox VM
- Client and (pipeline) server in Docker
- Client in VBox VM, (pipeline) server in Docker, Docker in VM
√
![Page 10: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/10.jpg)
Docker LifeCycle
Docker File
Hub
Data Partner’s Machine
Result in Shared Folder
![Page 11: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/11.jpg)
Docker vs Virtual Machines

| Aspect | VirtualBox | Docker |
| --- | --- | --- |
| Virtual image type (formats) | Open Virtualization Format, as ‘.OVF’ and ‘.OVA’ files | Proprietary Docker image format |
| Requirements | Any virtualization hypervisor that can run Open Virtualization Format images | Docker Engine |
| Architecture | Typically the core virtual image contains a complete operating system of choice | A minimal system layer is provided and components are added only as required |
| ‘Typical’ image sizes | Encapsulating a simple application (for instance a single workflow) results in a machine of about 1.5 GB. However, options have recently become available for including only a minimal operating system layer. | Typically only a few hundred MB for the same application |
| Management and sharing | No specific capabilities provided | Docker Hub for centralized Docker image storage, tagging, and sharing |
| Access control | No specific capabilities provided | Docker Hub provides account management and access control |
| Network access | Can provide network access between the VirtualBox VM and external machines/networks | External network access to a Docker image can be provided, but with limitations |
| Host folder mounting | Possible, but requires some additional software installation | Host folder mounting can be done more easily, with a single command |

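The single-command host folder mount mentioned for Docker can be sketched as follows, using the `-v host:container` option of `docker run`. The image name and folder paths are hypothetical, and the command is only assembled here, not run.

```python
import shlex
from typing import List


def run_with_mount(image: str, host_dir: str, container_dir: str) -> List[str]:
    """Build a `docker run` call that mounts a host folder into the container."""
    return ["docker", "run", "--rm", "-v", f"{host_dir}:{container_dir}", image]


# Hypothetical image and paths, for illustration only.
cmd = run_with_mount("gaain/pipeline", "/home/user/results", "/results")
print(shlex.join(cmd))
```

Anything the container writes to `/results` would then appear in the host folder, which is one way to get results into a shared folder.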
![Page 12: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/12.jpg)
Virtual Machines and Docker
• Virtual Machines provide
- More robust platform
- Interoperability
- Network access
• Docker provides
- Small application packages
- Hub management
- Security and access control
- “On-demand”
![Page 13: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/13.jpg)
Dockerfile
- Docker can build images automatically by reading the instructions from a Dockerfile.
- The only requirement is to have Docker installed.
- A Dockerfile can be created automatically by recording the actions performed, using a single command (the Auto-commit module), which makes it usable even for users unfamiliar with Docker commands.
- By executing a simple text file, the whole system can be built from scratch.
- Using a Dockerfile helps keep the image size in line with the requirements.
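To make the idea of a Dockerfile as a "simple text file" concrete, here is a minimal sketch that renders one from a few inputs. It is not the Auto-commit module from the slides; the base image, package, and start script are hypothetical, and the `apt-get` line assumes a Debian or Ubuntu base image.

```python
import json
from typing import List


def render_dockerfile(base_image: str, packages: List[str], command: List[str]) -> str:
    """Render a small Dockerfile: base layer, optional packages, start command."""
    lines = [f"FROM {base_image}"]
    if packages:
        # Install only what the application needs, then drop the package cache
        # so the image stays small (assumes an apt-based base image).
        lines.append(
            "RUN apt-get update && apt-get install -y "
            + " ".join(packages)
            + " && rm -rf /var/lib/apt/lists/*"
        )
    # Exec-form CMD is written as a JSON array.
    lines.append("CMD " + json.dumps(command))
    return "\n".join(lines) + "\n"


# Hypothetical inputs, for illustration only.
dockerfile_text = render_dockerfile("ubuntu:14.04", ["default-jre"], ["/opt/pipeline/run.sh"])
print(dockerfile_text)
```

Saving this text as `Dockerfile` and running `docker build` in the same folder would rebuild the image from scratch.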
![Page 14: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/14.jpg)
Some Challenges with Docker
• Best supported on Linux, with some challenges on Windows and Mac
• Graphical displays
- Could be achieved with X Window and other software on Windows
- Challenges with Mac OS
• Network access
- Restricted due to security issues
- Port forwarding
• Frequent updates to the framework
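Two of these challenges, port forwarding and graphical display, can be sketched as `docker run` options. This assumes a Linux host with a local X server whose socket is at `/tmp/.X11-unix`; Windows and Mac hosts need a different setup. The image name and port numbers are hypothetical, and the commands are only assembled.

```python
from typing import Dict, List


def run_with_ports(image: str, port_map: Dict[int, int]) -> List[str]:
    """Build `docker run` with one -p host:container option per forwarded port."""
    cmd = ["docker", "run", "--rm"]
    for host_port, container_port in sorted(port_map.items()):
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


def run_with_x11(image: str, display: str = ":0") -> List[str]:
    """Build `docker run` that shares the host X socket (Linux host assumed)."""
    return [
        "docker", "run", "--rm",
        "-e", f"DISPLAY={display}",
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
        image,
    ]


# Hypothetical image and ports, for illustration only.
print(" ".join(run_with_ports("gaain/pipeline", {8080: 80})))
print(" ".join(run_with_x11("gaain/pipeline")))
```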
![Page 15: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/15.jpg)
Working Prototype
Virtual Machine Manager
Auto Pipeline pop-up
Investigator
Data Partner
Image push to Hub
Docker HUB
Web Service
GAAIN Server
Docker Auto-Invocation Results
![Page 16: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/16.jpg)
Features
• Flexible, with no interoperability issues.
• Better control and management of workflow images through Docker Hub.
• Better accessibility and ease of use directly.
• Automated invocation of workflows at the data partner’s end using a Java-based web service.
• Dedicated application just for creating and testing workflows, with an automated script for pushing them to the hub.
• Minimal size of the overall system (1.5-2.0 GB)
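The push-to-hub step could look roughly like the sketch below, which tags a local image and pushes it with `docker tag` and `docker push`. This is not the prototype's script; the repository name and tag are hypothetical, and the commands are only assembled, not executed.

```python
from typing import List


def publish_commands(local_image: str, repository: str, tag: str = "latest") -> List[List[str]]:
    """Return the tag and push calls needed to publish a local image."""
    target = f"{repository}:{tag}"
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


# Hypothetical names, for illustration only.
for cmd in publish_commands("workflow-test", "gaain/workflow", "v1"):
    print(" ".join(cmd))
```

At the data partner's end, the matching `docker pull` of the same repository and tag would fetch the image before it is invoked.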
![Page 17: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/17.jpg)
Some technical issues we faced
- Virtual appliance creation with various minimal-install OSs
- Scripts for automatic invocation of the pipeline
- Installation of Docker on different operating system versions
- Compatibility of the pipeline with different OSs
- Memory bubble and deleting existing images from the system
- Memory overrun can be solved by deleting images that are not required
- Web services
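Deleting images that are no longer required can be sketched as follows: given the image names present on the machine (as `docker images` would list them) and the set still needed, build one `docker rmi` call per leftover. The image names are hypothetical, and nothing is actually removed.

```python
from typing import List, Set


def removal_commands(present: List[str], required: Set[str]) -> List[List[str]]:
    """Build a `docker rmi` call for every present image that is not required."""
    return [["docker", "rmi", name] for name in present if name not in required]


# Hypothetical image names, for illustration only.
present_images = ["gaain/workflow:v1", "gaain/workflow:v2", "ubuntu:14.04"]
still_needed = {"gaain/workflow:v2", "ubuntu:14.04"}
for cmd in removal_commands(present_images, still_needed):
    print(" ".join(cmd))
```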
![Page 18: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/18.jpg)
Issues Addressed
• Choice of VM OS
• Choice of Docker OS
• How to get a minimal VM working
• How to work with a minimal Docker image
• How to enable network access in VMs
• Limitations of network access
• How to mount folders from host
• Differences between Linux, Windows and Mac hosts
• How to get GUI displays to work in different Docker images and VMs
• How to enable external (client/server) access to VMs and Docker images
• How to autostart applications
• How to manage scripts
![Page 19: GAAIN Virtual Appliances: Virtual Machine Technology for Scientific Data Analysis Arihant Patawari USC Stevens Neuroimaging and Informatics Institute July.](https://reader036.fdocuments.us/reader036/viewer/2022062422/56649f2e5503460f94c4862e/html5/thumbnails/19.jpg)
Miscellaneous
• All files and code for the system are provided in the Google Drive shared folder
• Comprehensive “How-To” Manual