Porting of µCernVM to AArch64 Felix Scheffler, 23/09/2016
Main Supervisor: Jakob Blomer ([email protected]) 2nd Supervisor: Gerardo Ganis ([email protected])
Background Virtual Machines (VMs) Virtualisation plays a vital role in computing. The aim is to distribute physical
resources, e.g. CPU power, RAM, or disk space, among several virtual appliances. A so-called hypervisor
is responsible for allocating and managing resources for several guest operating systems (OSs) or VMs.
Virtualisation is motivated by the ease of setting up new testing or production environments across
different physical platforms or OSs, the isolation of individual resources as well as improved efficiency
[1]. In particular, the same physical resources can be used for different applications on demand. This
is why virtualisation is an integral part of data centres world-wide. At CERN, hundreds of VMs are
created and destroyed every hour [2]. This flexibility is impossible with physical machines.
Virtualisation in High Energy Physics (HEP) VMs that comprise whole OSs are usually several GB in
size. This makes them hard to distribute efficiently and slow to start. In HEP, this shortcoming is
addressed through µCernVM [3]. The image to be distributed comprises a stripped-down Linux OS
that connects to a CernVM-Filesystem (CVMFS) [4] repository residing on a dedicated web server. In
contrast to “usual” VMs, anything needed from this repository is downloaded only on demand,
aggressively cached, and eventually released again.
ARM and Virtualisation ARM has been the market leader in mobile computing for several years.
Recently, it has also begun to enter the server market, where the predominant architecture is still
x86-64. In 2011, ARM introduced virtualisation support with ARMv7 to increase competitiveness and
to harness the benefits of virtualisation outlined above. Since then,
considerable effort has been put into the development of a native Linux-based virtualisation solution.
Because Linux can be run on nearly every ARM device, harnessing existing Linux features to generate
such a solution greatly enhances portability and standardisation. The native hypervisor in Linux is its
Kernel-based VM (KVM). With kernel version 3.9, the Linaro Enterprise Group (LEG) merged a
KVM-based implementation upstream [5]. Currently, LEG is also working on an OpenStack cloud [6]
based on ARM’s first 64-bit (AArch64) architecture, ARMv8 [7]. Beyond operation at small scale (e.g.
on a single machine), this enables developers and users to create, test and run VMs in a large-scale
open-source cloud environment.
Project Motivation ARM has a strong standing in mobile computing, a market driven largely by
energy considerations. The HEP community is confronted with computations on the scale of millions
of jobs per day [8]; computing is thus not only a technical but also an economic challenge. In terms of
performance-to-energy ratio, ARM already compares well to Intel and AMD [9], and even in terms of
raw performance, ARM is getting closer to x86 systems [5]. Porting µCernVM to ARM therefore
potentially opens up a new market. Should ARM become established in the server market, it is
desirable to have an HEP virtualisation solution for AArch64. As for HEP software (independent of
virtualisation), CMSSW has already been ported to AArch64 [10].
Development environment Today, the market for physical ARM hardware is still comparatively
immature. In the server segment, HP is represented with the HPE ProLiant m400 (Moonshot) [11];
among small-scale development boards, the GeekBox [12] provides a good price-performance ratio.
Both platforms are used for porting µCernVM to AArch64. The HPE ProLiant m400 ships with an
AppliedMicro X-Gene 8-core 64-bit System-on-Chip (SoC) with up to 2.4 GHz per core. The GeekBox
comes with the RK3368, an 8-core 64-bit SoC by Rockchip with up to 1.5 GHz per core.
Porting of µCernVM The entire porting process is broken down into the following tasks: 1.) compiling
a custom Linux kernel, 2.) creating the µCernVM image by combining the kernel with a custom initrd
and 3.) compiling a minimum set of packages to set up a preliminary CVMFS repository.
1.) Custom Linux kernel µCernVM is based on a lightweight Linux kernel compiled from source.
Together with the initrd and a small set of device drivers, it is about 15 MB in size. As such, it is much
smaller than a standard Scientific Linux 6 kernel (>100 MB). Kernel configuration options are primarily
based on the existing x86-64 kernel; remaining options are adjusted interactively through the Linux
kernel build system.
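This workflow can be outlined as follows. The sketch below shows typical kernel-build commands for such a port; the old config file name and the cross-compiler prefix are assumptions, not the exact ones used in the project:

```
# start from the existing x86-64 µCernVM configuration (assumed file name)
cp config-x86_64 .config
# resolve AArch64-specific options with their defaults
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- olddefconfig
# adjust remaining options interactively via the kernel build system
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- menuconfig
# build the kernel image
make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- -j$(nproc) Image
```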
2.) µCernVM image The kernel is combined with a custom initrd into a distributable image. The initrd
contains BusyBox [13], CVMFS, a collection of bash scripts and a few additional packages. Its purpose
is to boot only a preliminary OS that eventually loads the full OS by connecting to the preconfigured
CVMFS repository. Since CVMFS is designed as a read-only filesystem, any user running µCernVM
also needs a writable scratch area. This disk space is hosted on the local storage set up by
the initrd. Both layers are combined through a union file system. In the case of µCernVM, this is AUFS
[14]. The fundamental difference between the AArch64 and x86-64 distributions of µCernVM is the
system startup: AArch64 uses the UEFI/GPT standards instead of BIOS/MBR (see Appendix). Beyond
that, the virtualised boot process follows the higher-level software stack of the x86-64 µCernVM
framework, so other parts of the initrd, e.g. contextualisation, did not need to be adapted.
3.) CVMFS repository The CVMFS repository set up as a test environment for this project is a customised
CentOS 7 installation¹. Currently, the environment is bound to a limited set of packages sufficient to
run CMS and ROOT in a command-line user interface.
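On the client side, pointing a machine at such a repository uses CVMFS's standard configuration files; a minimal /etc/cvmfs/default.local could look like this (the repository name and proxy setting are illustrative assumptions, not the project's actual values):

```
CVMFS_REPOSITORIES=cernvm-test.cern.ch
CVMFS_HTTP_PROXY=DIRECT
```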
Benchmarking To compare VM and host runtime performance, a subset of ROOT6 [15] and CMS²
benchmarks is run natively and virtualised. Results are shown in Figure 1. As expected, we find that
VM performance is worse than host performance in all cases. To help pinpoint where performance
is lost, a low-level I/O benchmark is performed as well. Figure 2 shows that especially network
bandwidth is significantly lower for the VM³. This is remarkable since network paravirtualisation, i.e.
virtio [16], is enabled. This requires further investigation. However, since the CMS benchmark is run
with a warm cache, it is assumed that network performance is not a major driver of this result. In terms
of serial disk I/O, caching effects are reduced by issuing an fsync() call before measuring the actual
throughput⁴. In addition, the VM is configured such that caching on the host is eliminated⁵. Apart from
network paravirtualisation, no other devices are paravirtualised. In particular, the use of disk virtio
drivers starts to pay off only with multiple threads [17]. In the single-threaded case (as with dd), the
effect of using virtio is thus expected to be negligible. This is also verified experimentally (data not
shown). It can further be assumed that the comparatively poor CMS result is unrelated to the use of
disk virtio drivers. This can be concluded from Figure 1 (CMS2 and CMS3): here, the same
CMS benchmark is run in parallel (2 runs) on a VM with bus=scsi and one with bus=virtio.
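The serial disk I/O measurement described in footnote 4 boils down to a single dd invocation that forces an fsync before reporting throughput. The following is a scaled-down sketch (64 MB instead of 4000 MB, so it runs quickly; file names are illustrative):

```shell
# Write 64 MiB of zeros; conv=fsync flushes the data to disk before dd
# reports the elapsed time, which reduces the impact of caching on the
# measured throughput. dd prints its timing statistics on stderr.
dd bs=1M count=64 if=/dev/zero of=dd_test.img conv=fsync 2> dd_timing.txt
cat dd_timing.txt
```

For the network measurement (footnote 3), iperf3 needs a second endpoint running `iperf3 -s`, so it is not shown here.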
¹ The upstream repositories can be found at http://mirror.centos.org/altarch/7/os/aarch64/Packages/.
² Code obtained from https://github.com/cvmfs/cvmfs/test
³ Network performance was measured using iperf3.
⁴ time dd bs=1M count=4000 if=/dev/zero of=test.log conv=fsync
⁵ virt-install allows setting the cache value to either ‘none’, ‘writeback’ or ‘writethrough’. In this test case, ‘none’ was chosen (which is also the default value).
Figure 1: Comparison of ROOT6 and CMS benchmarks run on AArch64 µCernVM and host
Figure 2: Low-level I/O benchmark
Future work and conclusions Following the successful porting of µCernVM to AArch64, the next step is to get the image running on
cloud infrastructure, preferably OpenStack. In addition, it is recommended to run further
benchmarks under varying conditions (VM or host configurations, load configurations, parallel threads,
etc.) to gain more insight into the current bottlenecks. With regard to porting µCernVM to other
architectures, we now have experience and empirical data on how much effort is involved. The
entire work is merged upstream.
Acknowledgements Special thanks to TechLab [18] for providing access to ARM64 infrastructure.
Appendix Porting µCernVM to IA-32 Beyond the AArch64 port, µCernVM has also been ported to the Intel 32-bit
architecture (IA-32). This is motivated by bringing Test4Theory (also known as Virtual LHC@Home or
LHC@home 2.0) [19] to the latest µCernVM framework. This affects 15,000+ users and about 30,000
machines [20] (of which around 100 to 200 are connected at any given point in time [21]).
UEFI boot process To manage µCernVM instances on AArch64, QEMU is used as the hypervisor (together
with KVM as accelerator). Upon booting, the virtualisation software layered on top of KVM (in our
case libvirt and virt-install) loads an architecture-dependent firmware image. In the case of CentOS 7
and libvirt, this is located under /usr/share/AAVMF/AAVMF_CODE.fd; NVRAM variables are stored
under /usr/share/AAVMF/AAVMF_VARS.fd. Both files are shipped in the upstream repositories and
thus become available automatically when the packages are installed. AAVMF is essentially a port of
OVMF, which enables UEFI support for VMs on x86-64 systems [22].
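When virt-install is invoked with --boot uefi, libvirt records this firmware in the domain XML roughly as follows (a sketch; the per-guest NVRAM copy path is illustrative):

```xml
<os>
  <type arch='aarch64' machine='virt'>hvm</type>
  <!-- read-only firmware code, mapped as pflash -->
  <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
  <!-- per-guest writable copy of the NVRAM variable store -->
  <nvram template='/usr/share/AAVMF/AAVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/guest_VARS.fd</nvram>
</os>
```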
In compliance with the UEFI specification [23], µCernVM is required to be distributed as an EFI System
Partition (ESP), which is essentially a partition formatted with a FAT32 variant. With the EFISTUB
boot option enabled (CONFIG_EFI_STUB=y), the Linux kernel can be executed as just another UEFI
application. After loading a predefined set of UEFI images/applications, the last application in this
chain is the UEFI shell, which is started in the root directory of the ESP. In compliance with the UEFI
shell specification [24], the shell searches for a file called startup.nsh. It contains a command line
(<path-to-kernel-image> initrd=<path-to-initrd> other-kernel-command-line-parameters) that can be
interpreted by AAVMF. Based on this command line, AAVMF starts the kernel with the initrd and
other (optional) parameters. All three components, i.e. kernel, initrd and startup.nsh, are located in
the root of the ESP.
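A minimal startup.nsh could therefore look as follows (file names and kernel parameters are illustrative, not the exact ones shipped in the µCernVM image):

```
vmlinuz initrd=initrd.img console=ttyAMA0 root=/dev/ram0
```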
The command for starting the VM is: virt-install -n <VM name> --boot uefi --memory <RAM in MB>
--vcpus <no. of CPUs> --cpu host --disk path=<path-to-hdd>,format=raw --cdrom
<path-to-contextualisation-iso> --virt-type kvm --accelerate. Note that the initrd also needs to take
care of possibly repairing a GPT-partitioned disk prior to mounting CVMFS. This is because cloud
providers usually resize the disk with a simple dd if=/dev/zero of=<path-to-hdd> bs=<bs> count=0
seek=<seek>; dd if=<path-to-cernvm-image> of=<path-to-hdd>. While this does not pose a problem
for MBR-partitioned disks, it certainly does for GPT-partitioned ones. In this case, two actions need to be
taken. First, the secondary GPT table needs to be moved to the (new) end of the disk. Second, the
primary table needs to be updated with the corresponding position of the secondary table.
Otherwise, the newly created space cannot be used. This pitfall was resolved by issuing sgdisk -e
<path-to-hdd>.
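The provider-side resize and the subsequent repair can be sketched on an ordinary image file (all file names are illustrative; a random-data file stands in for the real µCernVM image, and sgdisk is only invoked if installed):

```shell
# Grow the target to 1 GiB: count=0 writes nothing, seek=1024 moves the
# end-of-file marker, producing a sparse 1 GiB image.
dd if=/dev/zero of=disk.img bs=1M count=0 seek=1024
# Stand-in for the distributed µCernVM image (4 MiB of random data).
dd if=/dev/urandom of=cernvm.img bs=1M count=4
# Copy the image into the enlarged disk without truncating it.
dd if=cernvm.img of=disk.img conv=notrunc
# For a real GPT image, the backup partition table must then be moved to
# the new end of the disk; sgdisk -e does exactly that.
command -v sgdisk > /dev/null && sgdisk -e disk.img || true
```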
References
[1] J. Shuja, A. Gani, K. Bilal, A. U. R. Khan, S. A. Madani, S. U. Khan and A. Y. Zomaya, “A Survey of
Mobile Device Virtualization: Taxonomy and State of the Art,” ACM Computing Surveys, vol. 49,
no. 1, April 2016.
[2] CERN, “Data Centre (in numbers),” [Online]. Available: http://information-
technology.web.cern.ch/about/computer-centre. [Accessed 20 September 2016].
[3] J. Blomer, D. Berzano, P. Buncic, I. Charalampidis, G. Ganis, G. Lestaris, R. Meusel and V.
Nicolaou, “Micro-CernVM: slashing the cost of building and deploying virtual machines,”
Journal of Physics: Conference Series, vol. 513, no. 032009, pp. 1-7, 2014.
[4] CERN, “CernVM File System (CernVM-FS),” [Online]. Available:
https://cernvm.cern.ch/portal/filesystem. [Accessed 20 September 2016].
[5] C. Dall and J. Nieh, “KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor,”
March 2014. [Online]. Available: http://systems.cs.columbia.edu/files/wpid-asplos2014-
kvm.pdf. [Accessed 5 September 2016].
[6] Linaro Enterprise Group, “Linaro announces ARM Based Developer Cloud,” 7 March 2016.
[Online]. Available: http://www.linaro.org/news/linaro-announces-arm-based-developer-cloud-
2/. [Accessed 20 September 2016].
[7] ARM, “ARMv8-A Architecture,” ARM Ltd., [Online]. Available:
http://www.arm.com/products/processors/armv8-architecture.php. [Accessed 5 September
2016].
[8] CERN, “Computing,” [Online]. Available: https://home.cern/about/computing. [Accessed 18
September 2016].
[9] B. Tudor and Y. M. Teo, “On understanding the energy consumption of ARM-based multicore
servers,” SIGMETRICS Perform. Eval. Rev., vol. 41, no. 1, pp. 267-278, 2013.
[10] D. Abdurachmanov, “ARM64/AArch64 for Scientific Computing at the CERN CMS Particle
Detector,” September 2015. [Online]. Available:
https://indico.cern.ch/event/443246/contributions/1098100/attachments/1154061/
1658004/Linaro.SFO.Preparation.Talk.Draft.pdf. [Accessed 20 September 2016].
[11] Hewlett Packard Enterprise, “HPE ProLiant m400 Server Cartridge,” Hewlett Packard Enterprise,
[Online]. Available: http://www8.hp.com/us/en/products/proliant-servers/product-
detail.html?oid=7398907. [Accessed 6 September 2016].
[12] GeekBox, “GeekBox - The pioneering versatile open source TV box,” [Online]. Available:
http://www.geekbox.tv/. [Accessed 20 September 2016].
[13] R. Landley, B. Reutner-Fischer and D. Vlasenko, “BusyBox: The Swiss Army Knife of Embedded
Linux,” [Online]. Available: https://busybox.net/about.html. [Accessed 20 September 2016].
[14] J. R. Okajima, “AUFS,” [Online]. Available: http://aufs.sourceforge.net/. [Accessed 12
September 2016].
[15] R. Brun and F. Rademakers, “Running the ROOT benchmark suite,” CERN, 8 September 2006.
[Online]. Available: https://root.cern.ch/root/Benchmark.html. [Accessed 7 September 2016].
[16] M. T. Jones, “Virtio: An I/O virtualization framework for Linux - Paravirtualized I/O with KVM
and lguest,” 29 January 2010. [Online]. Available:
http://www.ibm.com/developerworks/library/l-virtio/. [Accessed 15 September 2016].
[17] K. Huynh and S. Hajnoczi, “KVM / QEMU Storage Stack Performance Discussion,” IBM, 3
November 2010. [Online]. Available:
http://www.ibm.com/support/knowledgecenter/linuxonibm/liaav/LPCKVMSSPV2.1.pdf.
[Accessed 15 September 2016].
[18] CERN, “TechLab,” [Online]. Available: https://twiki.cern.ch/twiki/bin/viewauth/IT/TechLab.
[Accessed 20 September 2016].
[19] CERN, “Test4Theory,” [Online]. Available: http://lhcathome.web.cern.ch/projects/test4theory.
[Accessed 20 September 2016].
[20] CERN, “Test4Theory - Server status,” [Online]. Available:
http://lhcathome2.cern.ch/vLHCathome/server_status.php. [Accessed 20 September 2016].
[21] CERN, “MC Production,” [Online]. Available: http://mcplots-dev.cern.ch/production.php.
[Accessed 21 September 2016].
[22] L. Ersek, “Open Virtual Machine Firmware (OVMF) Status Report,” July 2014. [Online].
Available: http://www.linux-kvm.org/downloads/lersek/ovmf-whitepaper-c770f8c.txt.
[Accessed 20 September 2016].
[23] UEFI.org, “Unified Extensible Firmware Interface Specification,” January 2016. [Online].
Available: http://www.uefi.org/sites/default/files/resources/UEFI%20Spec%202_6.pdf.
[Accessed 12 September 2016].
[24] UEFI.org, “UEFI Shell Specification,” 26 January 2016. [Online]. Available:
http://www.uefi.org/sites/default/files/resources/UEFI_Shell_2_2.pdf. [Accessed 12
September 2016].