[IEEE 2012 Third International Conference on Networking and Computing (ICNC) - Okinawa, Japan...
Evaluation of Performance Degradation in HPC Applications with VM Consolidation
Yuya Hashimoto
Tokyo Institute of Technology
Tokyo, Japan
Kento Aida
National Institute of Informatics/Tokyo Institute of Technology
Tokyo, Japan
Email: [email protected]
Abstract—This paper investigates the performance degradation in application programs running on virtual machines (VMs) in a physical computing server with a focus on HPC applications. We select three benchmarks as the expected workload in a datacenter: HPC applications, database applications, and web server applications. Then, we put VMs executing two application programs together in a physical computing server and evaluate the performance degradation in each application program. We also investigate the resource consumption in each application program and the reason for the performance degradation. The experimental results indicate that the interference among VMs executing two HPC application programs with high memory usage and high network I/O in the physical computing server significantly degrades the application performance.
I. Introduction
Cloud computing is now widely used as a computing platform.
Business communities are changing their computing
platforms from on-site traditional computing servers to cloud
computing services, such as IaaS, PaaS and SaaS. Cloud com-
puting offers users services to utilize computing and storage
resources on demand. The users can access the resources
operated in a datacenter via the Internet whenever they require
resource capacity. Although cloud computing has been pri-
marily used in business communities, academic communities
are currently interested in executing their high-performance
computing (HPC) application programs in a cloud computing
environment.
In the datacenter, virtual machines (VMs) run in physical
computing servers using virtualization technology [1] and run
software that offers services to the users. The virtualization
technology also enables many VMs to execute in fewer phys-
ical computing servers, a feature known as VM consolidation,
so that idle physical computing servers are turned off or oper-
ated in the low power mode. VM consolidation is a promising
approach to improve energy efficiency in the datacenter. For
example, it is reported that a computing server consumes more
than 50% of the peak power even when the server utilization is
10% [2]. Thus, VM consolidation can reduce power consumed
by computing servers with low utilization.
While VM consolidation contributes to improved energy
efficiency in a datacenter, interference among VMs in a
physical computing server may degrade the performance of
application programs running on these VMs. The interference
is caused by access contention for devices, e.g., a CPU core,
memory, and I/O devices in the physical computing server.
Interference among VMs executing HPC application programs
in the physical computing server may be significant, because
HPC application programs require huge resource capacity.
The problem of performance interference caused by
VM consolidation has been investigated in the related literature
[3], [4], [5], [6]. However, the investigations have been limited
to interference among VMs executing non-HPC application
programs, and the performance degradation on HPC applica-
tion programs has not been well investigated.
In this paper, we investigate the performance degradation in
application programs running on VMs in a physical computing
server with a focus on HPC applications. We select three
benchmarks as the expected workload in a datacenter: HPC ap-
plications, database applications, and web server applications.
Then, VMs executing two application programs drawn from the three
benchmarks are placed together in a physical computing server.
We evaluate the performance degradation in each application
program. We also investigate the resource consumption in
each application program and discuss the reason for the
performance degradation observed in the experiments. The
experimental results indicate that the interference among VMs
executing two HPC application programs with high memory
usage and high network I/O in the physical computing server
significantly degrades the application performance.
The rest of the paper is organized as follows: Section II
discusses related work. Section III describes our experimental
setting and benchmarks. Section IV presents experimental
results and briefly discusses the VM consolidation strategy
to reduce performance degradation. Finally, Section V sum-
marizes our contributions and outlines future work.
II. Related Work
Performance interference among application programs run-
ning on VMs in a physical computing server has been investi-
gated in [3] and [4]. These researchers conducted experiments
using benchmarks of Unix commands, compilation processes,
a Povray application, micro-benchmark programs, and web
server applications. The work in [5] presents approaches to
create a performance model of application programs run-
ning on VMs in a physical computing server. Those authors
conducted experiments using the benchmark, vConsolidate,
and discussed the performance model. While the goal of the
above work is similar to that of this paper, that work did not
978-0-7695-4893-7/12 $26.00 © 2012 IEEE
DOI 10.1109/ICNC.2012.50
investigate performance interference among HPC application
programs running on VMs in a physical computing server.
The work in [6] proposes a VM consolidation algorithm for
a scientific workflow application program. The authors studied
the correlation among workloads with different resource usage
and investigated the impact of the workload interference on the
application performance. The proposed algorithm decides the
placement of VMs using the hierarchical clustering technique.
The performance of the proposed algorithm was evaluated
using a volume rendering program and synthetic application
programs. However, performance interference among these
applications was not discussed.
III. Experimental Setting
This section describes our experimental setting and the
benchmarks used in the experiments.
A. Server Configuration and Measurement Tool
The experiments are conducted on a physical computing
server with dual Intel Xeon L5520 2.27 GHz, 16 GB memory,
and 1 TB SATA HDD. The operating system on the physical
computing server is Ubuntu Server 10.04. We use the Kernel-based
Virtual Machine (KVM) for virtualization and operate
OpenNebula to control VMs running in the physical comput-
ing server. Each CPU has four cores and hyper-threading is
enabled, that is, we can utilize 16 virtual CPU cores in the
physical computing server. We create the VM image with Intel
Xeon 2.27 GHz, 2 GB memory, and 10 GB HDD. The CPU
on the VM has one core and hyper-threading is enabled. The
operating system on the VM is also Ubuntu Server 10.04.
We measure resource consumption, namely CPU utilization, memory
usage, network I/O, and disk I/O, during the execution
of application programs by using the dstat command. The
interference by the measurement tool on the application per-
formance should be small. We conducted preliminary experi-
ments to compare the execution times of application programs
with and without dstat. Specifically, we ran dstat with a
one-second sampling interval to collect CPU utilization, memory usage,
network I/O, and disk I/O during the execution of each application
program and compared the resulting execution time with the
execution time without dstat. The results show that the average increase
of the application execution time with the measurement tool
is less than 0.06% and the maximum increase is 1.96%. We
believe that this level of interference is acceptable.
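The overhead comparison described above can be sketched as follows. This is only an illustration of the arithmetic: the function name and the timing values are hypothetical and are not taken from the experiments.

```python
from statistics import mean


def overhead_percent(t_plain: float, t_monitored: float) -> float:
    """Relative increase in execution time caused by monitoring, in percent."""
    if t_plain <= 0:
        raise ValueError("baseline execution time must be positive")
    return (t_monitored - t_plain) / t_plain * 100.0


# Hypothetical (without dstat, with dstat) execution times in seconds.
# These values are placeholders, not measurements from the paper.
runs = {
    "prog_a": (100.0, 100.05),
    "prog_b": (200.0, 203.0),
    "prog_c": (50.0, 50.0),
}

overheads = {name: overhead_percent(a, b) for name, (a, b) in runs.items()}
print(f"average increase: {mean(overheads.values()):.2f}%")
print(f"maximum increase: {max(overheads.values()):.2f}%")
```

Reporting both the average and the maximum, as the text does, separates the typical cost of monitoring from the worst case over all programs.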
B. Benchmarks
Cloud computing is now widely used in business commu-
nities, and database and web server applications are typical
workloads. Furthermore, academic communities are currently
interested in executing HPC applications in the cloud comput-
ing environment. We expect that the workload in the datacenter
will consist of HPC applications, database applications, and
web server applications in the near future. Thus, we selected
three types of benchmarks in our experiments.
1) HPC Applications: We selected the NAS Parallel Bench-
marks (NPB) [7] as HPC applications. NPB is widely used
in the HPC community as a benchmark to evaluate the
performance of parallel supercomputers. NPB consists of eight
programs representing the characteristics of computational
fluid dynamics (CFD) applications, as listed below:
• IS: kernel code of Integer Sort
• EP: kernel code of Embarrassingly Parallel computation
using the Monte Carlo method
• CG: kernel code of the Conjugate Gradient method
• MG: kernel code of the Multi-Grid method
• FT: kernel code of the discrete 3D fast Fourier Transform
• BT: application code of the Block Tri-diagonal solver
• SP: application code of the Scalar Penta-diagonal solver
• LU: application code of the Lower-Upper Gauss-Seidel
solver
2) Database Application: We selected MySQL [8] and
Sysbench [9] as database applications. MySQL is popular
open source database software and we execute MySQL on
the database server in our experiments. We execute Sysbench,
which generates the workload for the database server, on the
client machine. Sysbench consists of benchmark programs for
evaluating the following performances of database servers:
• file I/O performance
• scheduler performance
• memory allocation and transfer speed
• POSIX thread implementation performance
• database server performance (on-line transaction process-
ing; OLTP benchmark)
We execute the OLTP benchmark in our experiments, where
a mixture of search and query processes is executed.
3) Web Server Application: We selected Apache [10],
which is popular open source web server software. We created
client scripts to generate the workload for the web server.
While some benchmarks to generate the web server workload
are available, e.g., ApacheBench [11], they are usually used to
evaluate the robustness of a web server and generate too much
load on the web server. We did not use the existing benchmarks,
because the goal of our experiments is to investigate
performance under a realistic load.
IV. Experimental Results
This section presents our experimental results. First, we
investigate the resource consumption of a single program
included in NPB. These measurements are used to interpret the results
of the following experiments. Then, we present the experimental
results on the performance degradation of application
programs with VM consolidation.
A. Resource Consumption by a Single Application
Before investigating the performance degradation of appli-
cation programs with VM consolidation, we measured the
resource consumption in a single program included in NPB.
In the experiments, we execute each NPB program with the
problem size of CLASS C. Each program is executed with
eight MPI processes, where two processes are assigned to a
Fig. 1. Network I/O in NPB FT
Fig. 2. CPU utilization in NPB FT
VM with two virtual CPU cores, and a total of four VMs are
utilized in the physical computing server. Then, we measured
CPU utilization, memory usage, network I/O, and disk I/O
during execution of the program.
Figure 1 through Figure 4 show the resource consumption
on the single VM during the execution of FT in the NPB. FT
computes the discrete 3D fast Fourier Transform and performs
communication among all processes during its execution. This
procedure increases the network I/O, as presented in Figure 1.
The high network I/O causes the CPU to spend much time in I/O wait,
so the CPU utilization fluctuates, as presented in Figure 2.
We see the same fluctuation of the CPU utilization and the same high
network I/O in the results of IS.
For the other programs in the NPB, the network
I/O is low and the CPU utilization is stable and high. For
example, Figure 5 and Figure 6 show the CPU utilization in
LU and the network I/O in EP, respectively. The memory usage
in all programs is stable, as in the example of FT in Figure 3.
All programs except EP consume a large amount of memory, e.g.,
1,800 MB in FT. The disk I/O in all programs is sporadic, as
in the example of FT in Figure 4.
Fig. 3. Memory usage in NPB FT
Fig. 4. Disk I/O in NPB FT
Table I summarizes the resource consumption in NPB
programs on the VM. We omit the results of CG, BT and
SP because they are similar to those of the programs in the table.
Table II presents the corresponding results for the physical computing server,
on which four VMs run. The results in Table II indicate
that resources in the physical computing server are not fully
occupied by a single program. This means that there is room to
put VMs executing another application program in the physical
computing server.
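As a rough illustration of this headroom argument, the sketch below adds the maximum memory usage of two programs from Table II and compares the sum with the server memory. The additive estimate and the conversion of 16 GB to 16 * 1024 MB are assumptions made for this example, not the authors' method, and the function names are hypothetical.

```python
from itertools import combinations

# Maximum memory usage [MB] on the physical server, copied from Table II.
MAX_MEMORY_MB = {"EP": 162.4, "MG": 3909.2, "IS": 1832.6, "LU": 831.7, "FT": 7719.1}

# Assumption: the 16 GB of server memory is treated as 16 * 1024 MB.
SERVER_MEMORY_MB = 16 * 1024


def estimated_pair_memory_mb(a: str, b: str) -> float:
    """Naive additive estimate of the memory two co-located programs use."""
    return MAX_MEMORY_MB[a] + MAX_MEMORY_MB[b]


def fits_in_memory(a: str, b: str) -> bool:
    """True if the additive estimate stays within the server memory."""
    return estimated_pair_memory_mb(a, b) <= SERVER_MEMORY_MB


for a, b in combinations(MAX_MEMORY_MB, 2):
    total = estimated_pair_memory_mb(a, b)
    print(f"{a}+{b}: {total:.1f} MB, fits={fits_in_memory(a, b)}")
```

Under these assumptions every pair of the five listed programs fits in memory, which is consistent with the observation that a single program leaves room for a second one. Memory capacity alone says nothing about interference, which is what the following experiments measure.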
B. Performance Degradation with VM Consolidation
In this section, we present the experimental results of the
performance degradation of the NPB benchmark programs
with VM consolidation. We choose two NPB programs and
execute the programs in the physical computing server.
Specifically, we execute each program on four VMs (with
eight virtual CPU cores) and put all eight VMs in the physical
computing server. All virtual CPU cores are utilized by the two
programs. The experimental results in Section IV-A indicate
that all NPB programs highly utilize the CPU resources. It is
obvious that the performance is significantly degraded if two
Fig. 5. CPU utilization in NPB LU
Fig. 6. Network I/O in NPB EP
programs share the virtual CPU core due to excessive con-
text switching. Furthermore, the performance degradation by
contention for CPU resources between computation intensive
programs has been discussed in the literature. Thus, we avoid
two programs sharing a virtual CPU core in order to investigate
the performance degradation caused by other factors.
We also execute an NPB program together with a program from the
other benchmarks, namely MySQL or Apache. In this experiment,
we choose one NPB program and execute the program on
four VMs. Then, we execute the MySQL server program on
the other four VMs and put all eight VMs in the physical
computing server. We also perform the same experiments using
Apache. For MySQL, we execute the OLTP benchmark in
Sysbench with the complex mode. In this setting, the bench-
mark program repeats the execution of 10 point queries and
eight complex queries (e.g., a range query). Our preliminary
experiments show that a cache on the database server was filled
3,500 seconds after the start of the application program. Thus,
we measure the resource consumption 3,500 seconds after the
start of the application program, so that we can investigate
the resource consumption in the steady state. For Apache, we
execute the Apache HTTP server on four VMs and execute
TABLE I
Resource consumption in a VM

                           EP        MG        IS        LU        FT
CPU utilization            high      high      high      high      high
                           stable    stable    unstable  stable    unstable
max. CPU utilization [%]   100       99        97        99        100
network I/O                low       low       high      low       high
                           sporadic  frequent  frequent  frequent  frequent
max. network I/O [Mbps]    < 0.1     10.0      72.8      3.3       110.0
memory usage               low       high      high      high      high
                           stable    stable    stable    stable    stable
max. memory usage [MB]     13.5      926.2     421.6     216.7     1811.6
disk I/O                   sporadic  sporadic  sporadic  sporadic  sporadic
max. disk I/O [MB]         3.6       3.7       3.3       2.1       29.7

TABLE II
Resource consumption in the physical computing server

                           EP        MG        IS        LU        FT
max. CPU utilization [%]   51        47        49        52        56
max. network I/O [Mbps]    < 0.1     35.8      305.8     17.3      272.2
max. memory usage [MB]     162.4     3909.2    1832.6    831.7     7719.1
max. disk I/O [MB]         10.4      10.1      11.4      11.3      28.3

the script program to access the server on the client machine.
The script program recursively downloads the contents on the
HTTP server. The contents are created by imitating those on
commercial web sites in Japan [12].
Table III shows the performance degradation of each NPB program
when it is executed in the physical computing server together with
another program, which is listed in the “co-executed applications”
column. We compute the performance
degradation, $D_{perf}$, by the formula below:
$$D_{perf} = \frac{T_{consolidation} - T_{single}}{T_{single}} \qquad (1)$$
Here, $T_{consolidation}$ indicates the execution time of the program
when we execute the program with another program,
and $T_{single}$ means the execution time of the program when
we execute the program alone. Table III reports $D_{perf}$ as a
percentage. For example, the performance
degradation of MG when we execute MG with EP is 48. This
result means that the execution time of MG running with EP
is 48% longer than the execution time when we execute MG
alone.
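Equation (1) can be written as a small function. The sketch below is illustrative only: the function name and the two execution times are hypothetical, chosen so that the result reproduces the 48% figure quoted for MG running with EP.

```python
def perf_degradation(t_consolidation: float, t_single: float) -> float:
    """Eq. (1): relative slowdown of a program under VM consolidation."""
    if t_single <= 0:
        raise ValueError("t_single must be positive")
    return (t_consolidation - t_single) / t_single


# Hypothetical execution times in seconds (not measurements from the paper).
t_single = 200.0
t_consolidation = 296.0

d = perf_degradation(t_consolidation, t_single)
# The formula yields a fraction; multiply by 100 to obtain a percentage.
print(f"D_perf = {d:.2f} ({d * 100:.0f}%)")
```

A value of 0 means the program is unaffected by its co-runner, and a value of 1 (100%) means its execution time has doubled.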
Table III shows that FT and IS suffer the largest performance
degradation among the programs, and the degradation
in FT is the worst. From the results in the previous section,
we can see that both programs require high memory capacity
as well as high and frequent network I/O. The performance
degradation in EP is the smallest, because EP requires little
memory capacity and low network I/O. The results indicate
that the performance of an application program requiring high
TABLE III
Application performance degradation with VM consolidation

co-executed      performance degradation [%]
applications     EP     MG     IS     LU     FT
EP               -      48     92     31     125
MG               18     -      110    86     188
IS               22     109    -      34     159
LU               18     134    126    -      167
FT               19     59     87     62     -
MySQL            1      13     6      9      14
Apache           3      12     6      5      14

memory capacity and high network I/O is highly affected by
VM consolidation. This phenomenon is also observed in the
comparison of the performance degradation between two ap-
plication programs in Table III. For example, the performance
degradation in FT running with MG is 188, while that in MG
running with FT is 59. The performance degradation in FT is
much larger than that in MG, and we can see that FT requires
more memory capacity and network I/O.
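The asymmetry discussed above can be read directly from Table III. The sketch below stores the table as a nested dictionary, keyed first by the co-executed application (table row) and then by the measured program (table column); the helper names are hypothetical.

```python
# DEGRADATION[co_runner][program] = slowdown of `program` in percent (Table III).
DEGRADATION = {
    "EP": {"MG": 48, "IS": 92, "LU": 31, "FT": 125},
    "MG": {"EP": 18, "IS": 110, "LU": 86, "FT": 188},
    "IS": {"EP": 22, "MG": 109, "LU": 34, "FT": 159},
    "LU": {"EP": 18, "MG": 134, "IS": 126, "FT": 167},
    "FT": {"EP": 19, "MG": 59, "IS": 87, "LU": 62},
    "MySQL": {"EP": 1, "MG": 13, "IS": 6, "LU": 9, "FT": 14},
    "Apache": {"EP": 3, "MG": 12, "IS": 6, "LU": 5, "FT": 14},
}


def slowdown(program: str, co_runner: str) -> int:
    """Degradation [%] of `program` when it runs together with `co_runner`."""
    return DEGRADATION[co_runner][program]


def asymmetry(a: str, b: str) -> int:
    """How much more `a` suffers from `b` than `b` suffers from `a`."""
    return slowdown(a, b) - slowdown(b, a)


print("FT with MG:", slowdown("FT", "MG"))  # value from the MG row, FT column
print("MG with FT:", slowdown("MG", "FT"))  # value from the FT row, MG column
print("asymmetry :", asymmetry("FT", "MG"))
```

A large positive asymmetry marks the program in the pair that bears most of the cost of consolidation.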
FT is strongly affected
by MG and IS. Table I shows that MG consumes 926.2
MB of memory and IS performs 72.8 Mbps of network
I/O. This result indicates that running two programs with
high memory usage or network I/O causes high overhead in
the hypervisor. Note that the memory usage in the physical
computing server when we execute FT and MG is 11,538
MB, so free memory is still available during the
experiments.
The performance degradation of all programs in the NPB
when we execute the program with MySQL or Apache is
small. The reason is that both MySQL and Apache consume
fewer resources compared with the NPB programs.
C. Discussion
Our experimental results indicate that the interference
among VMs executing two HPC application programs with
high memory usage and high network I/O in the physical
computing server can significantly degrade the application per-
formance. The results lead us to a straightforward idea: putting
VMs executing an HPC application program together with
VMs executing a non-HPC application program to guarantee
application performance with VM consolidation. We can see
the effectiveness of the simple strategy in Table III, where the
performance of the NPB programs is not much affected by
MySQL and Apache. However, Table III also indicates that
VMs can execute two HPC application programs together in
the physical computing server with low performance degrada-
tion. For example, the performance degradation in LU is low
when we execute LU with EP or IS. The reason is that LU
requires relatively low resource capacity compared with others.
The results suggest that we can put VMs executing HPC
application programs together in the physical computing server
with low application performance degradation by carefully
selecting the application programs.
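One way to make "carefully selecting the application programs" concrete is to filter the pairs in Table III by a degradation threshold. The sketch below is a hypothetical illustration, not a strategy proposed in this paper: it assumes that a pair is acceptable only if the slowdown of both programs stays at or below the threshold, and the threshold of 50% is an arbitrary example value.

```python
from itertools import combinations

PROGRAMS = ["EP", "MG", "IS", "LU", "FT"]

# DEGRADATION[co_runner][program] = slowdown of `program` in percent (Table III).
DEGRADATION = {
    "EP": {"MG": 48, "IS": 92, "LU": 31, "FT": 125},
    "MG": {"EP": 18, "IS": 110, "LU": 86, "FT": 188},
    "IS": {"EP": 22, "MG": 109, "LU": 34, "FT": 159},
    "LU": {"EP": 18, "MG": 134, "IS": 126, "FT": 167},
    "FT": {"EP": 19, "MG": 59, "IS": 87, "LU": 62},
}


def worst_slowdown(a: str, b: str) -> int:
    """Larger of the two slowdowns when `a` and `b` share the server."""
    return max(DEGRADATION[b][a], DEGRADATION[a][b])


def compatible_pairs(threshold: int) -> list[tuple[str, str, int]]:
    """Pairs whose worse-off member degrades by at most `threshold` percent."""
    pairs = [(a, b, worst_slowdown(a, b)) for a, b in combinations(PROGRAMS, 2)]
    return sorted((p for p in pairs if p[2] <= threshold), key=lambda p: p[2])


# Hypothetical threshold of 50%; any other value could be used.
for a, b, worst in compatible_pairs(50):
    print(f"{a} + {b}: worst slowdown {worst}%")
```

Judging a pair by its worse-off member is a stricter criterion than looking at one program only. For instance, LU degrades by only 34% next to IS, but IS degrades by 126% next to LU, so this pair is rejected under the assumption above.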
We need more quantitative analyses to create a VM con-
solidation strategy for executing HPC applications with a per-
formance guarantee, and this is our future work. However, we
believe that this paper provides some preliminary experimental
results for the discussion of a VM consolidation strategy.
V. Conclusion
We present experimental results to discuss the performance
degradation of application programs on VMs running in a
physical computing server with a focus on HPC applications.
We select NAS Parallel Benchmarks (NPB) as the HPC
benchmark and investigate the performance degradation of
the program when we put VMs running an HPC program
together with VMs running another NPB program or non-HPC
benchmark programs. The experimental results indicate that
the interference among VMs executing two HPC application
programs with high memory usage and high network I/O
in a physical computing server significantly degrades the
application performance. However, the results also suggest
that VMs do have the capability to execute HPC application
programs together in a physical computing server with low
application performance degradation. Our future work includes
more quantitative analysis of the application performance with
VM consolidation and the creation of a sophisticated performance
model.
Acknowledgment
This work was supported in part by JSPS KAKENHI Grant
Number 24240006.
References
[1] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, “Xen and the art of virtualization,” SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 164–177, Oct. 2003.
[2] S. Srikantaiah, A. Kansal, and F. Zhao, “Energy Aware Consolidation for Cloud Computing,” in Proc. of the 2008 conference on Power aware computing and systems (HotPower ’08). USENIX Association, 2008.
[3] Y. Koh, R. C. Knauerhase, P. Brett, M. Bowman, Z. Wen, and C. Pu, “An Analysis of Performance Interference Effects in Virtual Environments,” in Proc. of IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS 2007). IEEE Computer Society, 2007, pp. 200–209.
[4] X. Pu, L. Liu, Y. Mei, S. Sivathanu, Y. Koh, and C. Pu, “Understanding Performance Interference of I/O Workload in Virtualized Cloud Environments,” in Proc. of the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD ’10). IEEE Computer Society, 2010, pp. 51–58.
[5] O. Tickoo, R. Iyer, R. Illikkal, and D. Newell, “Modeling virtual machine performance: challenges and approaches,” SIGMETRICS Perform. Eval. Rev., vol. 37, no. 3, pp. 55–60, 2010.
[6] Q. Zhu, J. Zhu, and G. Agrawal, “Power-aware Consolidation of Scientific Workflows in Virtualized Environments,” in Proc. of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC10). IEEE Computer Society, 2010, pp. 1–12.
[7] NASA. NAS Parallel Benchmarks. [Online]. Available: http://www.nas.nasa.gov/publications/npb.html
[8] ORACLE. MySQL. [Online]. Available: http://dev.mysql.com/
[9] Alexey Kopytov. SysBench: a system performance benchmark. [Online]. Available: http://sysbench.sourceforge.net/
[10] The Apache Software Foundation. Apache HTTP SERVER PROJECT. [Online]. Available: http://httpd.apache.org/
[11] ——. ab - Apache HTTP server benchmarking tool. [Online]. Available: http://httpd.apache.org/docs/2.0/programs/ab.html
[12] YAHOO Japan. [Online]. Available: http://www.yahoo.co.jp