Container-based Cluster Management Platform for Distributed Computing

Ju-Won Park and Jaegyoon Hahm
Div. of Supercomputing, KISTI,

245 Daehak-ro, Yuseong-gu, Daejeon 305-806, Korea

Abstract— Several fields of science have traditionally demanded support for large-scale workflows that require thousands of CPU cores or more. Since users' demands for software packages and configurations differ, an approach that makes a user-desired service environment available in real time, without imposing significant challenges on administrators, is necessary. In this paper, we present a container-based cluster management platform and introduce an implementation case that minimizes performance decline and provides the dynamic distributed computing environment desired by users. This paper makes the following contributions. First, container-based virtualization technology is integrated with a resource and job management system to expand its applicability to large-scale scientific workflows. Second, an implementation case in which docker and HTCondor are interlocked with each other is introduced. Lastly, a performance comparison between docker and native execution using two widely known benchmark tools, together with Monte-Carlo simulation results implemented using various programming languages, is presented.

Keywords: Container-based virtualization, Docker, HTCondor, Distributed computing

1. Introduction

Traditionally, high energy physics, oceanography, meteorology, astronomy, and space science require large-scale workflows demanding more than several thousand CPU cores [1], [2]. These scientific workflows come in a variety of forms, ranging from high throughput computing (HTC), which combines millions of loosely-coupled tasks, to high performance computing (HPC), which refers to tightly-coupled forms such as message passing interface (MPI) tasks processed simultaneously by several thousand cores. To handle such large-scale scientific workflows, large-capacity cluster systems such as supercomputers are widely used. Such systems usually provide resource and job management (RJM) functions, enabling a multitude of users to share resources fairly. However, as the resources are shared by multiple users and organizations, many challenges remain, the biggest of which is the difference in users' demands for software packages and configurations. Because of these challenges, in practice an operating system (OS) and software stack are often installed once and kept unchanged for a very long time [3]. These rigid utilization practices impose many constraints on new technology development initiatives and dampen expectations of performance increases following software version upgrades.

To overcome such rigid utilization practices and deliver more dynamic service environments to users, many studies aimed at configuring and providing users with clusters built on Xen- or KVM-based virtualization technologies have been conducted [4], [5], [6]. In fact, many scientists conduct their research using VMs available from Amazon EC2. Although significant performance improvement has been achieved thanks to advances in hypervisor technology and the development of techniques such as the passthrough approach, the overhead incurred by the hypervisor inevitably compromises performance [7]. Because of such constraints, container-based virtualization technologies such as Linux-VServer, OpenVZ, and LXC have recently been utilized frequently [7], [8], [9].

This paper presents a container-based cluster management platform and introduces an implementation case that minimizes performance decline and provides the dynamic distributed computing environment desired by users. As container-based virtualization compromises performance less than hypervisor-based virtualization, the former can reach near-native performance. This paper contributes in three regards. First, container-based virtualization technology is integrated with RJM to expand its applicability to cluster environments in order to support large-scale scientific workflows with near-native performance. Conventional container resource utilization approaches focused on providing a user-customized service environment on a single computer, so they were not suitable for supporting scientific workflows that utilize resources on a large scale. Second, an implementation case in which docker [10], a container-based virtualization technology using LXC, and HTCondor [11], which is frequently used in HTC applications, are interlocked with each other is introduced to present an easy method of implementing the approach presented herein. Third, a performance comparison between docker and native execution using widely known benchmarking tools is presented, and Monte-Carlo simulation results implemented using various programming languages in an environment where HTCondor and docker are interlocked with each other are presented.


This paper is organized as follows. The motivation and related work are described in Section 2. Then, detailed descriptions of the proposed approach and implementation are presented in Section 3. Next, Section 4 shows the performance of the cluster system implemented with HTCondor and docker. Finally, we conclude this paper in Section 5.

2. Background

2.1 Motivation

Most large-capacity cluster systems are shared by a multitude of individual and organizational users. As these users require very different service environments (OS, software packages, configuration, etc.) in this setting, it is significantly challenging to meet their varying requirements in their entirety. In particular, in-house codes developed by scientists on their own require specific OS and library versions.

To overcome these constraints, PLSI1 provides users with the compilers, mathematical libraries, and installation paths supported by the different clusters on its website. Users then need to find and access clusters where they can compile their own codes for execution [12].

This environment poses challenges to both administrators and users:

• Challenges for administrators: First of all, many packages required by users must be installed on all computing nodes upon request. Given that most clusters have 500 or more computing nodes and numerous libraries and versions, this administration approach involves a very daunting challenge. In addition, if a user-requested kernel version differs from the OS already installed, it is difficult to fulfill such a request. Because of this issue, in most clusters the service environment is commonly kept unchanged except for some bug fixes and security enhancements. This rigid service approach cannot support new technology development and forgoes the performance improvements that follow compiler version upgrades.

• Challenges for users: A user always has to confirm in advance whether essential packages are available and, if they are not, send a request to the administrator. In particular, if each cluster has a different administrator in an environment where multiple clusters are interlocked with each other, as in PLSI, a user has to ask multiple administrators to install the necessary software packages. Therefore, it is difficult for users to be provided with the execution environments they desire in real time. Because of these issues, scientists run their application programs on public cloud services despite the performance degradation.

1To ensure that supercomputing resources are provided to researchers as efficiently as possible, a project named Partnership & Leadership for the Nationwide Supercomputing Infrastructure (PLSI) is carried out in Korea, aiming to establish a unified system of resource utilization by integrating supercomputing resources across the nation.

An approach that makes a user-desired service environment available in real time, without imposing significant challenges on administrators, is necessary.

2.2 Related Work

There has been considerable research activity addressing the performance of virtualized resources in cloud computing environments [13], [14], [15], [16]. Walker [13] conducted a study on HPC in the cloud by benchmarking Amazon EC2. He et al. [14] then extended this work by evaluating the technical capability of current public cloud computing platforms and their suitability for running scientific applications, especially High Performance Computing (HPC) applications. Jackson et al. [15] presented an evaluation comparing conventional HPC platforms to Amazon EC2 using real applications representative of the workload at a typical supercomputing center; to evaluate the performance of real scientific workloads, they used the NERSC benchmarking framework [17]. Iosup et al. [16] analyzed the performance of cloud computing services for scientific computing workloads, focusing specifically on the real scientific computing workloads of Many-Task Computing (MTC) users. Despite these many activities, the use of virtualization has traditionally been avoided in most HPC facilities due to its inherent performance overhead [3].

Recently, container-based virtualization systems (e.g., Linux-VServer, OpenVZ, and Linux Containers) have been investigated since they offer a lightweight virtualization layer that promises near-native performance [7], [8], [9]. In [7], the performance of three well-known open source hypervisors, KVM, OpenVZ, and Xen, was evaluated in the context of HPC; the results showed that OpenVZ had the best I/O throughput among them. Soltesz et al. [8] described a virtualization approach that is a synthesis of resource containers and security containers applied to general-purpose, time-shared operating systems. They conducted a network bandwidth benchmark using iperf and macro-benchmarks for CPU- and disk-I/O-intensive workloads; in their results, I/O-related benchmarks performed worse on Xen than on Linux-VServer. Xavier et al. [9] conducted a number of experiments on container-based virtualization for HPC. Their results showed that container-based virtualization systems had better performance than traditional hypervisor-based virtualization. Furthermore, they reported that LXC proved to be the most suitable container-based system for HPC, since its performance degradation can be offset by its ease of management.

Docker is a lightweight and powerful open source container virtualization technology combined with a workflow for building and containerizing applications [10]. It provides a toolset and unified API for managing kernel-level technologies such as containers, cgroups, namespaces, and union file systems. Docker therefore lets us quickly assemble applications from components and eliminates the friction between development and production environments.
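As an illustration of that toolset, the following shell sketch shows a few representative docker commands; the image name app_image and the resource limits are hypothetical and are only meant to show how images, containers, and cgroup limits are driven from a single interface.

# List locally available images and running containers.
sudo docker images
sudo docker ps

# Run a containerized command; -m and -c map onto cgroup memory and
# CPU-share limits, and the container receives its own namespaces.
sudo docker run -m 512m -c 512 app_image echo "hello from a container"

# Inspect low-level details of a container (mounts, network, state).
sudo docker inspect <container_id>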

Fig. 1: Container-based cluster management platform architecture.

3. Container-based cluster management platform

3.1 Approach

Fig. 1 shows the proposed approach. In general, cluster systems supporting scientific workflows consist of a front-end node that allocates computing resources in response to user requests and multiple execute nodes that run the actual tasks. In our approach, a user submits tasks to the front-end node, and the execute nodes regularly measure their resource status and report the measured data to the front-end node. If resources are available, the front-end node dispatches tasks from a queue in accordance with a FIFO, round-robin, priority-based preemptive, or other scheduling algorithm and match-makes them with available resources for resource allocation. Upon completion of resource allocation, the files needed for task execution are transferred to the execute nodes. Execute nodes receive the tasks to be executed from the front-end node and run the application programs on top of the container-based virtualization layer. Execution results are transferred back to the front-end node that initially submitted the tasks and are forwarded to the user.

3.2 Implementation

In our implementation case, HTCondor was used as the job and resource scheduler and docker for container-based virtualization.

3.2.1 HTCondor daemons

In an HTCondor pool, each machine can serve a variety of roles, and different daemons run on a machine depending on its role [11]. For the sake of simplicity, we focus on the six essential daemons in this paper (a minimal role-based configuration sketch follows the list):

• SCHEDD: This daemon is responsible for submitting resource requests to the HTCondor pool. To this end, it advertises the status of the job queue and claims available resources to serve those requests.

• STARTD: This daemon is responsible for resource management on an execute node. It advertises certain attributes about the execute node and is responsible for enforcing the policy configured by the resource owner.

• COLLECTOR: This daemon collects all the information about the status of an HTCondor pool. All other daemons periodically send ClassAd2 updates to the COLLECTOR. These ClassAds contain the state of the daemons, the resources, and the queue in the HTCondor pool.

2ClassAd is a schema-free resource allocation language used to represent arbitrary services and the constraints on their allocation [18].

• NEGOTIATOR: This daemon is responsible for the match-making in the HTCondor pool. Specifically, it contacts each SCHEDD that has waiting resource requests and allocates available resources to those requests.

• SHADOW: This daemon acts as the resource manager for a request. For example, jobs that are linked for the standard universe perform remote system calls using this daemon. It runs on the machine where the job was submitted.

• STARTER: This daemon sets up the execution environment and monitors the running job.
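The mapping of daemons to machine roles is normally expressed in the HTCondor configuration. The following shell sketch shows one plausible way to assign the daemons above to a central manager (also used for submission here) and to an execute node; the host name manager.example.org and the configuration file path are assumptions for illustration, not details taken from this implementation.

# Central manager / submit node: run the pool-wide and submission daemons.
cat >> /etc/condor/condor_config.local <<'EOF'
CONDOR_HOST = manager.example.org
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
EOF

# Execute node: only the resource-management daemon is started here.
# SHADOW and STARTER are spawned on demand by SCHEDD and STARTD,
# so they do not appear in DAEMON_LIST.
cat >> /etc/condor/condor_config.local <<'EOF'
CONDOR_HOST = manager.example.org
DAEMON_LIST = MASTER, STARTD
EOF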

Fig. 2: Container-based cluster management platform implementation.

3.2.2 Implementation using HTCondor and Docker

Table 1: Hardware and software specifications.

Hardware spec.
  CPU      Intel(R) Xeon(R) [email protected] * 2ea
  Memory   32GB
  HDD      Western Digital WD 500GB 7200 RPM

Software spec.
  OS                               CentOS release 6.5 (Final)
  Job & resource scheduler         HTCondor 8.0.7
  Container-based virtualization   Docker 1.1.2
  Image management                 Docker-registry server (dev) 0.8.0

Table 1 shows the hardware and software specifications of the system used herein. First of all, as shown in Fig. 2, a scientist creates a dockerized application image in advance for running a scientific workflow, pushes it to the docker registry, and creates a shell script file (launch_docker.sh) to launch the dockerized application on the execute nodes. Caution is needed when a file passed as an argument must be forwarded into the container: the working directory of the host where the execution file resides has to be mounted into the container using the -v option of docker, as follows:

Table 2: launch_docker.sh.

#!/bin/bash
sudo docker run -v $(pwd):/data docker_image /data/execute_file
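For completeness, the following shell sketch shows how the dockerized application image referred to above could be built and pushed to the private docker-registry before any jobs are submitted. The Dockerfile contents, the image name docker_image, and the registry address manager:5000 are illustrative assumptions rather than the exact artifacts used in this work.

# Build a runtime image for the workflow (a Java runtime is used here
# purely as an example of the packages such an image might carry).
cat > Dockerfile <<'EOF'
FROM centos:6
RUN yum install -y java-1.7.0-openjdk
EOF
sudo docker build -t docker_image .

# Tag and push the image to the private registry on the central manager,
# so that execute nodes can pull it on demand.
sudo docker tag docker_image manager:5000/docker_image
sudo docker push manager:5000/docker_image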

To submit the shell script prepared in this manner to the HTCondor scheduler, an HTCondor job description file (Table 3 shows an example job script) is created and submitted to the HTCondor SCHEDD. In HTCondor, the STARTD daemon reports the resource status of the execute machines and the SCHEDD daemon reports the job queue status to the COLLECTOR daemon in ClassAd format at regular intervals [11]. The NEGOTIATOR match-makes job ClassAds and resource ClassAds based on the data collected by the COLLECTOR


to determine the execute machine where the tasks are to be executed. Once the execute host is determined via the match-making by the NEGOTIATOR, SCHEDD and STARTD launch SHADOW and STARTER respectively, and a session is established between the two launched daemons. Through this session, the launch_docker.sh file and the argument files required for execution are transferred to the execute host, and STARTER executes the script file. At this time, the STARTER daemon checks whether the dockerized application image is available on the local host where it runs and, if it is not available, pulls the image that the user pushed in advance to the docker registry in order to execute the dockerized application. Upon task completion, the result files are transferred to the submit node via the SHADOW daemon, and the user can confirm the results at the submit node from which the task was submitted.
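The image check performed on the execute host can also be made explicit in the launch script itself. A minimal sketch of such a check is shown below, again assuming the hypothetical image name docker_image and registry address manager:5000.

# Pull the application image from the private registry only if it is
# not already present on this execute host.
if ! sudo docker images | grep -q 'manager:5000/docker_image'; then
    sudo docker pull manager:5000/docker_image
fi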

Table 3: An example of HTCondor job script.

universe = vanilla
executable = launch_docker.sh
output = output_file
transfer_input_files = execute_file
queue 100
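As a usage illustration, the job description above (assumed here to be saved as job.submit, a hypothetical file name) is handed to the scheduler and monitored with the standard HTCondor command-line tools:

# Submit the 100 queued jobs to the local SCHEDD.
condor_submit job.submit

# Inspect the job queue and the state of the execute slots.
condor_q
condor_status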

Utilizing HTCondor and docker in this fashion brings two advantages. First, the proposed container-based cluster management platform can be implemented with ease. Second, the approach is applicable to a multi-cluster system like PLSI, since HTCondor allows multiple cluster resources to be utilized for a single scientific workflow.

4. Evaluation

This section analyzes the performance of docker, a container-based virtualization approach, and evaluates the performance of the cluster system implemented with HTCondor and docker.

4.1 Micro-Benchmarks

Two widely known benchmark tools, unixbench and sysbench, were used to measure the performance of docker, with the following results (a sketch of the corresponding tool invocations is given after the list):

• unixbench: unixbench is a benchmark tool designed to measure overall system performance; it provides a variety of benchmark results such as Whetstone, Dhrystone, file copy, pipe throughput, etc. Fig. 3 shows the index values of each item measured by the unixbench tool. The index values of docker for all items except pipe-based context switching are 90% or more of the native performance. The pipe-based context switching test measures system performance by passing an increasing integer between two processes through a pipe and is closer to a real-world application [19]; for this item, docker shows 75% of the native performance. Fig. 4 shows the system benchmark index scores, measured at 5699.5 for native execution and 5492.7 for docker, so docker reaches 96% of the native performance.

Fig. 3: unixbench benchmark results.

Fig. 4: The system benchmark index score.

• sysbench: sysbench is a tool that uses a variety of scenarios to measure CPU, memory, and file I/O performance. Table 4 shows the benchmark results. The first item shows the CPU time taken to process 10,000 events of arithmetic operations on decimal fractions; docker and native performance are confirmed to be identical. The second item shows sequential and random memory I/O performance in an assigned memory buffer, where docker is measured to be better than native performance by 3% to 5%3. The last item shows sequential and random file I/O performance results using 128 GB test files created on the local disk. From Table 4, it can be confirmed that docker shows performance almost identical to native performance.

3This benchmark result conflicts with our intuitive understanding. The analysis of this problem should be explored in the near future.
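The measurements above can be reproduced, in outline, with invocations along the following lines; the docker image name bench_image and the paths inside it are hypothetical, and the exact option values used in this study are not reproduced here.

# unixbench: run the full suite natively, then inside a container that
# has the suite installed (hypothetical image and path).
cd byte-unixbench/UnixBench && ./Run
sudo docker run --rm bench_image bash -c "cd /opt/UnixBench && ./Run"

# sysbench (classic 0.4.x-style syntax): CPU, memory, and file I/O tests.
sysbench --test=cpu --max-requests=10000 run
sysbench --test=memory --memory-oper=write run
sysbench --test=fileio --file-total-size=128G prepare
sysbench --test=fileio --file-total-size=128G --file-test-mode=seqwr run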

Comparing the unixbench and sysbench measurements in this section, docker shows no significant performance decline relative to native execution. Many recent studies also show that container-based virtualization technology achieves near-native performance [9].

Table 4: Sysbench benchmark results.

Test item   Option           Docker         Native
CPU         Total time       24.5 sec       24.5 sec
Memory      Sequence write   2.21 GB/sec    2.14 GB/sec
            Random write     3.41 GB/sec    3.25 GB/sec
            Sequence read    3.82 GB/sec    3.63 GB/sec
            Random read      3.67 GB/sec    3.50 GB/sec
File I/O    Sequence write   104.8 MB/sec   105.2 MB/sec
            Sequence read    91.7 MB/sec    91.7 MB/sec
            Combined R/W     1.5 MB/sec     1.5 MB/sec

4.2 Macro-Benchmarks

Fig. 5: The container-based cluster management system using docker and HTCondor.

This section presents the performance measurements of a cluster system implemented with docker and HTCondor for supporting scientific workflows. First of all, as illustrated in Fig. 5, an HTCondor pool consisting of 1 central manager node and 3 execute nodes, in accordance with the hardware specification in Table 1, was configured. In addition, a docker registry was installed on the central manager node and docker clients on the execute nodes.
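Besides the usual HTCondor installation, setting up such a pool involves starting the docker-registry on the central manager and letting the execute nodes' docker clients pull from it. A minimal sketch of the registry side is shown below; port 5000 is the registry's default, while the host name manager is again an assumption.

# On the central manager: run the docker-registry as a container,
# publishing its default port 5000.
sudo docker run -d -p 5000:5000 --name registry registry

# Quick reachability check from an execute node.
curl http://manager:5000/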

To measure the performance of the system implemented in this manner, a program that calculates Pi with a Monte-Carlo technique was implemented in C, JAVA, Python, and R. It picks 10,000,000 points at random inside the unit square and checks whether each point falls inside the circle. A simulation workflow that runs the implemented program 100 times to reduce errors was then created and submitted to the HTCondor scheduler to compare execution times.
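The implementations used in this study were written in C, JAVA, Python, and R; purely as an illustrative sketch of the same estimator, the structure of a single task can be expressed as a short shell script (using awk for the arithmetic):

#!/bin/bash
# Estimate Pi by sampling 10,000,000 random points in the unit square and
# counting how many fall inside the quarter circle (illustrative sketch only).
awk 'BEGIN {
    srand();
    n = 10000000; inside = 0;
    for (i = 0; i < n; i++) {
        x = rand(); y = rand();
        if (x * x + y * y <= 1.0)
            inside++;
    }
    printf "pi is approximately %f\n", 4.0 * inside / n;
}'

Each of the 100 jobs queued in Table 3 would run one such estimation inside the container started by launch_docker.sh.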

Fig. 6: Monte-Carlo simulation results (100 times).

Fig. 6 shows the Monte-Carlo simulation results with and without docker. As the figure confirms, the difference between docker and native execution times depends on the implementation language. When executed with docker, the execution time increased by 3.2 and 3.6 fold compared to native performance in the cases of C and JAVA respectively, while it increased by only 18% and 9% in the cases of Python and R. The biggest factor underlying this difference is that, in a simulation using docker, the image loading time differs among the implementation languages.

Fig. 7: Monte-Carlo simulation results (1 time).

Fig. 7 shows the execution time of a single simulation run with docker and natively. To exclude image transfer time, the dockerized image had been pulled in advance on each execute host. As the figure shows, the execution time increased by 8% and 2% in the cases of Python and R respectively, while it rose very significantly, by 3.9 and 6.9 fold, in the cases of C and JAVA.

5. Conclusion

In this paper, we presented a container-based cluster management platform to provide the service environment desired by users. Virtualization technology is widely used to provide dynamic service environments; however, due to the inevitable overhead of hypervisor-based virtualization, container-based virtualization technologies such as Linux-VServer, OpenVZ, and LXC have recently been utilized. In this paper, we introduced an implementation case in which docker and HTCondor are interlocked. In addition, we conducted micro-benchmarks using unixbench and sysbench for CPU, memory, and file I/O performance, and macro-benchmarks using a Monte-Carlo simulation workflow for the performance of the cluster system. Our results showed that docker has near-native performance and that image loading time differs among implementation languages.

References

[1] E. Deelman, D. Gannon, M. Shields, and I. Taylor, “Workflows and e-science: An overview of workflow system features and capabilities,” Future Generation Computer Systems, vol. 25, no. 5, pp. 528–540, 2009.

[2] Y. Gil, E. Deelman, M. Ellisman, T. Fahringer, G. Fox, D. Gannon, C. Goble, M. Livny, L. Moreau, and J. Myers, “Examining the challenges of scientific workflows,” IEEE Computer, vol. 40, no. 12, pp. 24–32, Dec 2007.

[3] K. Chen, J. Xin, and W. Zheng, “Virtualcluster: Customizing the cluster environment through virtual machines,” in Proc. of the IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, vol. 2, Dec 2008, pp. 411–416.

[4] P. Ruth, P. McGachey, and D. Xu, “Viocluster: Virtualization for dynamic computational domains,” in Proc. of IEEE International Cluster Computing, Sept 2005, pp. 1–10.

[5] M. A. Murphy, B. Kagey, M. Fenn, and S. Goasguen, “Dynamic provisioning of virtual organization clusters,” in Proc. of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, Washington, DC, USA, 2009, pp. 364–371.

[6] P. Marshall, K. Keahey, and T. Freeman, “Elastic site: Using clouds to elastically extend site resources,” in Proc. of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, Washington, DC, USA, 2010, pp. 43–52.

[7] N. Regola and J.-C. Ducom, “Recommendations for virtualization technologies in high performance computing,” in Proc. of the IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), Nov 2010, pp. 409–416.

[8] S. Soltesz, H. Pötzl, M. E. Fiuczynski, A. Bavier, and L. Peterson, “Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors,” SIGOPS Oper. Syst. Rev., vol. 41, no. 3, pp. 275–287, Mar. 2007.

[9] M. Xavier, M. Neves, F. Rossi, T. Ferreto, T. Lange, and C. De Rose, “Performance evaluation of container-based virtualization for high performance computing environments,” in Proc. of the 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Feb 2013, pp. 233–240.

[10] “Docker.” [Online]. Available: www.docker.com

[11] D. Thain, T. Tannenbaum, and M. Livny, “Distributed computing in practice: the Condor experience,” Concurrency - Practice and Experience, vol. 17, no. 2-4, pp. 323–356, 2005.

[12] “PLSI: Partnership & leadership for the nationwide supercomputing infrastructure.” [Online]. Available: http://www.plsi.or.kr

[13] E. Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” LOGIN, vol. 33, pp. 18–23, 2008.

[14] Q. He, S. Zhou, B. Kobler, D. Duffy, and T. McGlynn, “Case study for running HPC applications in public clouds,” in Proc. of the 19th ACM International Symposium on High Performance Distributed Computing, New York, NY, USA, 2010, pp. 395–401.

[15] K. Jackson, L. Ramakrishnan, K. Muriki, S. Canon, S. Cholia, J. Shalf, H. J. Wasserman, and N. Wright, “Performance analysis of high performance computing applications on the Amazon Web Services cloud,” in Proc. of the IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), Nov 2010, pp. 159–168.

[16] A. Iosup, S. Ostermann, M. N. Yigitbasi, R. Prodan, T. Fahringer, and D. H. Epema, “Performance analysis of cloud computing services for many-tasks scientific computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 6, pp. 931–945, 2011.

[17] “NERSC: National Energy Research Scientific Computing Center.” [Online]. Available: https://www.nersc.gov

[18] R. Raman, M. Livny, and M. Solomon, “Matchmaking: Distributed resource management for high throughput computing,” in Proc. of the Seventh International Symposium on High Performance Distributed Computing, Jul 1998, pp. 140–146.

[19] “UnixBench.” [Online]. Available: https://code.google.com/p/byte-unixbench/
