Performance Analysis of the Impact of Vertical Scaling on …1333534/FULLTEXT02.pdf · 2019. 7....

Master of Science in Telecommunication SystemsJune 2019

Performance Analysis of the Impact ofVertical Scaling on Application

Containerized with DockerKubernetes on Amazon Web Services EC2

Dhananjay Midigudla

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology inpartial fulfilment of the requirements for the degree of Master of Science in TelecommunicationSystems. The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not usedany sources other than those listed in the bibliography and identified as references. They furtherdeclare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:Author(s):Dhananjay MidigudlaE-mail: [email protected]

University advisor:Emiliano CasalicchioDepartment of Department of Computer Science and Engineering

Faculty of Computing Internet : www.bth.seBlekinge Institute of Technology Phone : +46 455 38 50 00SE–371 79 Karlskrona, Sweden Fax : +46 455 38 50 57

Abstract

Background. Containers are being used widely as a base technology to pack ap-plications and microservice architecture is gaining popularity to deploy large scaleapplications, with containers running different aspects of the application. Due to thepresence of dynamic load on the service, a need to scale up or scale down compute re-sources to the containerized applications arises in order to maintain the performanceof the application.Objectives. To evaluate the impact of vertical scaling on the performance of acontainerized application deployed with Docker container and Kubernetes that in-cludes identification of the performance metrics that are mostly affected and hencecharacterize the eventual negative effect of vertical scaling.Methods. Literature study on kubernetes and docker containers followed by propos-ing a vertical scaling solution that can add or remove compute resources like cpu andmemory to the containerized application.Results and Conclusions. Latency and connect times were the analyzed perfor-mance metrics of the containerized application. From the obtained results, it wasconcluded that vertical scaling has no significant impact on the performance of acontainerized application in terms of latency and connect times.

Keywords: Docker, Amazon EC2, Kubernetes, Elasticity, Scaling

i

Acknowledgments

I would like to express my sincere gratitude to Prof. Emiliano Casalicchio for hisvaluable inputs and support throughout the thesis. I would like to thank professorPatrik Arlos, whose curriculum gave me a unique educational perspective.

I would like to thank my parents for their unconditional love and support.

ii

Nomenclature

AWS Amazon Web Service

CPU Central Processing Unit

EC2 Elastic Compute Cloud

HTTP Hyper Text Transfer Protocol

TCP Transmission Control Protocol

iii

Contents

Abstract i

Acknowledgments ii

1 Introduction 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Aim and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2.1 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Background 32.1 Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Docker Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.3 Kubernetes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.3.1 Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.3.2 Pod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.3.3 Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.3.4 Namespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.3.5 Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.4 Vertical Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.5 Amazon Web Services and its components . . . . . . . . . . . . . . . 6

2.5.1 EC2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.5.2 Virtual Private Cloud . . . . . . . . . . . . . . . . . . . . . . 62.5.3 S3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.6 Apache Jmeter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Related Work 9

4 Method 104.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.2 Hosting the containerized application on kubernetes . . . . . . . . . . 114.3 Resource allocation to the containers running on kubernetes . . . . . 164.4 Vertical Scaling Method . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.4.1 Updating a Deployment . . . . . . . . . . . . . . . . . . . . . 174.5 Environment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194.6 Experimentation Methodology . . . . . . . . . . . . . . . . . . . . . . 21

4.6.1 Scenario 1: Single Replica . . . . . . . . . . . . . . . . . . . . 214.6.2 Scenario 2: Two replicas . . . . . . . . . . . . . . . . . . . . . 22

iv

5 Results and Analysis 245.1 Scenario 1: One replica . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.1.1 Test case: 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245.1.2 Test case: 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.2 Scenario 2: Two replicas . . . . . . . . . . . . . . . . . . . . . . . . . 315.2.1 Test case: 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.2.2 Test case: 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6 Conclusions and Future Work 386.1 Research Questions and Answers . . . . . . . . . . . . . . . . . . . . 38

References 40

v

List of Figures

2.1 Containers vs Virtual Machine [1] . . . . . . . . . . . . . . . . . . . 32.2 Kubernetes Architecture [2] . . . . . . . . . . . . . . . . . . . . . . . 42.3 Kubernetes Node [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.4 EC2 architecture [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.5 Virtual Private Cloud - Default [5] . . . . . . . . . . . . . . . . . . . 72.6 Virtual Private Cloud - Custom [5] . . . . . . . . . . . . . . . . . . . 7

4.1 Creating a kubernetes namespace . . . . . . . . . . . . . . . . . . . . 114.2 A sample deployment file . . . . . . . . . . . . . . . . . . . . . . . . . 124.3 A kubernetes deployment from command line . . . . . . . . . . . . . 124.4 A sample kubernetes deployment description . . . . . . . . . . . . . . 134.5 Network load balancer . . . . . . . . . . . . . . . . . . . . . . . . . . 144.6 A sample kubernetes service configuration . . . . . . . . . . . . . . . 144.7 Creating the Kubernetes service . . . . . . . . . . . . . . . . . . . . . 154.8 load balancer creation in AWS gui . . . . . . . . . . . . . . . . . . . . 154.9 A deployment configuration describing the compute resources . . . . 164.10 Updating a kubernetes deployment . . . . . . . . . . . . . . . . . . . 174.11 Initial pod description . . . . . . . . . . . . . . . . . . . . . . . . . . 184.12 Description of the newly generated pod . . . . . . . . . . . . . . . . . 194.13 ec2 instance terminal sample . . . . . . . . . . . . . . . . . . . . . . . 204.14 A kubernetes cluster description . . . . . . . . . . . . . . . . . . . . . 204.15 Environment for experimentation . . . . . . . . . . . . . . . . . . . . 214.16 Vertical scaling specifications . . . . . . . . . . . . . . . . . . . . . . 23

5.1 Active threads(users) over time . . . . . . . . . . . . . . . . . . . . . 255.2 Bytes throughput over time . . . . . . . . . . . . . . . . . . . . . . . 255.3 Connect times over time . . . . . . . . . . . . . . . . . . . . . . . . . 265.4 Latency over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265.5 Server hits per second . . . . . . . . . . . . . . . . . . . . . . . . . . 275.6 Active threads(users) over time . . . . . . . . . . . . . . . . . . . . . 285.7 Bytes throughput over time . . . . . . . . . . . . . . . . . . . . . . . 295.8 Connect times over time . . . . . . . . . . . . . . . . . . . . . . . . . 295.9 Latency over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305.10 Server hits per second . . . . . . . . . . . . . . . . . . . . . . . . . . 305.11 Active threads(users) over time . . . . . . . . . . . . . . . . . . . . . 315.12 Bytes throughput over time . . . . . . . . . . . . . . . . . . . . . . . 325.13 Connect times over time . . . . . . . . . . . . . . . . . . . . . . . . . 325.14 Latency over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

vi

5.15 Server hits per second . . . . . . . . . . . . . . . . . . . . . . . . . . 335.16 Active threads(users) over time . . . . . . . . . . . . . . . . . . . . . 345.17 Bytes throughput over time . . . . . . . . . . . . . . . . . . . . . . . 345.18 Connect times over time . . . . . . . . . . . . . . . . . . . . . . . . . 355.19 Latency over time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.20 Server hits per second . . . . . . . . . . . . . . . . . . . . . . . . . . 36

vii

List of Tables

4.1 Description of ec2 instance . . . . . . . . . . . . . . . . . . . . . . . . 194.2 Vertical scaling specifications . . . . . . . . . . . . . . . . . . . . . . 22

5.1 Data for the HTTP requests for test case 1 . . . . . . . . . . . . . . . 245.2 Data for the HTTP requests for test case 2 . . . . . . . . . . . . . . . 285.3 Data for the HTTP requests for test case 1 . . . . . . . . . . . . . . . 315.4 Data for the HTTP requests for test case 2 . . . . . . . . . . . . . . . 335.5 Error% of HTTP requests with and without vertical scaling . . . . . 37

viii

Chapter 1Introduction

Infrastructure as a Service (IaaS)[6], is a cloud computing infrastructure, which pro-vides virtual machines to users on demand. Over the years, this platform has seensignificant improvements in provisioning of customized virtual machines where userscan choose the operating system and dependencies according to their needs. Smallscale application could be managed efficiently on the virtual machines but it be-came more complex and difficult to manage large scale applications when there wasdynamic load present.

Container technology is gaining popularity in cloud industry for research andcommercial offerings. The ease in deploying, scaling and exposing an application tothe internet are a few contributing factors to container’s widespread usage. Largescale applications are being deployed in a microservice architecture to distribute andisolate sub-processes in order to improve the overall performance. Hence, orchestrat-ing the containers and redefining their behaviour when required is very importantwhen there are varying load demands for the hosted application.

Scalability[7] in cloud computing is a feature that defines the ability to increase ordecrease compute resources or instances(virtual machines or containers), in order tomeet workload demands and ensure that there is no degradation in the performanceof the application. Scaling is popularly used in a microservice architecture, wheredifferent segments of the application are containerized and based on the workloaddemand, specific containers can be scaled to ensure consistent performance, highresource utilization.

Kubernetes[8], is one such popular container orchestrating tool, that is gainingpopularity. In cloud computing terminology, kubernetes is viewed as platform asa service (PaaS). Kubernetes provides platform to manage containers and servicesand its popularity can be linked to its robustness in configuration and automationof containers and services.

1

Chapter 1. Introduction 2

1.1 MotivationWith distributed applications and microservices utilizing containers as the buildingblocks, proper resource allocation to the containers on demand becomes crucial.

Due to the presence of dynamic load to services, containers can be scaled in andout based on the load. Scaling is typically categorized into horizontal and verticalscaling. Horizontal scaling is the addition or removal of container replicas whereasvertical scaling is the addition or removal of computing resources to the runningcontainers. Scaling is crucial in ensuring proper and resource utilization and perfor-mance. This feature is also known as elasticity [7]. The motivation of this thesis isto analyze the affects in the performance of a containerized application when verticalscaling is implemented.

1.2 Aim and ObjectivesThe aim of the thesis project is to evaluate the impact of vertical scaling on the perfor-mance of a containerized application deployed with Docker container and Kubernetesthat includes identification of the performance metrics that are mostly affected andhence characterize the eventual negative effect of vertical scaling.

The objectives of the thesis are:

1. Understanding Docker containers, Kubernetes architecture.

2. Understanding resource allocation in Docker containers running on Kubernetes.

3. Investigate existing methods of vertically scaling a Docker container on Kuber-netes.

4. Develop a vertical scaling model.

5. Evaluate the performance of the application in the process of vertical scaling.

1.2.1 Research Questions

The research questions formulated are :

1. How can vertical scaling be implemented on an application containerized withDocker running on Kubernetes?

2. What performance metrics are affected the most when a docker container run-ning on Kubernetes is vertically scaled?

Chapter 2Background

This chapter provides a brief description of the technologies used in this thesis.Containers, Docker container, Kubernetes architecture, AWS EC2, Apache Jmeterare the technologies described.

2.1 ContainersA container refers to a virtualization technology, which is a package of the applica-tion’s code and its dependencies. A container can therefore be called as a unit ofsoftware that is flexible, independent and runs less overhead. The host operatingsystem will share the same kernel with containers. The applications running insidethe container are independent of the running environment. This ensures that the ap-plication’s performance is consistent regardless of the environment. Containers areoperating system level virtualization and are similar to virtual machines. Containerssupport an entire operating system or a single application [1]. Containers have em-powered the usage of micro-service architectures by being light weight, providing faststart-up times because it does not emulate the physical hardware as a virtual ma-chine does. Containers develop applications based on monolithic and micro-servicesarchitectures. The core of containers relies on Linux namespaces and cgroups. Onecan run a single application within a container whose namespaces are isolated fromother processes. Currently Docker containers are widely used containers.

Figure 2.1: Containers vs Virtual Machine [1]

3

Chapter 2. Background 4

2.2 Docker ContainersDocker [9]is an open source project which provides a platform for users to create con-tainers. In Docker, containers are initially images, which are automatically convertedinto containers at run-time. Standardization, security, compatibility across multipleplatforms makes it quite popular for businesses and research purposes. Docker pro-vides two features to the user namely Docker client and Docker hub. Docker clientallows users to download or create new images for the required applications. Theseimages can be pushed to docker hub, and can be accessed anytime.

2.3 KubernetesKubernetes [8]is an open-source platform designed to automate deploying, scaling,and operating application containers. With kubernetes, one can deploy applicationsquickly; scale and limit hardware usage to required resources only. It is portable,extensible and self-healing. In kubernetes, containers are isolated from each otherand they have their own file systems. Their computational usage can be bounded.Kubernetes can schedule and run application containers on clusters of physical virtualmachines. A kubernetes cluster consists of master and nodes. Master is responsiblefor managing the cluster. It coordinates all activities in the cluster. Node is a virtualmachine or a computer that serves as a worker machine in a kubernetes cluster. Anode should have tools for handling container operations.

Figure 2.2: Kubernetes Architecture [2]

2.3.1 Nodes

As mentioned above, a node [3]in kubernetes can be a virtual machine or a physicalmachine which contains all the necessary resources to run a pod. The nodes in kuber-


netes are managed by master components and kubernetes automatically schedulespods on different available nodes.

Figure 2.3: Kubernetes Node [3]

2.3.2 Pod

A kubernetes pod[10] is a collection of containers, that run in the same network. Apod also defines the run-time behaviour of the containers. A pod is also seen as alogical host on which the containers run. This means that containers in a pod canalso share the same volumes.

2.3.3 Deployment

A kubernetes deployment [11]is a representation of multiple pods. A deployment canbe configured to contain replicas to ensure that there are back-up instances availablein case of failures. A deployment file also contains pod specifications, that includescontainer image, volumes, ports and other container specifications.

2.3.4 Namespace

Kubernetes allows users to run deployments and services in different namespaces[12]. A kubernetes namespace allows users to divide cluster resources according totheir requirements.

2.3.5 Service

A kubernetes service [13] provides gateway to access pods. A service in kubernetesis assigned an IP address and the incoming traffic can be forwarded to as specifiedport of the pods defined in the service.


2.4 Vertical ScalingVertical scaling[14] is the addition or deletion of real computing resources such asCPU, memory to a container. Vertical scaling is done when the workload of a con-tainer increases, and it is applicable to any application. The extent of vertical scalingis limited to the hosting system’s capability. When multiple containers are runningon a single machine, the resources (CPU and memory) are allocated by 2 schedulersnamely: 1) Completely fair scheduler (CFS), 2) Real-time scheduler (RTS). A CFSequally distributes the CPU time to the containers where as an RTS provides specificlimits to the Docker containers.

2.5 Amazon Web Services and its components

2.5.1 EC2

Amazon Elastic Compute Cloud (EC2) [15] [8], is a web service provided by AWS,which provides users with a re-sizable compute capacity in the cloud i.e users canchoose configurations of their virtual machines and deploy them quickly. It also haspreconfigured templates called Amazon Machine Images (AMI), that packages theoperating system and additional dependencies on the server.

Figure 2.4: EC2 architecture [4]

2.5.2 Virtual Private Cloud

A virtual private cloud (VPC) [5]provides users with a virtual network which canrun resources like EC2 etc. The vpc can be modified by the user accordingly suchas specifying the ip range, security rules etc. There are 2 types of available vpc’snamely:


• Default : It is the default option provided by AWS which provides a private ipaddress to every instance launched inside it.

Figure 2.5: Virtual Private Cloud - Default [5]

• Custom: When a custom vpc is created by a user, public ip addresses areassigned to the instances unless specified explicitly.

Figure 2.6: Virtual Private Cloud - Custom [5]


2.5.3 S3

S3 [16]is a service used for cloud storage in AWS. Initially, a bucket is created, whichis globally unique. Cluster states, other data such as photos, files etc. can be storedin the bucket in the form of objects.

2.6 Apache JmeterApache Jmeter [17]is an open source web server testing application written in Java. Itis used to evaluate the performance of web applications under different load scenarios.The important components in Jmeter in web testing are:

1. Thread Group: This parameter is used to specify the number of simulated users[18], number of requests to be sent and the frequency of sending the requests.

2. HTTP Request Defaults: It is a configuration element that can be added tothe thread group, which allows users to specify a target ip address and port tosend requests.

Chapter 3Related Work

This section describes the related previous works that provided motivation for thisresearch work.Peng Wang et al. [19] analyzed the performance of docker container on kubernetesplatform. Docker architecture, cgroups, linux namespaces were studied and CPU,disk i/o performance metrics were analyzed.

Yahya Al-Dhuraibi et al. [14] proposed ELASTICDOCKER, which adds or re-moves compute resources to docker containers based on the workload demands au-tonomously. The proposed model migrates docker container to another host if theworkload demands more resources of the host than available and the migration tech-nique used is based on CRIU functionality in linux systems. End user QoE, resourceutilization, migration time were the performance metrics that were considered. Theperformance of ELASTICDOCKER was measured under different workload scenar-ios and was compared with kubernetes auto-scaling feature and it was concludedthat ELASTICDOCKER outperformed kubernetes autoscaling by 37 %.

In [20], the authors designed an autoscaling system for web applications hosted ondocker, that dynamically scales up or scales down the number of containers accordingto workload demand. A scaling algorithm was developed that spawns new containersand attaches them to the load balancer. An underlying predictive model in thealgorithm ensures to scale down the number of containers only if it predicts lessnumber of containers for k successive periods.

Chuanqi Kan [21] designed DoCloud, an elastic cloud platform that dynamicallyscales up or scales down the number of containers according to the workload demand.Scaling down the number of containers was done keeping in mind that there mightbe an upsurge in demand just after scaling down.

In [22], authors proposed a framework involving a custom container auto-scalerthat scales in or scales out containers based on the resource utilization of the runningcontainers. The auto-scaler performed better when the workload was repetitive,and performed very poorly when there was random workload. In [23], the authorsclassified autoscaling into predictive and reactive systems. Predictive systems employmachine learning, reinforcement learning to predict the future workload and allocateresources accordingly.

9

Chapter 4Method

4.1 OverviewA systematic approach was employed in carrying out the research methodology forthis thesis. It involved an extensive study to figure out a vertical scaling solution,implementation of the solution under different scenarios, data collection and analy-sis. Vertical scaling was done manually based on the workload demand. Literaturestudy on containers, Docker containers, Kubernetes, elasticity of containers was doneinitially which provided insight on the architectures, tools and other technologies re-quired to perform the experiments. Since Kubernetes is orchestrating the dockercontainer running our web service, an investigation on how to perform vertical scal-ing in kubernetes was done.

In kubernetes, containers are generally deployed in pods. A deployment specifi-cation can be configured that inputs the desired characteristics and runs the podsaccordingly. As mentioned in section 2.3.2, containers run inside pods and all theinstructions and specifications to a pod are redirected to the containers inside it.Deployments allows users to manage more than one container, add shared volumesamong them, and bind them together to host a service.

Once the deployment is set up, a network load balancer is required so that ourapplication is exposed to the internet as a service. A service can be defined inkubernetes linking our deployment to a newly created load balancer which makesit possible to access the service through the ingress of the load balancer. Load isgenerated progressively for a time period and this traffic is redirected to the pod bythe load balancer gradually. Kubernetes architecture and its features were studiedextensively to determine the methods that can be used to perform vertical scaling.One such feature that that was of interest was "Rolling update[24]" , which is akubernetes feature that can be used to update an existing deployment. The rollingupdate feature uses a deployment controller [25] that manages updates to the existingdeployments. Rolling update can be used to modify a deployment in many wayssuch as increasing or decreasing the number of container replicas, changing containerimages etc. But the main advantage that rolling update provides is that it does notbreak down the deployment if an invalid update (such as invalid container images,unavailable resources etc.) is requested by the user. The user can monitor thestatus of the deployment update requested and can take necessary actions if required.Rolling update also offers a roll back option, that brings the deployment back to theprevious configuration and cancels the update.The thesis work therefore has been divided into the following steps:

10

Chapter 4. Method 11

• Step 1: Investigate the possibilities to change the predefined compute resourcesallocated in a deployment and analyze the changes in kubernetes configurationafter updating the deployment.

• Step 2: Perform vertical scaling on the deployment and analyze the applicationperformance under different scenarios.

4.2 Hosting the containerized application on kuber-netes

An EC2 instance from AWS is used to serve as a virtual machine upon which thecontainerized application will run on kubernetes. The reason to consider this baseenvironment is because of late, there is a growing popularity in hosting applicationsin the cloud, which makes companies think less of the infrastructure needed andfocus more on the application service.

The containerized application runs inside a pod, which is described as a partof a deployment. A deployment is written in a yaml file which contains a set ofinstructions that describe the deployment behaviour like the number of replicas, thenamespace it runs on etc. and the pod specifications like the container image, port,compute resources etc. In order to isolate the application from other services, a newnamespace is created in kubernetes.Figure 4.1 illustrates the new namespace created using the kubectl api. This

Figure 4.1: Creating a kubernetes namespace

namespace is used to deploy our containerized application as a service.Figure 4.2 describes a sample kubernetes deployment file. Kubernetes and docker

have integrated to provide users with an easy access to use docker containers onkubernetes. The spec "-image" in the deployment file takes a string correspondingto a docker image and kubernetes will automatically pull the docker image fromdocker hub and runs the container inside a pod.


Figure 4.2: A sample deployment file

Figure 4.3: A kubernetes deployment from command line

Figure 4.4 illustrates the description of the deployment create using the yaml fileby the kubectl api. The cpu shares allocated to the container in the deploymentfile can be verified. Once the deployment is successful, the pods generated from thedeployment can be viewed and the cpu shares allocated to it can be verified.


Figure 4.4: A sample kubernetes deployment description

In order to expose the containerized application to the internet, a kubernetesservice has to be described, which assigns a load balancer to our deployment. Ku-bernetes and AWS integration allows users to deploy an AWS load balancer servicethrough the kubernetes service configuration file itself. As illustrated in figure 4.5,the load balancer redirects the incoming traffic to the pod running the containerizedapplication.


Figure 4.5: Network load balancer

A kubernetes service is described in a yaml file, as illustrated in figure 4.6, inwhich the incoming port and the target port of the pod can also be specified. Theincoming port redirects the traffic towards the target port of the pod.

Figure 4.6: A sample kubernetes service configuration

Figure 4.7 illustrates the load balancer ip, dns and the port forwarding and thesemetrics can be verified with the service configuration. The spec "-target port" is thepod port, to which the load balancer forwards the incoming traffic is forwarded The


figure shows the ip and ingress of the load balancer. Figure 4.8 illustrates the loadbalancer configuration created in AWS.

Figure 4.7: Creating the Kubernetes service

Figure 4.8: load balancer creation in AWS gui


4.3 Resource allocation to the containers running onkubernetes

A specification addressing the compute resources (cpu and memory) range of thecontainers [26] can be described in the deployment configuration file. The specifica-tion has 2 arguments namely "–limits" and "–requests". The spec "–limits" refersto the the maximum amount of cpu or memory available for the container beforestart up and the the spec "–requests" refers to the amount of cpu and memory thatthe container can request to start up. This feature is used by kubernetes to ensureproper scheduling of the pods on the nodes, thus optimizing resources.

Figure 4.9: A deployment configuration describing the compute resources

If the value of the spec "–requests" exceeds the value of the spec "–limits", thenthe pods hosting the containers will not start up and the deployment will throw anerror. This is because the container cannot request for more resources than available.If no requests and only the limits are specified, then the container uses all the limitsto start up.

Figure 4.9 illustrates the cpu resource specifications given to the container. Theinput to the specs "–limits" and "–requests" can be an integer which refers to thenumber of cpus in entirety as specified by the integer value or a decimal like "0.5m"which is the one hundredth millicore of the value. For example, consider a kubernetescluster with 1 worker node with 2cpus. If a deployment is configured such that onecontainer is inputted the spec "–limit" as "1" and the spec "–request" as "0.5m",then the container will request 500m (millicore) cpu and start up.

4.4 Vertical Scaling MethodVertical scaling can be defined as the addition or removal of compute nodes to thecontainerized application. While using a kubernetes platform for orchestrating con-tainerized applications, kubernetes pods spring into action. Pods are the building


blocks in a kubernetes architecture and they maintain the containers. Communica-tion with containers can only be done by addressing the pods that host them. Whena deployment is described, the pods are configured such that they strictly follow thedirectives specified in the deployment configuration. If an attempt is made to modifya pod such as terminating it, the pod dies but is immediately re-created accordingto the deployment configuration. This features ensures that only a deployment mod-ification can change the underlying pod.

4.4.1 Updating a Deployment

In order to increase the compute resources of containers in a deployment in kuber-netes, kubectl api can be used from the terminal to edit the existing deploymentconfiguration. This opens up the existing configuration of the deployment that canbe modified. The specs of the containers can be modified and this action will updatethe deployment.

Figure 4.10: Updating a kubernetes deployment

Figure 4.10 illustrates a kubernetes deployment in the namespace "test" whichhas been updated with modifications in the cpu spec "–requests" of the container.This results in the creation of a new pod with the updated deployment configurationand the termination of the old pod. Figure 4.11 illustrates the initial pod descriptionof the deployment. The container’s initial cpu specification can be checked.


Figure 4.11: Initial pod description


Figure 4.12: Description of the newly generated pod

Figure 4.12 illustrates the description of the new pod created in the namespace"test" and the updated cpu specifications can be verified.

4.5 Environment SetupAmazon Web Services EC2 platform was set up to run the Kubernetes cluster thathosts the web application. The virtual machine runs Ubuntu 18.04 with 1 virtualcpu (vcpu). The instance type is denoted as t2.micro. The inbound and outboundtraffic rules are modified so that only the remote host ip’s requests are accepted. Thevirtual machine is also configured to obtain a private ip address, which shall be usedto access its terminal. Once the instance is active, it can be accessed by its publicdns using ssh from a remote host.

Operating System No of virtual cpus Instance TypeUbuntu 18.04 1 t2.micro

Table 4.1: Description of ec2 instance

.In order to store the cluster state, S3 storage service is used. As mentioned in

section 2.5.3, the bucket has a globally unique name. Once the name is configured,


Figure 4.13: ec2 instance terminal sample

the bucket can now be used by other AWS services.The next step involved creating a hosted zone for the kubernetes cluster. Route53

in AWS allows users to create a hosted zone and sub-domains. The sub-domainsare automatically configured by kubernetes cluster nodes if the hosted zone endswith the suffix "k8s.local". The kubernetes cluster is created using kops api in theterminal. A kubernetes cluster with one master and one worker node is configuredfor experimentation. Figure 4.14 illustrates kubernetes nodes that spawn up alongwith their corresponding information.

Figure 4.14: A kubernetes cluster description


A custom cpu intensive containerized web application is deployed as a servicefor experimentation. Apache Jmeter is set up on the local host machine and httprequests are sent to the load balancer ingress of the web service. The experimentationarchitecture is illustrated in figure 4.15.

Figure 4.15: Environment for experimentation

4.6 Experimentation MethodologyTwo deployment scenarios of the application were considered for experimentation,with one scenario considering a single instance of the container and another scenariorunning 2 replicas of the container. Apache Jmeter is used to generate http workloadfrom a physical local machine to the web service running on kubernetes in EC2.Vertical scaling is performed when the web server is under load testing and theperformance metrics of the application are monitored. Apache Jmeter custom plug-ins were used that provide visualization of latency, throughput, number of activethreads over time. Experiments were conducted in the following steps:

• Deploy the web service with single and two replicas of the containers.

• Generate workload from Apache Jmeter to the DNS of the web service.

• Perform vertical scaling while the web server is under load testing and capturethe performance metrics values through Jmeter plug-ins.

4.6.1 Scenario 1: Single Replica

In this scenario, a kubernetes deployment is configured to run only one replica ofthe container (running our web application) and a CPU share of "100m" is assignedto the container. This generates a kubernetes pod for our container and the podruns on a kubernetes node with 1 vCPU. Http load is sent using Jmeter from a localphysical machine with a thread group configuration and vertical scaling is performed


during the load testing. In vertical scaling, the CPU shares to the container areincreased from "100m" to "900m". Table 4.2 describes the vertical scaling range andits details.

Numberof containerreplicas

Number of CPUsavailable on thekubernetes node

Initial CPU sharesallocated to the container(Value represents onehundredth millicore

of the available CPUs)

Vertical scalingrange

Final CPU sharesallocated to the container(Value represents onehundredth millicore

of the available CPUs)1 1 100m 100m - 900m 900m

Table 4.2: Vertical scaling specifications

Two test cases with different request samples were used for load testing. For thetwo test cases, Jmeter is configured with the following parameters:

1. 2000 samples over 60 seconds

• Number of users: 500

• Ramp-up (seconds): 50

• Loop count: 4

• http request defaults: GET




• Loop count: 4


The above Jmeter configurations will generate 2000 http GET requests over a timespan of 60 seconds and 30 seconds respectively to the web service. Since the load isgenerated over a period of 60 seconds and 30 seconds, vertical scaling is performedduring the load testing and the performance metrics are collected from Jmeter plug-ins after the load test.

4.6.2 Scenario 2: Two replicas

In this scenario, a kubernetes deployment is configured to run two replica of thecontainer (running our web application). This generates two kubernetes pods runningone container each and both the pods run on a kubernetes node with 1 vCPU. Thedeployment is configured such that the container uses "50m" CPU share, and sincetwo replicas are specified for the deployment, both the spawned containers use "50m"CPU share each, making a combined "100m" CPU share for the deployment thatwill run as a service. Http load is sent using Jmeter from a local physical machinewith a thread group configuration and vertical scaling is performed during the loadtesting. In vertical scaling, the CPU shares to the container are increased from "50m"to "450m", which translates to vertically scaling the deployment from "100m" to"900m" CPU shares. Figure 4.16 describes the vertical scaling range and its details.


Figure 4.16: Vertical scaling specifications

Two test cases with different request samples were used for load testing. For thetwo test cases, Jmeter is configured with the following parameters:




• Loop count: 4





• Loop count: 4


Chapter 5Results and Analysis

This chapter presents the results and analysis for the experiments conducted in theenvironments described in section 4.6. Latency and connect times are the evalu-ated metrics. Latency is defined as the time taken from sending the request to justafter receiving the response. Connect time is defined as the time taken in estab-lishing a TCP connection between client (Host machine with Apache Jmeter) andserver(Docker containerized application). Server hits per second, throughput, num-ber of active threads over time are also measured for analytical insight.

5.1 Scenario 1: One replicaThe experiments are conducted in the environment described in section 4.6.1. For thesent requests, the minimum response time in milliseconds(ms), maximum responsetime(ms), average response time(ms), throughput and the average bytes per secondare monitored using Jmeter and the data is collected.

5.1.1 Test case: 1

As described in section 4.6.1, the first test was conducted for 500 threads(users)with a loop count of 4 over 60 seconds. Vertical scaling from "100m" to "900m"CPU share to the container was performed at 13th second during the 60 seconds ofload testing. In the table 5.1, the metric error % describes the percentage of failedrequests. Out of 2000 samples, 4 samples (0.2%) failed.

Label SamplesMin.

ResponseTime(ms)

Max.ResponseTime(ms)

Avg.ResponseTime(ms)

StandardDeviation Error% Throughput Avg.

Bytes

HTTPRequests 2000 246 469 231 19.53 0.2 32.8/sec 852.3

Table 5.1: Data for the HTTP requests for test case 1

24

Chapter 5. Results and Analysis 25

Figure 5.1: Active threads(users) over time

Figure 5.2: Bytes throughput over time


Figure 5.3: Connect times over time

Figure 5.4: Latency over time

From figure 5.4, there is a zero latency value at 13th second (or a failed request)i.e. at the time of vertical scaling. After that point, there is no significant impacton the latency except during the interval (24th-30th) second. The reason for thisincreased latency is because of the increase in active threads over time during theinterval (24th-30th) second as seen in figure 5.1. In order to verify that increase inlatency is due to the the increase in the number of threads during the interval, aJmeter plug-in "server hits per second" was configured. This feature calculates thenumber of requests being sent to the service per second and the usage of this featureis an attemp to check if the number of requests to the service increased during theinterval (24th-30th) second.

From figure 5.5, an increase in the number of server hits to a peak of 42 hitscan be seen during the interval (24th-30th) second, which is the peak value duringthe entire load testing period of 60 seconds. From figure 5.3, an increase in connecttimes to the service can be seen during the interval (24th-30th) second. But after


Figure 5.5: Server hits per second

13th second, which is time of vertical scaling, no significant change in latency andconnect time can be observed.


5.1.2 Test case: 2

As described in section 4.6.1, the second test was conducted for 500 threads(users)with a loop count of 4 over 30 seconds. This test case represents a more intensiveworkload during a shorter duration of 30 seconds. Vertical scaling from "100m"to "900m" CPU share to the container was performed at 6th second during the 30seconds of load testing. In the table 5.2, the metric error % describes the percentageof failed requests. Out of 2000 samples, 8 samples (0.4%) failed.

Label SamplesMin.

ResponseTime(ms)




Bytes







From figure 5.6, a surge in number of threads to 24 can be seen before 3rd secondand another increase to 21 threads from 18th to 20th second can be seen. There isalso an increase in latency for the same intervals which can be seen from figure 5.9.After the time of vertical scaling i.e. 6th second, no significant changes in latency andconnect time can be seen and the later spikes in latency is addressed by the increasein the active number of threads and this can also be verified with the number ofserver hits per second from figure 5.10.


5.2 Scenario 2: Two replicasThe experiments are conducted in the environment described in section 4.6.1. Thesame performance metrics as scenario 1 are analyzed using Jmeter.

5.2.1 Test case: 1

As described in section 4.6.2, the first test was conducted for 500 threads(users) witha loop count of 4 over 60 seconds. Vertical scaling from "50m" to "450m" CPU shareto the containers was performed at 13th second during the 60 seconds of load testing.Since the deployment has two replicas of the container and each container is scaledfrom "50m" to "450m" CPU share, this vertical scaling translates to vertical scalingof the deployment from "100m" to "900m". In the table 5.3, the metric error %describes the percentage of failed requests. Out of 2000 samples, 4 samples (0.2%)failed.

Label SamplesMin.

ResponseTime(ms)




Bytes






From figure 5.14, after vertical scaling at 13th second, there is no significantimpact on the latency except during the interval (34th-37th) second. There is a slightincrease in latency from 240ms to 342ms during the interval (34th-37th) second. Fromfigure 5.15, no significant increase in the number of server hits can be seen duringthe interval (34th-37th) second. From figure 5.13, a slight increase in connect timesfrom 119ms to 189ms to the service can be seen interval (34th-37th) second. Butafter 13th second, which is time of vertical scaling, no significant change in latencyand connect time can be observed.

5.2.2 Test case: 2

As described in section 4.6.2, the second test was conducted for 500 threads(users)with a loop count of 4 over 30 seconds. This test case represents a more intensiveworkload during a shorter duration of 30 seconds. Vertical scaling from "50m" to"450m" CPU share to the containers was performed at 6th second during the 60seconds of load testing. In the table 5.4, the metric error % describes the percentageof failed requests. Out of 2000 samples, 9 samples (0.45%) failed.

Label SamplesMin.

ResponseTime(ms)




Bytes

HTTPRequests 2000 231 26163 294 1118.41 0.45 56/sec 856.7



From figure 5.16, a gradual increase and consistency in the number of threads isseen. There is no significant change in latency during the 30 seconds of load testingfrom figure 5.19. After the time of vertical scaling i.e. 6th second, no significantchanges in latency and connect time.





5.3 SummaryAn interesting observation during vertical scaling is the buffer range of CPU shareswhile reducing the CPU shares of a container. Kubernetes updates a deploymentby first creating a new pod followed by the termination of the old pod. The CPUshares of the container in a single replica deployment was reduced to different valuesfrom an initial value of "900m". Kubernetes could update the deployment only whenthe new CPU shares value was less than or equal to "460m". This experiment wasrepeated with different initial CPU shares. It was found that the sum new CPUshares value and the initial shares value should not exceed "1360m". If the sumexceeds "1360m", kubernetes cannot update the deployment and start a new pod forthe updated container.

The workload generated was progressive over time and the metric "error %" hasbeen evaluated additionally for the test cases with and without performing verticalscaling. The redirection of the traffic from the load balancer to the pods is alsoprogressive. Error% refers to the percentage of failed HTTP requests. Table 5.5illustrates the HTTP requests error% for single replica and two replica scenariosunder workload. The data shows a decrease in error % when vertical scaling isperformed which indicates that vertical scaling is performed with a low downtimeand that the traffic is redirected to the newly generated pod.

This section provides a brief summary of the experimental results and conclusionsderived.

• One replica scenario: No significant impact on latency and connect times wasobserved after performing vertical scaling. A spike in latency and connecttimes during a time interval was addressed by comparing the active number ofthreads during the interval.

• Two replica scenario: No significant impact on latency and connect times wasobserved after performing vertical scaling. For the first test case, a slight


increase in latency and connect times was observed during a time interval butthe cause of the spike cannot be correlated due to the active number of threads.For the second test case, no significant changes in latency and connect timeswere observed during the 30 seconds of load testing.

• From the above scenarios, it can be concluded that vertical scaling has nosignificant impact on latency and connect times of the service running a dockercontainerized application on kubernetes.

ScenarioError%

(Failed HTTP Requests)Without Vertical Scaling With Vertical Scaling

Single Replica(2000 HTTP requests

over 60 seconds)0.36% 0.2%

Single Replica(2000 HTTP requests


Two Replicas(2000 HTTP requests


Two Replicas(2000 HTTP requests


Table 5.5: Error% of HTTP requests with and without vertical scaling

Chapter 6Conclusions and Future Work

This thesis work was aimed at analyzing the impact of vertical scaling on a webapplication containerized with docker on kubernetes platform. The design of thevertical scaling model was implemented on a cloud environment (EC2 instance), toconsider a cloud native scenario. Vertical scaling can be implemented on a dockercontainer running on kubernetes by updating the deployment hosting the container.From the experiments conducted, it can be concluded that vertical scaling does nothave any negative impact on the application performance in terms of latency andconnect times.

6.1 Research Questions and AnswersRQ1: How can vertical scaling be implemented on an application con-tainerized with Docker running on Kubernetes?

To implement vertical scaling, the architectures of kubernetes and docker con-tainers were studied as a part of literature study. As described in section 4.3, inkubernetes, a deployment can be configured so that the container’s resource lim-its and requests can be specified. This deployment feature was used as a baselineto perform vertical scaling. This deployment is updated by modifying it with newresource specs and it results in the creation of a new pod with updated containerresources. Figures 4.11 and 4.12 illustrate the initial pod and the newly created podwith updated resource specs.

RQ2: What performance metrics are affected the most when a dockercontainer running on Kubernetes is vertically scaled?

The containerized application’s performance was analyzed under different work-loads for different deployment scenarios as described in section 4.6. Latency, connecttimes were the analyzed performance metrics. Vertical scaling had no significantimpact on latency and connect times of the application. Therefore, none of the con-sidered performance metrics were affected by vertical scaling. It is hence concludedthat there is no negative impact on the containerized application’s performance run-ning on kubernetes in terms of latency and connect times when vertical scaling isperformed.

38

Chapter 6. Conclusions and Future Work 39

For future work, autonomous vertical scaling would qualify as an interesting re-search area. It would involve running a container resource monitoring tool such ascAdvisor continuously to collect compute resource usage of the containers. Based onthe changes in CPU and memory utilization metrics due to dynamic workload, themodel should increase or decrease compute resources to the containerized applica-tion. Another interesting research area is to investigate different machine learningtechniques that can predict future workload demand and trigger vertical scaling au-tomatically.

References

[1] D. Chamberlain. Containers vs. virtual machines (VMs): What’s the difference?[Online]. Available: https://blog.netapp.com/blogs/containers-vs-vms/

[2] C. Chang, S. Yang, E. Yeh, P. Lin, and J. Jeng, “A kubernetes-based monitoringplatform for dynamic cloud resource provisioning,” in GLOBECOM 2017 - 2017IEEE Global Communications Conference, pp. 1–6.

[3] Viewing pods and nodes. [Online]. Available: https://kubernetes.io/docs/tutorials/kubernetes-basics/explore/explore-intro/

[4] Copying an AMI - amazon elastic compute cloud. [Online]. Available:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/CopyingAMIs.html

[5] What is amazon VPC? - amazon virtual private cloud. [Online]. Available:https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html

[6] T. Wang, X. Chang, and B. Liu, “Performability analysis for IaaS cloud datacenter,” in 2016 17th International Conference on Parallel and Distributed Com-puting, Applications and Technologies (PDCAT), pp. 91–94.

[7] D. r. t. s. says. What is scalability in cloud computing? [Online]. Available:https://linuxacademy.com/blog/cloud/scalability-cloud-computing/

[8] What is amazon EC2? - amazon elastic compute cloud. [Online]. Available:https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html

[9] Containerization crash course - what is a container? - blog | BoxBoat. [Online].Available: https://boxboat.com/2018/12/11/what-is-a-container/

[10] Pods. [Online]. Available: https://kubernetes.io/docs/concepts/workloads/pods/pod/

[11] Deployment | kubernetes engine. [Online]. Available: https://cloud.google.com/kubernetes-engine/docs/concepts/deployment

[12] Namespaces. [Online]. Available: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/

[13] Services. [Online]. Available: https://kubernetes.io/docs/concepts/services-networking/service/

40

References 41

[14] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, and P. Merle, “Autonomic verticalelasticity of docker containers with ELASTICDOCKER,” in 2017 IEEE 10thInternational Conference on Cloud Computing (CLOUD), pp. 472–479.

[15] Amazon EC2. [Online]. Available: https://aws.amazon.com/ec2/

[16] Working with amazon s3 buckets - amazon simple storage service. [Online].Available: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html

[17] Apache JMeter - apache JMeter™. [Online]. Available: https://jmeter.apache.org/

[18] Apache JMeter - user’s manual: Building a JMS (java messagingservice) test plan. [Online]. Available: https://jmeter.apache.org/usermanual/build-jms-topic-test-plan.html

[19] X. Xie, P. Wang, and Q. Wang, “The performance analysis of docker and rktbased on kubernetes,” in 2017 13th International Conference on Natural Compu-tation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 2137–2141.

[20] Y. Li and Y. Xia, “Auto-scaling web applications in hybrid cloud based ondocker,” in 2016 5th International Conference on Computer Science and NetworkTechnology (ICCSNT), pp. 75–79.

[21] C. Kan, “DoCloud: An elastic cloud platform for web applications based ondocker,” in 2016 18th International Conference on Advanced CommunicationTechnology (ICACT), pp. 478–483.

[22] X. Tang, F. Zhang, X. Li, S. U. Khan, and Z. Li, “Quantifying Cloud Elasticitywith Container-Based Autoscaling,” in 2017 IEEE 15th Intl Conf on Depend-able, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligenceand Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cy-ber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech),Nov. 2017, pp. 853–860.

[23] T. Lorido-Botran, J. Miguel-Alonso, and J. A. Lozano, “A Review ofAuto-scaling Techniques for Elastic Applications in Cloud Environments,”Journal of Grid Computing, vol. 12, no. 4, pp. 559–592, Dec. 2014. [Online].Available: https://doi.org/10.1007/s10723-014-9314-7

[24] “Performing a Rolling Update.” [Online]. Available: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/

[25] “Deployments.” [Online]. Available: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/

[26] Managing compute resources for containers. [On-line]. Available: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

Performance Analysis of the Impact of Vertical Scaling on …1333534/FULLTEXT02.pdf · 2019. 7....

Documents

Transcript of Performance Analysis of the Impact of Vertical Scaling on …1333534/FULLTEXT02.pdf · 2019. 7....